SPIKE-03 Native Subagent Smoke Test

SPIKE-03 records the retained evidence for native subagent behavior in headless Codex runs.

Experiment ID

SPIKE-03

Can native subagent behavior be invoked, observed, and captured clearly enough in headless mode to support the spike?

Primary artifacts:

The retained evidence spans:

Native subagent behavior is real and observable in headless mode.

Strongest retained evidence:

Minimal proof path:

That artifact shows:

Important limitation:

the minimal smoke test proves availability, not usefulness
the supervisory-shell trial proves some usefulness, but it also exposed a parent-rollout fork-context failure with the first spawn strategy

tmp/codex-wrapper-subagent-best-stdout.jsonl is the clearest compact proof artifact
tmp/codex-subagent-smoke-stdout.jsonl is the smallest proof that spawn_agent appears in headless JSONL output
tmp/supervisory-shell-trial-stdout.jsonl is the best evidence that native subagent use can contribute to a broader supervisory task, even though the first inherited-context strategy failed
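One way to re-check the claim that spawn_agent appears in a captured JSONL file is a schema-agnostic scan. The sketch below is illustrative only: it assumes nothing about the event structure Codex emits and simply reports which records mention the string anywhere in their keys or values. The helper names `mentions` and `find_events` are hypothetical and are not part of any Codex tooling.

```python
import json
from typing import Any, Iterable


def mentions(value: Any, needle: str) -> bool:
    """Return True if needle occurs in any string key or value of a parsed JSON value."""
    if isinstance(value, str):
        return needle in value
    if isinstance(value, dict):
        return any(needle in str(k) or mentions(v, needle) for k, v in value.items())
    if isinstance(value, list):
        return any(mentions(v, needle) for v in value)
    return False


def find_events(lines: Iterable[str], needle: str = "spawn_agent") -> list[int]:
    """Return 1-based line numbers of JSONL records that mention needle."""
    hits = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # blank lines still count toward line numbering
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate non-JSON noise lines in captured stdout
        if mentions(record, needle):
            hits.append(number)
    return hits
```

Passing an open handle for tmp/codex-subagent-smoke-stdout.jsonl to `find_events` would list the lines worth inspecting by hand. A non-empty result only shows that the string is present; it does not show what the subagent did.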

Treat native subagent behavior as observed capability, not just documented capability.

Do not assume inherited forked context is dependable.

The next useful experiment is the narrow supervisory comparison:

same task
same files
same time budget
a direct single-agent pass versus a tightly bounded supervisor with at most two subagent calls
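The constraints above can be written down as a small pre-run check, so that the two arms are confirmed comparable before either is executed. This is a minimal sketch: the `Arm` fields and the `is_fair_comparison` helper are hypothetical names chosen for illustration, and the sketch does not launch Codex or measure anything.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arm:
    """One side of the comparison; all field names are hypothetical."""
    name: str
    task: str
    files: tuple[str, ...]
    time_budget_s: int
    max_subagent_calls: int


def is_fair_comparison(direct: Arm, supervised: Arm) -> bool:
    """Check the listed constraints: same task, files, and time budget,
    with no subagent calls in the direct arm and one or two in the supervised arm."""
    return (
        direct.task == supervised.task
        and direct.files == supervised.files
        and direct.time_budget_s == supervised.time_budget_s
        and direct.max_subagent_calls == 0
        and 0 < supervised.max_subagent_calls <= 2
    )
```

Keeping the cap on subagent calls as an explicit field makes the "tightly bounded" condition checkable instead of leaving it as an intention.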