SPIKE-03 Native Subagent Smoke Test
SPIKE-03 records the retained evidence for native subagent behavior in headless Codex runs.
Experiment ID
SPIKE-03
Question
Can native subagent behavior be invoked, observed, and captured clearly enough in headless mode to support the spike?
Setup
Primary artifacts:
- tmp/codex-subagent-smoke-stdout.jsonl
- tmp/codex-wrapper-subagent-best-stdout.jsonl
- tmp/codex-subagent-smoke-stderr.log
- tmp/codex-wrapper-subagent-best-stderr.log
- tmp/supervisory-shell-trial-stdout.jsonl
- tmp/supervisory-shell-trial-stderr.log
The retained evidence spans:
- one minimal subagent smoke test
- one wrapper-based low-noise subagent run
- one higher-level supervisory shell trial
Result
Native subagent behavior is real and observable in headless mode.
Strongest retained evidence:
- spawn_agent appears as a collab_tool_call
- wait appears as a collab_tool_call
- the waited-on agent returns a concrete completion message
- the parent agent emits a final top-level message after the wait completes
Minimal proof path:
tmp/codex-wrapper-subagent-best-stdout.jsonl
That artifact shows:
- one spawn_agent
- one wait
- one completed subordinate result with message AVAILABLE
- one final parent response SUBAGENT_OK
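The four markers listed above can be checked mechanically. The following is a minimal sketch, not the tooling used in the spike: it assumes only that each line of the artifact is a JSON object and that the markers appear somewhere in it as exact string values. The field layout of the headless JSONL events is not documented here, so the scan deliberately does not rely on any particular schema. The names `MARKERS`, `iter_strings`, and `count_markers` are hypothetical.

```python
import json
from collections import Counter

# Markers named in this report; the event schema itself is NOT assumed.
MARKERS = ("spawn_agent", "wait", "AVAILABLE", "SUBAGENT_OK")


def iter_strings(value):
    """Yield every string (keys and values) nested inside a parsed JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for key, child in value.items():
            yield key
            yield from iter_strings(child)
    elif isinstance(value, list):
        for child in value:
            yield from iter_strings(child)


def count_markers(lines, markers=MARKERS):
    """Count how many JSONL events contain each marker as an exact string."""
    counts = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip any non-JSON noise in the stream
        # A marker is counted at most once per event (per line).
        counts.update({s.strip() for s in iter_strings(event)} & set(markers))
    return counts
```

Passing an open handle on tmp/codex-wrapper-subagent-best-stdout.jsonl as `lines` would give per-event counts. If the stream emits both a start and a completion event for the same call, a single spawn_agent can be counted twice, so the counts are a lower-effort sanity check rather than a count of distinct calls.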
Important limitations:
- the minimal smoke test proves availability, not usefulness
- the supervisory-shell trial proves some usefulness, but it also exposed a parent-rollout fork-context failure on the first spawn strategy
Useful Artifacts
- tmp/codex-wrapper-subagent-best-stdout.jsonl is the clearest compact proof artifact
- tmp/codex-subagent-smoke-stdout.jsonl is the smallest proof that spawn_agent appears in headless JSONL output
- tmp/supervisory-shell-trial-stdout.jsonl is the best evidence that native subagent use can contribute to a broader supervisory task, even though the first inherited-context strategy failed
Next Action
Treat native subagent behavior as an observed capability, not just a documented one.
Do not assume inherited forked context is dependable.
The next useful experiment is the narrow supervisory comparison:
- same task
- same files
- same time budget
- direct single-agent pass versus tightly bounded supervisor with at most two subagent calls
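One way to keep the comparison honest is to write the two arms down as data and check that they differ only in the subagent allowance. The sketch below is illustrative only: `TrialArm`, `arms_are_comparable`, and `within_bound` are hypothetical names, and the task, file, and budget values are left for the experimenter to supply.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TrialArm:
    """One arm of the comparison (hypothetical structure)."""
    name: str
    task: str                 # same task for both arms
    files: Tuple[str, ...]    # same files for both arms
    time_budget_s: int        # same time budget for both arms
    max_subagent_calls: int   # 0 for the direct pass, 2 for the supervisor


def arms_are_comparable(a: TrialArm, b: TrialArm) -> bool:
    """True when the arms share task, files, and time budget."""
    return (a.task, a.files, a.time_budget_s) == (b.task, b.files, b.time_budget_s)


def within_bound(arm: TrialArm, observed_subagent_calls: int) -> bool:
    """True when a run stayed inside its arm's subagent allowance."""
    return observed_subagent_calls <= arm.max_subagent_calls
```

A run of the supervisor arm that used three subagent calls would fail `within_bound` and should be discarded rather than compared, since it no longer matches the "at most two subagent calls" constraint.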