Codex Headless Communication Report
This report consolidates the observed behavior, errors, and viable invocation path from the first direct headless Codex communication experiments.
Summary
Headless Codex communication is viable, but the surface is still only partially understood.
What is now clear:
- `codex exec` is usable for machine-readable collaboration,
- user-shell context matters materially,
- repo-local config is active and useful,
- native subagent behavior is available in headless mode,
- and `plugins` plus `shell_snapshot` are major noise contributors in this environment.
What is not yet clear:
- the functional cost of disabling noisy surfaces,
- the value of persistence/state features that currently look like noise,
- whether a user-launched supervisory agent is materially better than wrapper-mediated collaboration,
- and how much of Codex's native collaboration/product surface should be reused rather than rebuilt around.
Experiment Sequence
E1. Baseline Headless Reachability
- default authenticated invocation with a trivial `ACK` prompt
- result: success
- learning: Codex is reachable headlessly with the default authenticated home
E2. Isolated Home Trial
- repo-local `CODEX_HOME` with a trivial `ACK` prompt
- result: failure
- learning: isolated state without auth is not viable; noise dropped but auth failed
E3. Stream Separation
- authenticated invocation with explicit `stdout` and `stderr` separation
- result: success
- learning: this is the key cleanup move; collaboration output is clean on `stdout`
E4. Real Review Prompt
- authenticated invocation with a real ADR review prompt and explicit `stdout`/`stderr` separation
- result: success
- learning: the channel is not only alive; it can produce useful review findings
E5. User-Shell Comparison
- the same `ACK` command run from the user's shell
- result: success with clean `stdout` and empty `stderr`
- learning: Codex behaves better in a real user-shell context than in the live agent execution context
E6. Repo-Local Wrapper
- wrapper-script invocation of both `ACK` and a real review task
- result: success
- learning: a tiny wrapper is useful for normalization and capture, but does not itself reproduce user-shell cleanliness
E7. Repo-Local Config
- repo-local `.codex/config.toml` with `collab` and `collab_exec` profiles
- result: success
- learning: repo-local Codex config is active and can shape local CLI behavior, but did not materially reduce the core noise by itself
E8. Native Subagent Smoke Test
- headless prompt explicitly requesting one subagent
- result: success
- learning: native subagent behavior is available in headless `exec`; `spawn_agent` and `wait` events appear in JSONL output
E9. Feature-Noise Reduction
- wrapper runs with `--disable plugins`
- wrapper runs with `--disable plugins --disable shell_snapshot`
- result: success
- learning: `plugins` and `shell_snapshot` account for most of the noise in this environment
E10. Low-Noise Native Subagent Path
- wrapper + repo-local `collab` profile + `--disable plugins --disable shell_snapshot` + explicit subagent prompt
- result: success
- learning: the lower-noise path still preserves native subagent collaboration events and returns clean machine-readable output
E11. User-Shell Supervisory Trial
- shell-launched Codex supervisor using the supervisory-shell-trial assets
- result: useful but not clean
- learning:
  - the supervisor did perform distinct delegated workstreams,
  - native subagent collaboration produced useful convergent findings,
  - but the first three spawns failed because the session could not fork parent rollout context cleanly,
  - so the supervisor had to recover by embedding explicit repo/task context in each subagent prompt
Results
1. Default Authenticated Home Is Usable
Using the installed Codex CLI against the default authenticated home succeeded for both a trivial handshake and a real review task.
Evidence:
- handshake output: `tmp/codex-stdout.jsonl`
- review output: `tmp/codex-review-stdout.jsonl`
The review task produced concrete findings about OA01.4, so this was not only a connectivity test.
2. Isolated CODEX_HOME Reduced Some Noise but Broke Authentication
Running with an isolated repo-local CODEX_HOME removed the broken local state-db warnings, but the invocation then failed with authentication errors.
Observed sequence:
- initial websocket retries with temporary server-side `500` errors,
- fallback to HTTP,
- then repeated `401 Unauthorized` responses because the isolated state did not carry usable auth.
This means isolated state is not currently a drop-in cleanup solution.
3. stdout and stderr Separation Is the Key Cleanup Move
Separating output streams was the most useful cleanup improvement.
With `stdout` and `stderr` redirected to separate destinations, the result is:
- clean JSON event stream on `stdout`,
- noisy but ignorable startup/runtime diagnostics on `stderr`.
This is the strongest practical outcome from the experiments.
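A minimal sketch of that stream split, assuming a Python launcher similar in spirit to the wrapper described later. The helper name `run_split` and the file paths are hypothetical, and the command is passed in by the caller rather than hard-coded, since the exact invocation used in the experiments is not reproduced here.

```python
import subprocess
import sys


def run_split(cmd, stdout_path, stderr_path):
    """Run cmd with stdout and stderr written to separate files.

    Returns the process exit code. stdout is expected to carry the
    machine-readable event stream; stderr carries diagnostics only.
    """
    with open(stdout_path, "wb") as out, open(stderr_path, "wb") as err:
        completed = subprocess.run(
            cmd,
            stdout=out,                # collaboration channel
            stderr=err,                # diagnostics channel
            stdin=subprocess.DEVNULL,  # do not inherit the caller's stdin
            check=False,               # let the caller decide how to treat failures
        )
    return completed.returncode
```

Any argument list can be passed as `cmd`; keeping the two files apart means a JSONL parser never has to skip warning lines.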
4. User-Shell Context Is Cleaner Than Tool-Launched Context
The same `ACK` command run from the user's own shell produced a cleaner result than the tool-launched runs:
- clean JSON on `stdout`,
- empty `stderr`.
Evidence:
- user-shell output: `tmp/codex-user-stdout.jsonl`
- user-shell stderr: `tmp/codex-user-stderr.log`
This strongly suggests Codex expects a more normal user-shell environment than the live coding-agent tool environment naturally provides.
5. A Tiny Wrapper Script Works, But It Does Not Eliminate Environment Effects
A repo-local wrapper script was added at `scripts/codex_ephemeral_exec.py`.
It successfully:
- ran `ACK`,
- ran a real review task,
- captured `stdout` and `stderr`,
- and extracted a final message summary.
Evidence:
- wrapper `ACK` summary/output: `tmp/codex-script-stdout.jsonl`
- wrapper review output: `tmp/codex-script-review-stdout.jsonl`
But the wrapper did not remove the noisy warnings when launched from this live agent environment. That confirms the wrapper is useful for normalization, not for magically reproducing a true user shell.
6. Repo-Local Config Is Real but Not Sufficient
A repo-local .codex/config.toml was added and used successfully with named profiles.
This proved two things:
- repo-scoped Codex config is a real working surface,
- but simply adding a repo-local profile does not by itself clean up the dominant warnings in this environment.
The current repo-local profiles are useful as a place to hold experiment defaults, not yet as a solution to the noise problem.
7. Native Subagent Support Is Confirmed in Headless Mode
The strongest new finding from this round is that built-in Codex collaboration is not merely documented; it is visibly active in headless JSONL output.
Observed in `stdout`:
- `collab_tool_call` with `tool: "spawn_agent"`
- `collab_tool_call` with `tool: "wait"`
- subordinate agent completion message
- final top-level response after the wait completes
This means future work should be careful not to rebuild a subagent system that Codex already exposes natively.
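One way to check a captured run for these events is sketched below. The full event schema is not documented in this report, so the sketch assumes only that a `tool` key with the value `spawn_agent` or `wait` appears somewhere inside each relevant JSON object; it searches nested objects rather than relying on a fixed layout. The function names are hypothetical.

```python
import json

# Tool names reported in the observed JSONL output.
COLLAB_TOOLS = {"spawn_agent", "wait"}


def iter_dicts(node):
    """Yield every dict nested anywhere inside a decoded JSON value."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from iter_dicts(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_dicts(item)


def collab_tools_seen(jsonl_text):
    """Return collaboration tool names found in a JSONL stream, in order."""
    seen = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines instead of failing the scan
        for obj in iter_dicts(event):
            tool = obj.get("tool")
            # Assumption: the tool name is stored as a plain string.
            if isinstance(tool, str) and tool in COLLAB_TOOLS:
                seen.append(tool)
    return seen
```

A run that delegated to one subagent would be expected to yield at least one `spawn_agent` followed by a `wait`; an empty list suggests no native delegation took place.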
8. Disabling plugins and shell_snapshot Materially Reduces Noise
The wrapper-based comparison showed:
- baseline wrapper path: heavy plugin-manifest and shell-snapshot warnings
- `--disable plugins`: plugin noise removed, state-db and shell-snapshot warnings remain
- `--disable plugins --disable shell_snapshot`: only state-db and ephemeral-thread warnings remain
This is the cleanest machine-oriented path found so far from the live agent environment.
9. Lower Noise Is Not Yet the Same Thing as Better Operation
The experiments show how to quiet the channel, but they do not yet show the cost of doing so.
Open question:
- are `plugins`, `shell_snapshot`, or state-backed behavior functionally useful for real supervisory collaboration, even if they are noisy in simple tests?
That question remains unanswered and should be treated as a real design concern rather than an implementation detail.
10. The First Supervisory Trial Was Productive but Exposed a Launch-Mode Constraint
The shell-launched supervisory run was the first meaningful test of Codex as a supervisor rather than a single headless worker.
What happened:
- the supervisor read the workflow contract and task brief,
- attempted three distinct subagent workstreams,
- failed on the first three spawn attempts with a parent-rollout fork error,
- retried with explicit repo and task context embedded in each subagent prompt,
- and then successfully completed three distinct review workstreams plus a final synthesis.
The subagent workstreams were genuinely distinct:
- strategic coherence of `OA01.2` to `OA01.4`
- practical feasibility and experiment shaping
- clarity, consistency, and pruning
This was not the same task repeated three times. The overlap happened later in the findings, where the three workstreams converged on similar structural issues.
The run also showed some limited creativity:
- the supervisor recovered from the failed spawn mode without user intervention,
- reframed the subagent prompts to compensate for missing inherited context,
- and produced a stronger synthesis than a straight-line single-pass critique would likely have produced.
At the same time, this was not a clean proof of the intended workflow contract.
Why:
- the supervisor had to retry the spawn strategy,
- it did a bit more framing and synthesis work itself than a strict reading of the contract would prefer,
- and the run surfaced a real product/runtime constraint rather than a purely prompt-level behavior question.
Practical Interpretation
These experiments separated three questions clearly:
- can headless collaboration happen,
- can native Codex collaboration happen,
- and can either happen through a clean enough channel to support more meaningful experiments?
Current answers:
- headless collaboration: yes
- native subagent collaboration: yes
- clean enough operator channel: yes, provisionally
But the path is still exploratory. We have found a viable low-noise route, not yet an endorsed operating mode.
The supervisory-shell trial strengthens that conclusion:
- native supervisory use is plausible,
- but the collaboration surface still has operational constraints that need to be understood before treating it as a clean substrate.
Matrix
| Surface | Expected / question | Observed here | Confidence | Current implication |
|---|---|---|---|---|
| `codex exec` | Official headless surface | Works for `ACK`, review prompts, and subagent prompts | High | Main baseline path |
| Authenticated default home | Should carry usable auth/state | Required for successful runs | High | Keep using `~/.codex` for now |
| Isolated repo-local home | Might give cleaner local isolation | Failed with `401 Unauthorized` | High | Not viable without explicit auth handling |
| `--ephemeral` | Should keep runs lightweight | Works; leaves a harmless backfill warning | High | Keep in baseline |
| `stdout` / `stderr` split | Should isolate usable output | Clean collaboration output on `stdout` | High | Essential baseline practice |
| Repo-local `.codex/config.toml` | Should be a real config layer | Profiles are active and usable | High | Good place for repo-scoped defaults |
| `collab` profile | Should reduce approval friction | Works, but does not materially change noise | Medium | Useful as a baseline profile |
| `collab_exec` profile | `inherit = "all"` might help | No meaningful cleanliness gain seen | Medium | Not currently special |
| Repo root context | Should pick up repo guidance | Returned `AGENTS.md` in prior tests | Medium | Repo discovery likely works |
| Outside-repo context | Should not see repo guidance | Returned `NONE` in prior tests | Medium | Boundary behaves plausibly |
| Wrapper script | Should normalize invocation | Works and returns a final summary JSON | High | Good thin launcher surface |
| Wrapper `stdin=DEVNULL` | Might remove stdin-related chatter | `Reading additional input from stdin...` still appears | Medium | Not fixed yet |
| `plugins` enabled | Might add useful capability, may add noise | Heavy plugin-manifest noise | High | Need cost/benefit testing |
| `plugins` disabled | Might reduce noise | Removes most `stderr` spam | High | Strong cleanup lever |
| `shell_snapshot` enabled | Might aid environment capture, may add noise | Snapshot cleanup warning appears | High | Need cost/benefit testing |
| `shell_snapshot` disabled | Might reduce noise | Removes snapshot warning | High | Strong cleanup lever |
| `multi_agent` feature | Documented and feature-listed | Behaviorally confirmed via JSONL `spawn_agent` and `wait` | High | Native collaboration surface is real |
| Native subagent events | Might not appear in headless mode | Visible in `stdout` JSONL | High | Reuse before rebuilding |
| Low-noise subagent path | Might break collaboration behavior | Still supports `spawn_agent` + `wait` | High | Best current machine-oriented path |
| Shell-launched supervisory run | Might show whether native supervision is meaningfully useful | Produced useful delegated work and synthesis, but required retry after fork-context failure | Medium | Promising enough to continue, but not a clean success |
| Initial fork-based subagent spawn | Might inherit parent context cleanly | Failed with `parent thread rollout unavailable for fork` | Medium | Launch mode matters; explicit context fallback may be necessary |
| State DB persistence | Might be useful despite warnings | Repeated migration discrepancy warning | Low | Understand before disabling or discarding |
Error Classes Observed
A. Local State Database Migration Warnings
Observed on the authenticated default home:
- `failed to open state db at /Users/phapman/.codex/state_5.sqlite: migration 24 was previously applied but is missing in the resolved migrations`
- `state db discrepancy during find_thread_path_by_id_str_in_subdir: falling_back`
Interpretation:
- the local Codex state database appears to be from an older or different internal migration set
- Codex continues operating by falling back rather than failing hard
- this is noisy but not currently fatal for headless use
Implication:
- this likely indicates a local Codex installation or state mismatch that could be cleaned up later
- it is not a blocker for immediate experimentation
B. Plugin Manifest Warnings
Observed repeatedly:
- `ignoring interface.defaultPrompt: prompt must be at most 128 characters`
Interpretation:
- curated or installed plugin metadata under `~/.codex/.tmp/plugins/` contains manifests with overlong default prompts
- Codex logs the issue repeatedly during startup or plugin scanning
Implication:
- this appears to be upstream or CLI-environment noise, not project-specific failure
- it is currently harmless but very noisy
C. Shell Snapshot Cleanup Warnings
Observed:
- failed deletion of shell snapshot temp files with `No such file or directory`
Interpretation:
- cleanup logic is attempting to remove temp snapshot files that are already gone
Implication:
- harmless noise
D. Ephemeral Thread Backfill Warning
Observed after successful runs:
- `ephemeral threads do not support includeTurns`
Interpretation:
- Codex tries to backfill richer turn history even though ephemeral mode does not persist the session in the same way
Implication:
- harmless for current purposes
- another reason to treat `stdout` as the primary usable channel
E. Isolated-State Authentication Failure
Observed under repo-local `CODEX_HOME`:
- websocket `500 Internal Server Error` during retries
- fallback to HTTP
- repeated `401 Unauthorized: Missing bearer or basic authentication in header`
Interpretation:
- repo-local isolated state lacks the authentication material used by the default home
- server-side transient websocket errors may also be present, but the decisive blocker was missing auth
Implication:
- a truly isolated state path would require explicit auth setup
- this is possible future work, but not required for current spike progress
F. Personality / Model Warning Under Some Paths
Observed in lower-noise subagent runs:
- `Model personality requested but model_messages is missing, falling back to base instructions`
Interpretation:
- some model/profile combinations do not fully support the configured personality surface
- Codex falls back rather than failing hard
Implication:
- this is not currently a blocker
- but it means model/profile interactions are another surface worth understanding before turning defaults into stable tooling
G. Parent Thread Rollout Unavailable for Fork
Observed in the first supervisory-shell run:
- `collab spawn failed: Fatal error: parent thread rollout unavailable for fork`
Interpretation:
- the spawned subagent path appears to expect parent rollout context that was not available in this launch mode
- the exact internal cause is still inferred rather than fully known, but the behavior is clear from the error and retry pattern
Implication:
- shell-launched native supervision works, but not all subagent spawn modes are equally dependable
- explicit task context in subagent prompts may currently be the safer path than assuming inherited context
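The error classes above lend themselves to a small triage helper for captured `stderr` logs. The sketch below matches on substrings quoted in this section; the class labels and function names are hypothetical, and the pattern for class C is an assumption because the exact snapshot-cleanup message text is not quoted in this report.

```python
from collections import Counter

# (label, substrings) pairs; the first matching class wins.
ERROR_CLASSES = [
    ("A_state_db", ("failed to open state db", "state db discrepancy")),
    ("B_plugin_manifest", ("ignoring interface.defaultPrompt",)),
    # Assumption: the real message contains the word "snapshot".
    ("C_shell_snapshot", ("snapshot",)),
    ("D_ephemeral_backfill", ("ephemeral threads do not support includeTurns",)),
    ("E_auth_failure", ("401 Unauthorized",)),
    ("F_personality", ("model_messages is missing",)),
    ("G_fork_rollout", ("parent thread rollout unavailable for fork",)),
]


def classify_line(line):
    """Return the error-class label for one stderr line, or 'unclassified'."""
    lowered = line.lower()
    for label, needles in ERROR_CLASSES:
        if any(needle.lower() in lowered for needle in needles):
            return label
    return "unclassified"


def summarize_stderr(text):
    """Count non-empty stderr lines per error class."""
    return Counter(classify_line(line) for line in text.splitlines() if line.strip())
```

Counting by class makes it easier to compare runs, for example to confirm that a change removed plugin-manifest noise without introducing anything in the `unclassified` bucket.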
Current Recommendation
For machine-oriented experimentation from this live environment, the best current path is:
- keep the authenticated default `~/.codex` home,
- use repo-local profile `collab`,
- use `--ephemeral`,
- separate `stdout` and `stderr`,
- disable `plugins` and `shell_snapshot` when the goal is a cleaner event channel,
- treat `stdout` as the collaboration channel,
- treat `stderr` as diagnostics only.
For higher-value next experiments, prefer a user-shell-launched supervisory run before doing much more wrapper or noise optimization.
For supervisory trials specifically:
- assume inherited fork-context may fail,
- and be prepared to pass explicit repo and task context to subagents.
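Explicit-context delegation can be as simple as composing each subagent prompt so that it stands on its own. The sketch below is illustrative only: the function name, field labels, and prompt layout are hypothetical and do not reproduce the prompts used in the trial.

```python
def build_subagent_prompt(task, repo_root, context_files, notes=()):
    """Compose a self-contained prompt that does not rely on inherited context."""
    lines = [f"Repository root: {repo_root}", "Context files to read first:"]
    lines += [f"- {path}" for path in context_files]
    if notes:
        lines.append("Notes:")
        lines += [f"- {note}" for note in notes]
    lines += ["Task:", task.strip()]
    return "\n".join(lines)
```

Because every prompt names the repository root and the files to read, a subagent can start work even when forking the parent context fails.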
Open Questions
- What practical capability is lost when `plugins` are disabled?
- What practical capability is lost when `shell_snapshot` is disabled?
- Is the state DB warning merely noisy, or does it indicate degraded useful behavior?
- Can a user-launched supervisory Codex session coordinate real work more effectively than a thin wrapper-led collaboration pattern?
- How much of Codex's native subagent/planning surface should be treated as product capability to reuse rather than substrate to rebuild?
- Is inherited subagent fork-context dependable enough for future supervisory experiments, or should explicit-context delegation be the baseline?
- Does human framing remain necessary for genuine novelty and exploration, even if native supervision is operationally viable?
Conclusion
The experiments now support a slightly stronger but still cautious conclusion:
- headless Codex communication is viable,
- native headless subagent collaboration is also viable,
- a materially cleaner invocation path exists,
- a shell-launched supervisory run can generate useful delegated analysis,
- but the costs of suppressing noisy surfaces are still unknown,
- and inherited subagent context currently looks less dependable than explicit-context delegation.
So the work remains exploratory. The next important step is not more local noise chasing by default. It is a more meaningful supervisory experiment, ideally launched from the user's shell, so the project can judge whether the native collaboration surface is genuinely useful for the direction now emerging in OA01.2 to OA01.4.