
SPIKE-06 Native Codex CLI Baseline

This experiment note records the standalone native Codex CLI baseline before orchestration comparisons were run.

Experiment ID

SPIKE-06

Question

Does the standalone native Codex CLI at /opt/homebrew/bin/codex support the baseline headless and kernel-mediated flows cleanly enough to proceed to the larger prompt-dir comparison?

Setup

Native CLI:

  • executable: /opt/homebrew/bin/codex
  • version: codex-cli 0.120.0

Baseline checks:

  • codex --version
  • codex --help
  • codex exec --help
  • direct codex exec --json --ephemeral -p collab run with an ACK prompt
  • tnh-conductor --help
  • tnh-conductor run --help
  • focused runner/conductor test set
  • no-edit tnh-conductor ACK workflow using the native Codex executable
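The version and help checks above can be scripted so that each command's stdout and stderr land in separate files. This is a minimal Python sketch under stated assumptions, not the script used for SPIKE-06: the helper names run_split and run_baseline and the -stdout.txt / -stderr.log file naming are hypothetical, and only the codex subcommands listed above are invoked.

```python
import subprocess
from pathlib import Path

# Executable recorded in Setup.
CODEX = "/opt/homebrew/bin/codex"

# Hypothetical mapping of artifact stem -> command for the version/help checks.
BASELINE_CHECKS = {
    "codex-native-version": [CODEX, "--version"],
    "codex-native-help": [CODEX, "--help"],
    "codex-native-exec-help": [CODEX, "exec", "--help"],
}


def run_split(argv: list[str], out_dir: Path, stem: str) -> int:
    """Run argv once, writing stdout and stderr to separate files under out_dir.

    Returns the exit code. The two streams are never merged, so warnings
    printed to stderr cannot leak into machine-readable stdout.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    stdout_file = out_dir / f"{stem}-stdout.txt"
    stderr_file = out_dir / f"{stem}-stderr.log"
    with stdout_file.open("wb") as out, stderr_file.open("wb") as err:
        proc = subprocess.run(argv, stdout=out, stderr=err, check=False)
    return proc.returncode


def run_baseline(out_dir: Path) -> dict[str, int]:
    """Run every baseline check and return {stem: exit_code}."""
    return {stem: run_split(argv, out_dir, stem) for stem, argv in BASELINE_CHECKS.items()}
```

Calling run_baseline(Path("tmp")) would run the three checks; under this sketch a clean baseline means every exit code is 0 and every -stderr.log file is empty.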

Primary artifacts:

  • tmp/codex-native-version.txt
  • tmp/codex-native-help.txt
  • tmp/codex-native-exec-help.txt
  • tmp/codex-native-ack-stdout.jsonl
  • tmp/codex-native-ack-stderr.log
  • tmp/tnh-conductor-help.txt
  • tmp/tnh-conductor-run-help.txt
  • tmp/codex-native-runner-tests-after-response-path-fix.log
  • tmp/codex-native-kernel-summary-after-response-path-fix.json
  • .tnh-conductor/runs/20260415T210728Z/

Result

The standalone native Codex CLI is usable for the next prompt-dir comparison after two runner fixes.

Confirmed behavior:

  • codex --version, codex --help, and codex exec --help exited cleanly with empty stderr.
  • direct headless codex exec returned valid JSONL and the expected final message ACK_NATIVE_CODEX_BASELINE.
  • direct headless codex exec still emitted plugin-manifest warnings to stderr from the user's Codex home, so the stdout/stderr split remains required.
  • poetry run tnh-conductor --help and poetry run tnh-conductor run --help work cleanly.
  • poetry run python -m tnh_scholar.cli_tools.tnh_conductor.tnh_conductor --help is not a usable help path in this environment; it prints only a runpy warning and exits.
  • the focused runner/conductor tests passed after the fixes: 22 passed in 5.72s.
  • the no-edit kernel run completed through tnh-conductor using /opt/homebrew/bin/codex.
  • the final kernel ACK run returned ACK_NATIVE_CODEX_KERNEL_BASELINE.
  • the final kernel ACK run left the managed worktree clean.
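The stdout/stderr split matters because stdout is only valid JSONL when nothing else is written into it. The sketch below is a schema-agnostic check: it requires every non-blank stdout line to parse as JSON, then searches all string values for the expected ACK token. The function names are hypothetical, and the sketch deliberately does not assume the exact event schema of codex exec --json, which this note does not record.

```python
import json


def parse_jsonl(text: str) -> list:
    """Parse JSONL text; every non-blank line must be one JSON value."""
    events = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            events.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"stdout line {lineno} is not JSON: {exc.msg}") from exc
    return events


def contains_token(value: object, token: str) -> bool:
    """Recursively search all strings in a parsed JSON value for token."""
    if isinstance(value, str):
        return token in value
    if isinstance(value, dict):
        return any(contains_token(v, token) for v in value.values())
    if isinstance(value, list):
        return any(contains_token(v, token) for v in value)
    return False


def check_ack(stdout_text: str, token: str) -> bool:
    """True if stdout is valid, non-empty JSONL and some event carries token."""
    events = parse_jsonl(stdout_text)
    return bool(events) and any(contains_token(e, token) for e in events)
```

Because the token search ignores event structure, it would also match a prompt echoed back inside an event; a stricter check would read only the final-message event once its shape is confirmed for codex-cli 0.120.0.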

Issues Found And Fixed

Forced Model Was Too Specific

The maintained Codex runner always passed -m gpt-5.2-codex.

That failed under the native CLI with ChatGPT account authentication:

The 'gpt-5.2-codex' model is not supported when using Codex with a ChatGPT account.

The runner now omits -m by default so Codex uses repo-local CLI configuration. Explicit model override remains supported for configured callers.
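The default-versus-override behavior can be sketched as an argument builder that emits -m only when a caller supplies a model. This is an illustration, not the runner's code: build_exec_argv is a hypothetical name, and it uses only flags already mentioned in this note (exec, --json, -m).

```python
from typing import Optional


def build_exec_argv(codex: str, prompt: str, model: Optional[str] = None) -> list[str]:
    """Build a codex exec command line (hypothetical helper).

    With model=None no -m flag is emitted, so Codex falls back to whatever
    model its own CLI configuration selects.
    """
    argv = [codex, "exec", "--json"]
    if model is not None:
        # Explicit override for callers that configure a model.
        argv += ["-m", model]
    argv.append(prompt)  # prompt goes last as the positional argument
    return argv
```

Omitting the flag, rather than defaulting to a specific model string, avoids the account-dependent failure quoted above.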

Final Response Capture Dirtied Worktrees

The maintained runner previously wrote codex-last-message.txt inside the managed worktree.

That made a no-edit run appear dirty:

?? codex-last-message.txt

The runner now captures --output-last-message in a temporary path outside the worktree and persists the final response through the normal run-artifact path.
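A minimal sketch of that pattern, assuming the runner already has an artifact directory for the step and that the system temp directory is not inside the managed worktree: the final message is written into a throwaway temporary directory, copied to final_response.txt in the artifact directory, and the temporary directory is then deleted. capture_final_response and its injectable run parameter are hypothetical; --output-last-message is the flag named above.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable


def _default_run(argv: list[str]) -> None:
    subprocess.run(argv, check=True)


def capture_final_response(
    base_argv: list[str],
    prompt: str,
    artifact_dir: Path,
    run: Callable[[list[str]], object] = _default_run,
) -> Path:
    """Point --output-last-message outside the worktree, then persist the
    message as artifact_dir/final_response.txt (hypothetical helper)."""
    artifact_dir.mkdir(parents=True, exist_ok=True)
    destination = artifact_dir / "final_response.txt"
    # Assumption: the system temp directory lives outside the managed
    # worktree, so this file never appears in the run's git status.
    with tempfile.TemporaryDirectory(prefix="codex-last-message-") as tmp:
        last_message = Path(tmp) / "last-message.txt"
        run([*base_argv, "--output-last-message", str(last_message), prompt])
        if not last_message.exists():
            raise RuntimeError("codex did not write a final message")
        destination.write_text(last_message.read_text())
    # The temporary directory and its file are removed at this point.
    return destination
```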

Useful Artifacts

The most useful artifacts from the final kernel run:

  • .tnh-conductor/runs/20260415T210728Z/artifacts/ack/runner_metadata.json
  • .tnh-conductor/runs/20260415T210728Z/artifacts/ack/transcript.ndjson
  • .tnh-conductor/runs/20260415T210728Z/artifacts/ack/final_response.txt
  • .tnh-conductor/runs/20260415T210728Z/artifacts/ack/workspace_status.json

The final workspace status (workspace_status.json) shows:

{
  "is_dirty": false,
  "staged_count": 0,
  "unstaged_count": 0,
  "diff_summary": null
}
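Fields like these can be derived from git status --porcelain in the managed worktree. The sketch below is an assumption about how such a summary could be computed, not the runner's implementation: read_porcelain and workspace_status are hypothetical names, untracked files (??) are counted as unstaged, and diff_summary is filled with the raw porcelain lines when the tree is dirty.

```python
import subprocess
from pathlib import Path


def read_porcelain(worktree: Path) -> str:
    """Return `git status --porcelain` output for the worktree."""
    return subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=worktree, capture_output=True, text=True, check=True,
    ).stdout


def workspace_status(porcelain: str) -> dict:
    """Summarize porcelain v1 output ("XY path" lines) into status fields."""
    lines = [line for line in porcelain.splitlines() if line.strip()]
    staged = unstaged = 0
    for line in lines:
        index_code, worktree_code = line[0], line[1]
        if line.startswith("??"):
            unstaged += 1  # assumption: untracked files count as unstaged
            continue
        if index_code != " ":
            staged += 1  # change recorded in the index
        if worktree_code != " ":
            unstaged += 1  # change present in the working tree
    dirty = bool(lines)
    return {
        "is_dirty": dirty,
        "staged_count": staged,
        "unstaged_count": unstaged,
        "diff_summary": "\n".join(lines) if dirty else None,
    }
```

Under this sketch, the ?? codex-last-message.txt line from the earlier bug would yield is_dirty true and unstaged_count 1, while empty porcelain output reproduces the clean status above.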

Next Action

Proceed to the prompt-dir comparison with these constraints:

  • use /opt/homebrew/bin/codex or $(command -v codex) after confirming it resolves to the native CLI
  • use poetry run tnh-conductor, not the module form
  • keep stdout/stderr split for all direct Codex calls
  • inspect managed worktree cleanliness as part of the kernel arm review
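The first constraint can be checked before each comparison session. A minimal sketch, assuming the native CLI is identified by its Homebrew path and by a version line of the form recorded in Setup (codex-cli 0.120.0); resolve_native_codex and parse_codex_version are hypothetical names.

```python
import re
import shutil
from typing import Callable, Optional

NATIVE_CODEX = "/opt/homebrew/bin/codex"


def resolve_native_codex(
    which: Callable[[str], Optional[str]] = shutil.which,
    expected: str = NATIVE_CODEX,
) -> str:
    """Return the codex path found on PATH, failing unless it is the native CLI path."""
    # shutil.which searches PATH much like $(command -v codex), but it does
    # not see shell aliases or functions.
    found = which("codex")
    if found is None:
        raise RuntimeError("codex is not on PATH")
    if found != expected:
        raise RuntimeError(f"codex resolves to {found}, expected {expected}")
    return found


def parse_codex_version(text: str) -> Optional[str]:
    """Extract the version from `codex --version` output such as 'codex-cli 0.120.0'."""
    match = re.match(r"codex-cli (\d+\.\d+\.\d+)", text.strip())
    return match.group(1) if match else None
```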