OA01.x Spike Experiment Register
This register tracks the active OA01.x spike experiments and the narrow questions each maintained run is meant to answer.
Purpose
This is the maintained experiment register for the current OA01.x spike.
Scope is intentionally narrow:
- headless execution patterns,
- native subagent invocation viability,
- simple supervisory comparisons,
- and the minimum artifacts needed to run and review those experiments.
Out of scope for now:
- economics,
- broad topology programs,
- large benchmark programs,
- cross-provider portfolio planning,
- and long-horizon overnight scale architecture.
Cloud-generated strategy docs are preserved separately as experimental artifacts. They do not carry workflow authority.
Lightweight Experiment Card
Each experiment note should record only:
- experiment ID,
- question,
- setup,
- result,
- useful artifacts,
- next action.
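If cards are kept in tooling rather than only in notes, the six fields above map directly onto a small record type. The following is a minimal Python sketch, not part of any maintained runner; the class name ExperimentCard and the "Label: value" rendering are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentCard:
    """One lightweight experiment note; fields mirror the card list (hypothetical class)."""
    experiment_id: str
    question: str
    setup: str = ""
    result: str = ""
    useful_artifacts: list[str] = field(default_factory=list)
    next_action: str = ""

    def render(self) -> str:
        """Render the card as plain 'Label: value' lines for a notes file."""
        artifacts = ", ".join(self.useful_artifacts) or "none"
        return "\n".join([
            f"Experiment ID: {self.experiment_id}",
            f"Question: {self.question}",
            f"Setup: {self.setup}",
            f"Result: {self.result}",
            f"Useful artifacts: {artifacts}",
            f"Next action: {self.next_action}",
        ])
```

Keeping every field optional except the ID and question matches the intent of recording only what is needed.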
Maintained Experiments
SPIKE-01 Headless Baseline
Question:
Can codex exec be used headlessly, and repeatably enough, to support simple collaborator-style exchanges?
Status:
- completed
Primary references:
SPIKE-02 Execution Context Comparison
Question:
Which execution context is the most reliable and least noisy for headless use?
Focus:
- direct user shell,
- agent-launched shell,
- wrapper path,
- repo-local config effects.
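A comparison like this needs the same command run several times per context, with success and stderr noise counted the same way each time. The sketch below is a hypothetical Python harness, assuming that "reliable" means a zero exit code within a timeout and "noisy" means non-empty stderr lines; measure_context and ContextSummary are illustrative names. A wrapper path can be modeled as a different cmd, and repo-local config effects as a different cwd.

```python
import subprocess
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


@dataclass
class ContextSummary:
    name: str
    runs: int
    successes: int
    stderr_lines: int

    @property
    def success_rate(self) -> float:
        return self.successes / self.runs if self.runs else 0.0


def measure_context(name: str, cmd: Sequence[str], *, runs: int = 3,
                    env: Optional[Mapping[str, str]] = None,
                    cwd: Optional[str] = None,
                    timeout: float = 300.0) -> ContextSummary:
    """Run the same command `runs` times in one execution context.

    env=None inherits the caller's environment; pass a dict to model another context.
    A run counts as a success only if it exits 0 within the timeout.
    """
    successes = 0
    stderr_lines = 0
    for _ in range(runs):
        try:
            proc = subprocess.run(list(cmd), env=env, cwd=cwd, capture_output=True,
                                  text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            continue  # a timed-out run is a failure; its partial stderr is not counted
        if proc.returncode == 0:
            successes += 1
        # Every non-empty stderr line counts as noise, whether or not the run succeeded.
        stderr_lines += sum(1 for line in proc.stderr.splitlines() if line.strip())
    return ContextSummary(name, runs, successes, stderr_lines)
```

For example, the same ["codex", "exec", "<task prompt>"] command could be measured once with cwd inside the repo and once outside it; the prompt and paths are placeholders.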
Status:
- completed
Primary references:
SPIKE-03 Native Subagent Smoke Test
Question:
Can native subagent behavior be invoked, observed, and captured clearly enough in headless mode to support the spike?
Focus:
- spawn evidence,
- wait evidence,
- retained artifacts,
- explicit distinction between documented capability and observed behavior.
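Keeping the documented/observed distinction explicit can be as simple as mapping each documented capability to the event that would evidence it, then checking a captured run for that event. This is a hypothetical Python sketch: the event type names in the example are placeholders, not Codex's actual event schema, and the "type" key is an assumption.

```python
from typing import Iterable, Mapping


def classify_capabilities(documented: Mapping[str, str],
                          events: Iterable[Mapping[str, object]],
                          type_key: str = "type") -> dict[str, str]:
    """Label each documented capability as 'observed' or 'documented only'.

    `documented` maps a capability name to the event type assumed to evidence it.
    """
    seen = {event.get(type_key) for event in events}
    return {cap: "observed" if etype in seen else "documented only"
            for cap, etype in documented.items()}
```

A capability stays "documented only" until a retained artifact actually contains its evidence.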
Status:
- completed
Primary references:
SPIKE-04 Narrow Supervisory Comparison
Question:
Does a tightly bounded supervisor plus at most two subagent calls produce a meaningfully better result than a direct single-agent pass on the same task?
Focus:
- same task,
- same files,
- same time budget,
- same output scorecard.
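The fixed conditions can be enforced mechanically before any scores are compared. A minimal Python sketch, assuming a hypothetical four-criterion scorecard on a 0 to 2 scale (the criteria names are illustrative); the two-call subagent cap comes from the question above.

```python
from dataclasses import dataclass

# Hypothetical criteria; the real scorecard is chosen per task.
CRITERIA = ("correctness", "completeness", "validation", "clarity")


@dataclass
class ArmResult:
    arm: str
    scores: dict[str, int]  # criterion -> score on an assumed 0-2 scale
    elapsed_s: float
    subagent_calls: int = 0


def bound_violations(r: ArmResult, budget_s: float, max_subagent_calls: int = 2) -> list[str]:
    """List the ways an arm broke the shared comparison rules."""
    problems = []
    if r.elapsed_s > budget_s:
        problems.append(f"{r.arm}: over time budget")
    if r.subagent_calls > max_subagent_calls:
        problems.append(f"{r.arm}: too many subagent calls")
    missing = [c for c in CRITERIA if c not in r.scores]
    if missing:
        problems.append(f"{r.arm}: unscored {', '.join(missing)}")
    return problems


def score_delta(direct: ArmResult, supervised: ArmResult, budget_s: float) -> dict[str, int]:
    """Per-criterion supervised-minus-direct deltas; refuse if either arm is out of bounds."""
    problems = bound_violations(direct, budget_s) + bound_violations(supervised, budget_s)
    if problems:
        raise ValueError("; ".join(problems))
    return {c: supervised.scores[c] - direct.scores[c] for c in CRITERIA}
```

Refusing out-of-bounds arms, rather than scoring them anyway, keeps "meaningfully better" tied to equal conditions.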
Status:
- completed
Primary references:
SPIKE-05 Minimum Review Artifact Set
Question:
What is the smallest artifact bundle that still lets a human understand what happened and judge whether the run was useful?
Focus:
- final response,
- concise event evidence,
- prompt or task brief,
- stderr or failure evidence when relevant,
- one short operator summary.
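A completeness check for the bundle can be scripted once each item has a file name. The file names below are hypothetical, and the sketch assumes that "when relevant" means the run failed.

```python
from pathlib import Path

# Hypothetical file names for each item in the minimum bundle.
REQUIRED = {
    "final response": "final.md",
    "event evidence": "events.jsonl",
    "task brief": "brief.md",
    "operator summary": "summary.md",
}
CONDITIONAL = {"stderr/failure evidence": "stderr.log"}


def missing_artifacts(run_dir: Path, failed: bool) -> list[str]:
    """Return labels of bundle items that are absent or empty in run_dir."""
    wanted = dict(REQUIRED)
    if failed:
        wanted.update(CONDITIONAL)  # failure evidence is only required for failed runs
    missing = []
    for label, name in wanted.items():
        path = run_dir / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(label)
    return missing
```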
Status:
- completed
Primary references:
SPIKE-06 Native Codex CLI Baseline
Question:
Does the standalone native Codex CLI support the baseline headless and kernel-mediated flows cleanly enough to proceed to the larger prompt-dir comparison?
Focus:
- native CLI version and help surface
- direct codex exec JSONL smoke test
- maintained runner/conductor tests
- no-edit tnh-conductor ACK workflow
- managed worktree cleanliness
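A JSONL smoke test mostly needs to confirm that every line of the captured stream parses as a JSON object and to give a rough count of event kinds. The Python sketch below assumes each event carries a "type" field; how JSONL output is requested from codex exec depends on the installed version, so check codex exec --help rather than relying on this sketch.

```python
import json
from collections import Counter


def summarize_jsonl(stream: str, type_key: str = "type") -> Counter:
    """Validate a JSONL event stream and count events by their type field.

    Raises ValueError on the first line that is not a JSON object.
    The type key is an assumption about the event schema.
    """
    counts: Counter = Counter()
    for lineno, line in enumerate(stream.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            event = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: not valid JSON ({exc.msg})") from exc
        if not isinstance(event, dict):
            raise ValueError(f"line {lineno}: expected a JSON object")
        counts[event.get(type_key, "<untyped>")] += 1
    return counts
```

Feeding it the captured stdout of a direct run gives a pass/fail signal plus a compact event summary suitable for review evidence.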
Status:
- completed
Primary references:
SPIKE-07 Codex Home State Dependency
Question:
What HOME-scoped Codex state is required for a successful scripted headless invocation, and which state only affects startup noise?
Focus:
- minimum viable ~/.codex content
- config profile dependency
- auth dependency
- plugin cache dependency
- distinction between success-critical state and noise-inducing state
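One way to separate success-critical state from noise-inducing state is ablation: launch against a throwaway HOME whose .codex directory contains only selected entries, then add entries back one at a time. A minimal Python sketch; make_trial_home is a hypothetical helper, and the entry names used in practice should come from listing the real ~/.codex rather than from this example.

```python
import shutil
import tempfile
from pathlib import Path


def make_trial_home(source_codex_dir: Path, keep: list[str]) -> Path:
    """Create a throwaway HOME whose .codex holds only the named entries.

    Entries missing from the source are skipped, so an empty `keep`
    yields a bare .codex directory as the ablation floor.
    """
    home = Path(tempfile.mkdtemp(prefix="codex-home-"))
    target = home / ".codex"
    target.mkdir()
    for name in keep:
        src = source_codex_dir / name
        if src.is_dir():
            shutil.copytree(src, target / name)
        elif src.is_file():
            shutil.copy2(src, target / name)
    return home
```

Pass str(home) as HOME in the launched process environment, and remove the trial directory afterwards with shutil.rmtree, since it may hold copied credentials.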
Status:
- completed
Primary references:
SPIKE-08 Launch Context Environment Contamination
Question:
When Codex launches another Codex process, is noisy startup behavior caused mainly by PTY shape or by inherited execution environment contamination?
Focus:
- PTY versus non-PTY launch comparison
- inherited environment versus curated user-like environment
- real Terminal.app user-shell baseline
- practical clean-launch policy for Codex-on-Codex runs
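The two variables in this comparison, PTY shape and environment, can be varied independently. The Python sketch below is one assumed approach for POSIX systems: curated_env keeps a hypothetical allowlist of user-like variables and drops everything else the parent agent exported, and launch runs the same command on plain pipes or on a pseudo-terminal. The allowlist is illustrative, not a tested policy.

```python
import os
import pty
import select
import subprocess
import time
from typing import Mapping, Optional, Sequence

# Hypothetical allowlist for a "user-like" environment; adjust per machine.
USER_LIKE_KEYS = ("HOME", "USER", "LOGNAME", "SHELL", "PATH", "LANG", "TERM", "TMPDIR")


def curated_env(parent: Mapping[str, str],
                overrides: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Keep only allowlisted variables from the parent; everything else is dropped."""
    env = {k: parent[k] for k in USER_LIKE_KEYS if k in parent}
    env.update(overrides or {})
    return env


def launch(cmd: Sequence[str], env: Mapping[str, str], use_pty: bool,
           timeout: float = 120.0) -> tuple[int, str]:
    """Run cmd with stdio on pipes or on a pseudo-terminal; return (exit code, output).

    On the pipe path a timeout raises subprocess.TimeoutExpired; on the PTY
    path the child is killed at the deadline and partial output is returned.
    """
    if not use_pty:
        proc = subprocess.run(list(cmd), env=env, stdin=subprocess.DEVNULL,
                              stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                              text=True, timeout=timeout)
        return proc.returncode, proc.stdout

    # The child's stdio is a TTY (isatty() is true), but it is not made
    # the controlling terminal of a new session.
    master, slave = pty.openpty()
    proc = subprocess.Popen(list(cmd), env=env, stdin=slave, stdout=slave, stderr=slave)
    os.close(slave)  # the parent keeps only the master end
    chunks = []
    deadline = time.monotonic() + timeout
    try:
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                proc.kill()
                break
            ready, _, _ = select.select([master], [], [], remaining)
            if not ready:
                continue  # re-check the deadline
            try:
                data = os.read(master, 4096)
            except OSError:  # Linux reports EIO once the child side has closed
                break
            if not data:  # other platforms may report plain EOF
                break
            chunks.append(data)
    finally:
        os.close(master)
    return proc.wait(), b"".join(chunks).decode(errors="replace")
```

Running the four combinations (inherited or curated environment, PTY or pipes) with the same command isolates which factor produces the startup noise.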
Status:
- completed
Primary references:
SPIKE-09 Prompt Dir Three-Arm Comparison
Question:
On the same bounded implementation task, how do direct Codex, supervisory Codex, and kernel-mediated orchestration compare in practical usefulness and behavior?
Focus:
- same task brief across all three arms
- same sanitized launch surface for shell-based Codex runs
- direct versus supervisor-with-subagents versus conductor-managed execution
- overlap, uniqueness, validation quality, and coordination overhead
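Once each arm's findings are normalized to comparable labels (a manual step assumed here), overlap and uniqueness reduce to set operations. This Python sketch reports findings shared by every arm, findings unique to each arm, and pairwise Jaccard similarity; Jaccard is an illustrative choice, not a metric the spike prescribes.

```python
from itertools import combinations


def overlap_report(findings: dict[str, set[str]]) -> dict[str, object]:
    """Summarize shared and unique findings across comparison arms."""
    arms = list(findings)
    shared_by_all = set.intersection(*findings.values()) if findings else set()
    # A finding is unique to an arm if no other arm reported it.
    unique = {a: findings[a] - set().union(*(findings[b] for b in arms if b != a))
              for a in arms}
    jaccard = {}
    for a, b in combinations(arms, 2):
        union = findings[a] | findings[b]
        jaccard[(a, b)] = len(findings[a] & findings[b]) / len(union) if union else 1.0
    return {"shared_by_all": shared_by_all, "unique": unique, "jaccard": jaccard}
```

Validation quality and coordination overhead do not reduce to set arithmetic and still need reviewer judgment.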
Status:
- completed
Primary references:
Current Recommendation
Run these experiments sequentially and keep the documentation light.
The immediate goal is not to design a mature orchestration program. The immediate goal is to decide whether:
- headless agent communication is easy enough,
- native subagent use is viable enough,
- a simple supervisory pattern is actually better than working directly within existing agent bounds,
- the scripted Codex launch surface is understood well enough to avoid chasing false shell-state explanations,
- and launched Codex runs should use a curated user-like environment rather than inherited parent agent environment.