Run Supervisory Shell Trial¶

Operator note for the first shell-launched supervisory Codex experiment.

Files¶

Workflow contract: docs/architecture/agent-orchestration/supervisory-shell-trial/supervisory-team-workflow-contract.md
Current task brief: docs/architecture/agent-orchestration/supervisory-shell-trial/current-supervisory-task-brief.md

Prompt Shape¶

The supervisor prompt should stay simple:

read the workflow contract
read the current task brief
act as a supervisor using native subagents
do not do substantive task work directly
use no more than 5 subagent calls
stop when the brief is sufficiently addressed or clearly blocked

Suggested Command¶

Run from the repo root in your shell:

'/Users/phapman/.vscode/extensions/openai.chatgpt-26.406.31014-darwin-arm64/bin/macos-aarch64/codex' exec \
  --json \
  --ephemeral \
  -p collab \
  "Read docs/architecture/agent-orchestration/supervisory-shell-trial/supervisory-team-workflow-contract.md and docs/architecture/agent-orchestration/supervisory-shell-trial/current-supervisory-task-brief.md. Follow them precisely. Use native subagents for the substantive work. Do not do the substantive evaluation directly yourself. Use no more than 10 subagent calls." \
  2> >(tee tmp/supervisory-shell-trial-stderr.log >&2) \
  | tee tmp/supervisory-shell-trial-stdout.jsonl

This form preserves real-time terminal output while still capturing both streams to files.

What To Inspect¶

whether native subagents were actually used
whether the supervisor kept to the supervisory role
the quality of the final synthesis
whether the result appears stronger than a plausible direct single-agent pass
the number and shape of collab_tool_call events in stdout
whether stderr noise materially interfered with understanding the run

Success Criteria¶

The trial counts as useful if:

the supervisor clearly delegates substantive work,
at least one native subagent workstream completes,
the final synthesis is more useful than a single direct critique pass would likely have been,
and the run leaves enough observable evidence to guide the next experiment.