Run Supervisory Shell Trial
Operator note for the first shell-launched supervisory Codex experiment.
Files
- Workflow contract: `docs/architecture/agent-orchestration/supervisory-shell-trial/supervisory-team-workflow-contract.md`
- Current task brief: `docs/architecture/agent-orchestration/supervisory-shell-trial/current-supervisory-task-brief.md`
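Both inputs reach the supervisor as relative paths, so a quick preflight from the repo root can confirm they resolve before a run is spent on a missing file. This is a minimal sketch; `check_trial_inputs` is a hypothetical helper, not existing tooling.

```shell
#!/usr/bin/env bash
# Preflight sketch: report whether both trial inputs exist.
# Assumes the working directory is the repo root, as for the suggested command.
# check_trial_inputs is a hypothetical helper, not part of any existing tooling.

check_trial_inputs() {
  local trial_dir="docs/architecture/agent-orchestration/supervisory-shell-trial"
  local missing=0
  local f
  for f in \
    "$trial_dir/supervisory-team-workflow-contract.md" \
    "$trial_dir/current-supervisory-task-brief.md"; do
    if [[ -f "$f" ]]; then
      echo "ok:      $f"
    else
      echo "missing: $f" >&2
      missing=1
    fi
  done
  # Non-zero return means at least one input is absent.
  return "$missing"
}

# Usage, from the repo root:
#   check_trial_inputs && echo "inputs present"
```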
Prompt Shape
The supervisor prompt should stay simple:
- read the workflow contract
- read the current task brief
- act as a supervisor using native subagents
- do not do substantive task work directly
- cap the number of subagent calls (the suggested command sets the cap at 10)
- stop when the brief is sufficiently addressed or clearly blocked
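The points above can be assembled into a single prompt string in the shell. This is a sketch only: `build_supervisor_prompt` is a hypothetical helper, the sentence wording is illustrative rather than the canonical prompt, and the call cap is a parameter so it can be set to whatever the trial intends.

```shell
#!/usr/bin/env bash
# Sketch: assemble a supervisor prompt that covers each Prompt Shape point.
# build_supervisor_prompt is hypothetical; the wording is illustrative.

build_supervisor_prompt() {
  local contract="$1" brief="$2" max_calls="$3"
  local parts=(
    "Read $contract and $brief."
    "Act as a supervisor using native subagents."
    "Do not do substantive task work directly."
    "Use no more than $max_calls subagent calls."
    "Stop when the brief is sufficiently addressed or clearly blocked."
  )
  # Join the sentences with single spaces.
  local IFS=' '
  echo "${parts[*]}"
}

# Usage:
#   prompt="$(build_supervisor_prompt path/to/contract.md path/to/brief.md 10)"
```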
Suggested Command
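Run from the repo root in bash or zsh; the `2> >(...)` process substitution is not available in POSIX `sh`. Create `tmp/` first if it does not exist, since `tee` will not create it. The binary path below comes from one local VS Code extension install, so substitute the path to your own `codex` binary: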
```shell
'/Users/phapman/.vscode/extensions/openai.chatgpt-26.406.31014-darwin-arm64/bin/macos-aarch64/codex' exec \
  --json \
  --ephemeral \
  -p collab \
  "Read docs/architecture/agent-orchestration/supervisory-shell-trial/supervisory-team-workflow-contract.md and docs/architecture/agent-orchestration/supervisory-shell-trial/current-supervisory-task-brief.md. Follow them precisely. Use native subagents for the substantive work. Do not do the substantive evaluation directly yourself. Use no more than 10 subagent calls." \
  2> >(tee tmp/supervisory-shell-trial-stderr.log >&2) \
  | tee tmp/supervisory-shell-trial-stdout.jsonl
```
This form keeps real-time output in the terminal while `tee` copies stdout to `tmp/supervisory-shell-trial-stdout.jsonl` and stderr to `tmp/supervisory-shell-trial-stderr.log`.
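By default a pipeline's exit status is that of its last command, here `tee`, so a failing `codex` run still looks successful to the shell. A minimal sketch that wraps the same command in a function, creates `tmp/` first, and enables `pipefail` inside a subshell so the returned status reflects `codex`. `run_supervisory_trial` and the `CODEX_BIN` variable are hypothetical names; the `codex` default assumes the binary is on your `PATH`, and the flags are copied from the command above rather than independently verified.

```shell
#!/usr/bin/env bash
# Sketch: the suggested command wrapped so that tmp/ exists first and the
# returned status reflects codex rather than the final tee.
# run_supervisory_trial and CODEX_BIN are hypothetical names.

run_supervisory_trial() {
  # Assumption: codex is on PATH unless CODEX_BIN points at another binary.
  local codex_bin="${CODEX_BIN:-codex}"
  local prompt="$1"

  # tee does not create missing parent directories.
  mkdir -p tmp || return

  # Subshell keeps pipefail from leaking into the caller's shell.
  (
    set -o pipefail
    "$codex_bin" exec \
      --json \
      --ephemeral \
      -p collab \
      "$prompt" \
      2> >(tee tmp/supervisory-shell-trial-stderr.log >&2) \
      | tee tmp/supervisory-shell-trial-stdout.jsonl
  )
}

# Usage:
#   CODEX_BIN=/path/to/codex run_supervisory_trial "<prompt>"; echo "exit: $?"
```

Running the pipeline in a subshell is a design choice: it scopes `pipefail` to this one call instead of changing the interactive shell's behavior.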
What To Inspect
- whether native subagents were actually used
- whether the supervisor kept to the supervisory role
- the quality of the final synthesis
- whether the result appears stronger than a plausible direct single-agent pass
- the number and shape of `collab_tool_call` events in `stdout`
- whether `stderr` noise materially interfered with understanding the run
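The counts behind the last two checks can be pulled from the captured files with a rough shell sketch. It assumes one event per line of the stdout JSONL file and treats any line containing the string `collab_tool_call` as a match; the actual event schema is not described in this note, so read the matching lines to judge their shape. `summarize_trial_logs` is a hypothetical helper whose default paths are the files written by the suggested command.

```shell
#!/usr/bin/env bash
# Sketch: rough counts for the What To Inspect list.
# Assumption: one JSON event per stdout line; a substring match on
# "collab_tool_call" is only a proxy, since the event schema is not specified.
# summarize_trial_logs is a hypothetical helper.

summarize_trial_logs() {
  local stdout_log="${1:-tmp/supervisory-shell-trial-stdout.jsonl}"
  local stderr_log="${2:-tmp/supervisory-shell-trial-stderr.log}"
  local events collab noise

  events=$(wc -l < "$stdout_log")
  # grep -c prints 0 but exits 1 when nothing matches; keep the 0.
  collab=$(grep -c 'collab_tool_call' "$stdout_log" || true)
  noise=$(wc -l < "$stderr_log")

  # $(( )) also strips the leading spaces that BSD wc adds on macOS.
  echo "stdout lines:           $((events))"
  echo "collab_tool_call lines: $((collab))"
  echo "stderr lines:           $((noise))"
}

# Usage, after a run:
#   summarize_trial_logs
```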
Success Criteria
The trial counts as useful if:
- the supervisor clearly delegates substantive work,
- at least one native subagent workstream completes,
- the final synthesis is more useful than a single direct critique pass would likely have been,
- and the run leaves enough observable evidence to guide the next experiment.