ADR-OA04.1: MVP Runtime Build-Out Sequence¶
Implementation-guide addendum for OA04, defining expected MVP runtime flow, build order, and the handoff from execution contracts to OA07 bootstrap workspace work.
- Status: WIP
- Type: Implementation Guide
- Date: 2026-02-10
- Owner: Aaron Solomon
- Author: Aaron Solomon, GPT 5.3, GPT-5 Codex, Claude Code
- Parent ADR: ADR-OA04
ADR Editing Policy¶
IMPORTANT: How you edit this ADR depends on its status.
- proposed status: ADR is in the design loop. We may rewrite or edit the document as needed to refine the design.
- accepted, wip, implemented status: Implementation has begun or completed. NEVER edit the original Context/Decision/Consequences sections. Only append addendums.
Context¶
ADR-OA04 defines workflow schema and opcode semantics. Implementation still needs a practical build order for rapid prototyping so the team can ship a walking skeleton quickly, without overcommitting to late-stage capabilities.
This ADR captures execution-focused notes for:
- Crisp role boundaries between tnh-gen, kernel, runners, and validators.
- MVP operational loops that can run safely inside a worktree.
- Ordered build-out milestones that keep risk bounded and maintain forward progress.
Decision¶
1. System Role Split¶
- tnh-gen: control plane (workflow selection, context packaging, prompt routing, planner/evaluator calls via GenAIService, provenance/artifact collation, policy enforcement entrypoint).
- Conductor kernel: deterministic OA04 step executor (opcodes only; no embedded domain judgment).
- CLI runners (claude, codex): heavy edit performers invoked by RUN_AGENT.
- Validators/harnesses: deterministic execution surfaces invoked by RUN_VALIDATION, returning structured reports + artifacts.
2. MVP Operational Loops¶
Builder loop:
- RUN_AGENT (plan/design)
- EVALUATE (plan quality/risk)
- RUN_AGENT (implement)
- RUN_VALIDATION
- EVALUATE (results)
- RUN_AGENT (fix), repeated as needed
- GATE (final review)
- STOP
Evaluator loop (generative):
- RUN_AGENT (synthesize harness)
- RUN_VALIDATION (run harness)
- EVALUATE (harness report)
- Route to fix loop or gate
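The builder loop listed above can be sketched as a small driver. This is only an illustration under stated assumptions: `run_builder_loop`, its callable parameters, and `max_fix_rounds` are hypothetical names, and the `partial` status value is borrowed from the diagrams later in this document rather than from a frozen contract.

```python
from typing import Callable, Dict, List


def run_builder_loop(
    run_agent: Callable[[str], None],
    run_validation: Callable[[], Dict],
    evaluate: Callable[[str, Dict], str],
    gate: Callable[[str], bool],
    max_fix_rounds: int = 3,  # hypothetical bound; the ADR only says "repeated as needed"
) -> List[str]:
    """Drive one builder run and return the ordered opcode trace."""
    trace: List[str] = []

    run_agent("plan/design")
    trace.append("RUN_AGENT(plan/design)")
    evaluate("plan quality/risk", {})  # the plan verdict is not branched on in this sketch
    trace.append("EVALUATE(plan quality/risk)")

    run_agent("implement")
    trace.append("RUN_AGENT(implement)")
    report = run_validation()
    trace.append("RUN_VALIDATION")
    status = evaluate("results", report)
    trace.append("EVALUATE(results)")

    # Fix/validate/evaluate repeats while the planner reports partial progress.
    rounds = 0
    while status == "partial" and rounds < max_fix_rounds:
        run_agent("fix")
        trace.append("RUN_AGENT(fix)")
        report = run_validation()
        trace.append("RUN_VALIDATION")
        status = evaluate("results", report)
        trace.append("EVALUATE(results)")
        rounds += 1

    gate("final review")
    trace.append("GATE(final review)")
    trace.append("STOP")
    return trace
```

In a real kernel the step order would come from the workflow definition rather than from hard-coded calls; the sketch only shows the control flow the list describes.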
3. Prompt Routing Contract¶
- Heavy work prompts (design, code, refactor, fix) execute through CLI runners.
- Planner/evaluator/summarizer prompts execute through tnh-gen + GenAIService under EVALUATE.
- Harness execution remains deterministic and local under RUN_VALIDATION; planner consumes structured reports, not raw log streams.
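The routing contract above amounts to a lookup from prompt kind to opcode and execution surface. A minimal sketch, assuming hypothetical names (`route_prompt`, the `cli_runner` and `tnh-gen+GenAIService` labels) that are not taken from the codebase:

```python
from typing import Tuple

# Prompt kinds as listed in the routing contract; the set names are illustrative.
HEAVY_WORK_KINDS = {"design", "code", "refactor", "fix"}
DECISION_KINDS = {"planner", "evaluator", "summarizer"}


def route_prompt(kind: str) -> Tuple[str, str]:
    """Return (opcode, execution surface) for a prompt kind."""
    if kind in HEAVY_WORK_KINDS:
        return ("RUN_AGENT", "cli_runner")
    if kind in DECISION_KINDS:
        return ("EVALUATE", "tnh-gen+GenAIService")
    # Unknown kinds are rejected rather than guessed at (an assumption of this sketch).
    raise ValueError(f"no route for prompt kind: {kind!r}")
```

Harness execution is deliberately absent from the table: it runs under RUN_VALIDATION and is not a prompt.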
4. MVP Scope (Walking Skeleton)¶
Implement first:
- OA04 engine + minimal opcodes: RUN_AGENT, RUN_VALIDATION, EVALUATE, GATE, STOP.
- ROLLBACK(pre_run) is REQUIRED: kernel MUST implement this in MVP. Workflows may omit rollback usage, but the capability must exist for unattended/overnight runs.
- Script validator support in RUN_VALIDATION for generated harnesses.
- Worktree sandbox + path/policy guardrails (no push/merge by default).
4a. Checkpoint Implementation (MVP)¶
| Target | MVP Support | Description |
|---|---|---|
| pre_run | Implicit, always available | Snapshot at workflow start |
| pre_step | Optional, implicit | Snapshot before each step (if enabled) |
| checkpoint:<id> | Reserved | No explicit checkpoint authoring in MVP |
Rationale: Minimizes surface area while preserving safety escape hatch.
4b. Harness Execution Sandboxing (MVP)¶
Script validators run with the following constraints:
| Constraint | MVP Implementation |
|---|---|
| Worktree isolation | Harness executes in conductor worktree, not main repo |
| Sanitized environment | No secrets in env vars; minimal PATH |
| No network by default | Harness cannot make external requests (configurable override) |
| Entrypoint allowlist | Scripts must be under .tnh/run/<run_id>/... |
Deferred: Container/VM isolation. Worktree + constraints are adequate for Phase 1.
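Two of the constraints in the table above, the entrypoint allowlist and the sanitized environment, can be sketched in a few lines. The function names, the passthrough variable list, and the default minimal PATH are assumptions made for illustration; network blocking is not shown.

```python
from pathlib import Path
from typing import Dict, Mapping


def is_allowed_entrypoint(script: Path, worktree: Path, run_id: str) -> bool:
    """True only if the script resolves to a location under .tnh/run/<run_id>/."""
    allowed_root = (worktree / ".tnh" / "run" / run_id).resolve()
    resolved = (worktree / script).resolve()
    # Resolving first collapses ".." segments and symlinks before the containment check.
    return allowed_root in resolved.parents


def sanitized_env(base: Mapping[str, str], minimal_path: str = "/usr/bin:/bin") -> Dict[str, str]:
    """Build the harness environment from an explicit passthrough list."""
    passthrough = ("LANG", "LC_ALL", "TZ")  # hypothetical allowlist, not from the ADR
    env = {key: base[key] for key in passthrough if key in base}
    env["PATH"] = minimal_path  # the caller's PATH is never inherited
    return env
```

Building the environment from an allowlist, instead of deleting known secret names, means a newly introduced secret variable is excluded by default.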
4c. Artifact Retention Policy¶
- Decision: Retention is system-level, not workflow-authored.
- Defaults: Configured in tnh-gen settings (e.g., last N runs or last N days).
- Workflow role: Workflows MAY tag artifacts as important (advisory only).
- Rationale: Avoids inconsistent cleanup semantics across workflows.
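A system-level retention pass of the kind described above might look like the following sketch. `RunRecord`, `select_runs_to_prune`, and the `keep_last_n` / `keep_days` parameters are hypothetical; the actual tnh-gen settings keys are not specified here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class RunRecord:
    run_id: str
    finished_at: datetime
    important: bool = False  # advisory tag from the workflow; it does not override policy here


def select_runs_to_prune(
    runs: Sequence[RunRecord],
    now: datetime,
    keep_last_n: Optional[int] = None,
    keep_days: Optional[int] = None,
) -> List[str]:
    """Return run ids eligible for cleanup, newest first."""
    if keep_last_n is None and keep_days is None:
        return []  # no retention policy configured: prune nothing
    ordered = sorted(runs, key=lambda r: r.finished_at, reverse=True)
    keep = set()
    if keep_last_n is not None:
        keep.update(r.run_id for r in ordered[:keep_last_n])
    if keep_days is not None:
        cutoff = now - timedelta(days=keep_days)
        keep.update(r.run_id for r in ordered if r.finished_at >= cutoff)
    return [r.run_id for r in ordered if r.run_id not in keep]
```

How the advisory important tag should influence cleanup, if at all, is left open; this sketch carries the tag but ignores it.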
5. MVP Generative Evaluation Deliverables¶
- Standard run artifact directory: .tnh/run/<run_id>/....
- Minimal harness runner contract:
  - accepts suite spec path
  - executes declared cases (CLI/web)
  - emits harness_report.json plus referenced artifacts (stdout/stderr, screenshots)
- Planner output contract (MVP minimum for routing correctness): status, next_step, optional fix_instructions, blockers, risk_flags.
5a. Schema Freeze (MVP)¶
Freeze now (cross-boundary contracts):
- harness_report.json minimal schema (OA04 Section 9)
- Planner output object for EVALUATE (MVP minimum): {status, next_step, fix_instructions?, blockers?, risk_flags?}
Defer (internal evolution):
- UX flag taxonomy and severity levels
- Rich observation typing
- Visual diff conventions
- Accessibility report schema
Rationale: Stabilize only the contracts that cross component boundaries; internal schemas can evolve.
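The frozen planner output object can be illustrated with a small parser. The field names come from the list above; the field types, the `PlannerOutput` class, and `parse_planner_output` are assumptions of this sketch. Extra keys are ignored, on the assumption that a planner following the fuller OA01.1 contract may emit more than the MVP minimum.

```python
from dataclasses import dataclass, field
from typing import Any, List, Mapping, Optional


@dataclass(frozen=True)
class PlannerOutput:
    status: str
    next_step: str
    fix_instructions: Optional[str] = None  # type is an assumption
    blockers: List[str] = field(default_factory=list)
    risk_flags: List[str] = field(default_factory=list)


def parse_planner_output(payload: Mapping[str, Any]) -> PlannerOutput:
    """Validate the two required fields and pick up the optional ones."""
    missing = [key for key in ("status", "next_step") if key not in payload]
    if missing:
        raise ValueError(f"planner output missing required fields: {missing}")
    return PlannerOutput(
        status=str(payload["status"]),
        next_step=str(payload["next_step"]),
        fix_instructions=payload.get("fix_instructions"),
        blockers=list(payload.get("blockers", [])),
        risk_flags=list(payload.get("risk_flags", [])),
    )
```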
5c. Planner Contract Layering¶
- OA01.1 defines the full planner output contract (conceptual and future-facing).
- OA04 defines the subset the kernel requires for deterministic routing.
- OA04.1 defines the minimum planner fields required for MVP correctness.
5b. Component Hints (Operational Guidance)¶
Workflow defaults may include optional hints:
- component_kind: docs, cli, web, vscode_ui, library
- eval_profile: smoke, overnight, release_candidate
Usage: Guide harness synthesis prompts only. No kernel semantics — kernel does not interpret these values.
Rationale: Improves agent reliability without adding kernel complexity.
6. Incremental Build-Out Sequence¶
- Kernel + runners wired, simple workflow execution, provenance logging, worktree isolation.
- Validator baseline (type, lint, unit) via RUN_VALIDATION.
- Planner EVALUATE with bounded routing through allowed_next_steps.
- Script validators with artifact capture.
- Generative harness synthesis into .tnh/run/<run_id>/....
- CLI and web harness backends (including Playwright headless screenshot capture).
- Golden workflow with snapshot diffs + GATE approval before accepting updates.
- Advanced later: VS Code UI automation backend, richer capture, flake detection, perf budgets, named checkpoints.
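The bounded-routing milestone in the sequence above means the kernel only follows a planner-proposed step if the workflow declared it as allowed. A minimal sketch; `resolve_next_step` is a hypothetical name, and raising on a disallowed step is an assumption, since the failure behavior is defined by OA04 rather than here.

```python
from typing import Sequence


def resolve_next_step(proposed: str, allowed_next_steps: Sequence[str]) -> str:
    """Accept the planner's proposed step only if the workflow allows it."""
    if proposed not in allowed_next_steps:
        raise ValueError(
            f"planner proposed {proposed!r}, allowed: {list(allowed_next_steps)}"
        )
    return proposed
```

Keeping this check in the kernel means a planner response can select among declared transitions but cannot invent new ones.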
7. Guardrail Constraints¶
- Kernel does not embed domain-specific evaluation logic.
- All evaluation must emit structured reports; planner reads report payloads and artifact references.
- Golden updates are always gated; never auto-accepted.
- Fix/adjust loop is primary recovery; rollback is emergency brake and determinism aid.
Consequences¶
Positive¶
- Provides a clear implementation order for rapid prototype execution.
- Preserves OA01.1 separation of concerns between kernel mechanics and planner intelligence.
- Enables immediate value with a small opcode/runtime surface.
Negative¶
- Defers richer capabilities (UI automation, flake handling, named checkpoints).
- Requires discipline to keep prompt routing and runner responsibilities clean during fast iteration.
Open Questions¶
All original open questions have been resolved.
| Original Question | Resolution | Location |
|---|---|---|
| ROLLBACK(pre_run) required? | Yes, REQUIRED in MVP | Section 4: MVP Scope |
| Split harness synth/execution? | No; keep in one workflow for MVP | Section 5 (unchanged) |
| Which schemas to freeze? | harness_report.json + planner output now; defer internals | Section 5a: Schema Freeze |
Mermaid Artifacts¶
1) Role + Control-Plane Architecture¶
```mermaid
flowchart TB
    U[Human Operator] -->|invokes| T[tnh-gen<br/>Control Plane]
    T -->|loads workflow + context| K[Conductor Kernel<br/>OA04 opcode runtime]
    T -->|planner/evaluator calls| M[GenAIService<br/>GPT/Claude/Gemini]
    K -->|RUN_AGENT| RC[Claude CLI Runner]
    K -->|RUN_AGENT| RX[Codex CLI Runner]
    K -->|RUN_VALIDATION| V[Validators / Harnesses<br/>pytest/mypy/playwright/scripts]
    K -->|artifacts + events| L[Ledger / Provenance<br/>run dir + structured events]
    K -->|workspace ops| G[Git Worktree Sandbox]
```
2) Overnight Build Sequence¶
```mermaid
sequenceDiagram
    autonumber
    participant H as Human (Lead)
    participant T as tnh-gen (control plane)
    participant K as Conductor Kernel
    participant C as Claude CLI (docs/design)
    participant X as Codex CLI (code)
    participant V as Validators
    participant M as GenAIService (planner/evaluator)
    participant G as Git Worktree
    participant L as Ledger/Artifacts
    H->>T: tnh-gen run workflow overnight_build
    T->>G: create worktree + branch
    T->>K: start(run_id, workflow, context)
    K->>C: RUN_AGENT(plan/design prompt)
    C-->>K: edits + transcript
    K->>L: record(step artifacts)
    K->>M: EVALUATE(plan quality/risk)
    M-->>K: {status,next_step,risk_flags}
    K->>L: record(evaluate_result)
    alt needs_human
        K->>H: GATE(plan review)
        H-->>K: approve|reject
    end
    K->>X: RUN_AGENT(implement prompt)
    X-->>K: edits + transcript
    K->>L: record(step artifacts)
    K->>V: RUN_VALIDATION(type/lint/unit)
    V-->>K: logs + report
    K->>L: record(validation artifacts)
    K->>M: EVALUATE(results)
    M-->>K: {status,next_step,fix_instructions}
    K->>L: record(evaluate_result)
    alt partial
        loop fix/validate
            K->>X: RUN_AGENT(fix-from-logs)
            X-->>K: edits
            K->>V: RUN_VALIDATION
            V-->>K: logs
            K->>M: EVALUATE
            M-->>K: status
        end
    else unsafe
        K->>K: ROLLBACK(pre_run or pre_step)
        K->>H: GATE(unsafe review)
    end
    K->>H: GATE(final review)
    H-->>K: approve|reject
    K->>L: record(run_completed)
    K->>K: STOP
```
3) Generative Evaluation Loop¶
```mermaid
flowchart LR
    A["RUN_AGENT<br/>Synthesize harness + suite<br/>(Codex/Claude)"] --> B["RUN_VALIDATION<br/>Execute harness backend<br/>(script/playwright/cli)"]
    B --> C["EVALUATE<br/>Judge harness_report.json<br/>(GenAIService planner)"]
    C -->|success| D["GATE final / STOP"]
    C -->|partial| E["RUN_AGENT<br/>Fix from report"]
    E --> B
    C -->|unsafe| R["ROLLBACK or STOP"] --> D
```
4) Prompt Routing Map¶
```mermaid
flowchart TB
    subgraph "CLI Agents (heavy edits)"
        P1[PROMPT.DESIGN.*] -->|RUN_AGENT| CLAUDE[Claude CLI]
        P2[PROMPT.CODE.*] -->|RUN_AGENT| CODEX[Codex CLI]
    end
    subgraph "tnh-gen via GenAIService (structured decisions)"
        P3[PROMPT.PLANNER.EVAL_*] -->|EVALUATE| GENAI[GenAIService<br/>GPT/Claude/Gemini]
        P4[PROMPT.SUMMARY.*] -->|EVALUATE| GENAI
    end
    subgraph "Deterministic tools"
        V[RUN_VALIDATION<br/>pytest/mypy/playwright/harness] --> RPT[harness_report.json + artifacts]
    end
```
Related ADRs¶
- ADR-OA01.1: Conductor Strategy v2
- ADR-OA04: Workflow Execution Contracts
- ADR-OA05: Prompt Library Specification
- ADR-OA06: Planner Evaluator Contract
As-Built Notes & Addendums¶
Addendum 2026-03-27: OA04 Contract Family — Scaffolding Alignment Notes¶
Context: OA04.2–OA04.5 were accepted 2026-03-27 as the contract layer between OA07 runtime foundations and forward-path implementation. A code review against the existing OA07 scaffolding at acceptance time revealed the following alignment gaps. These are expected and pre-planned — the ADRs were written before the scaffolding was filled in. Recording them here for implementer context.
Gap 1 — RunEventRecord is too thin (OA04.3 §6)
run_artifacts/models.py RunEventRecord currently has only step_id and next_step_id. OA04.3 requires timestamp, run_id, and event_type as required fields. Addressed in PR-2.
Gap 2 — RunMetadata is too thin (OA04.3 §3)
run_artifacts/models.py RunMetadata currently has only run_id, workflow_id, started_at. OA04.3 requires workflow_version, artifacts_root, entry_step; recommended optionals include ended_at, last_step_id, termination, schema_versions. Addressed in PR-2.
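For orientation, the two widened models from Gaps 1 and 2 might take roughly the following shape. The field names are the ones listed above; the field types, the use of plain dataclasses, and the choice to keep `step_id` / `next_step_id` optional are assumptions of this sketch, not the as-built models in run_artifacts/models.py.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional


@dataclass(frozen=True)
class RunEventRecord:
    # Required by OA04.3 per Gap 1
    timestamp: datetime
    run_id: str
    event_type: str
    # Fields the current record already has; optionality is assumed
    step_id: Optional[str] = None
    next_step_id: Optional[str] = None


@dataclass(frozen=True)
class RunMetadata:
    # Fields the current model already has
    run_id: str
    workflow_id: str
    started_at: datetime
    # Required by OA04.3 per Gap 2
    workflow_version: str
    artifacts_root: str
    entry_step: str
    # Recommended optionals
    ended_at: Optional[datetime] = None
    last_step_id: Optional[str] = None
    termination: Optional[str] = None
    schema_versions: Optional[Dict[str, str]] = None
```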
Gap 3 — No StepManifest model exists (OA04.3 §4–5)
run_artifacts/ has no StepManifest, ArtifactRole, or StepArtifactEntry types. FilesystemRunArtifactStore creates the run directory and event log correctly, but has no manifest-writing capability. Addressed in PR-2.
Gap 4 — PromptInteractionPolicy stub conflicts with OA04.4 ExecutionPolicy (OA04.4 §1)
runners/models.py has a minimal PromptInteractionPolicy(auto_approve: bool) stub on RunnerTaskRequest. OA04.4 defines a richer five-dimension ExecutionPolicy model. These must not share a name or conflate roles. The stub should be retired and RunnerTaskRequest.prompt_interaction_policy re-pointed to the proper ExecutionPolicy reference. Addressed in PR-3.
Gap 5 — validation/ package is empty (OA04.5)
agent_orchestration/validation/__init__.py exists but contains no models, protocols, or backends. The harness backend contract (backend family enum, request/result boundary, script backend) is entirely greenfield. Addressed in PR-5.
Gap 6 — runners/ has no adapters subdirectory (OA04.2 §6)
runners/ has models.py and protocols.py but no adapters/ subpackage. Claude CLI and Codex CLI adapter implementations are fully absent. Addressed in PR-4.
Note on final_state_path naming: RunArtifactPaths.final_state_path maps to final_state.txt in the current store, but OA04.3 §2 uses final-state.txt (hyphen). Minor — align during PR-2 if it does not break existing tests; otherwise note as a follow-up.
Addendum 2026-04-05: Bootstrap-First Runtime Priority¶
Context: The near-term product goal is operational bootstrap, not contract completeness in isolation. The maintained OA04.x runtime foundation is real and tested, but the fastest path to value is one end-to-end mutable workflow that can safely create a worktree, run an agent, validate, commit, push, and open or update a PR.
Decision: Bootstrap implementation priority is clarified as follows:
- Land a real maintained workspace service that creates a dedicated git worktree and records base_ref/base_sha.
- Separate mutable worktree execution from the canonical run directory.
- Use that worktree boundary to implement ROLLBACK(pre_run) for unattended runs.
- Land one headless bootstrap entry point that can drive a single workflow end to end.
- Allow bootstrap agent authority to extend through commit, push, PR creation, PR update, and review-follow-up on the managed work branch, while keeping protected-branch merge human-only.
Bootstrap completion criteria:
- a managed worktree is created from a committed base ref,
- RUN_AGENT and RUN_VALIDATION execute against the worktree root,
- canonical run artifacts and provenance are written under the run directory,
- the run can commit and push its work branch,
- the run can open or update a PR,
- ROLLBACK(pre_run) restores the managed branch/worktree to the recorded base state.
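The first completion criterion, a managed worktree created from a committed base ref with base_ref/base_sha recorded, maps onto two standard git invocations. The sketch below only builds the argument lists so it can be inspected without touching a repository; `worktree_create_commands` and its parameter names are hypothetical, and the maintained workspace service may be structured differently.

```python
from typing import List


def worktree_create_commands(
    repo_root: str, worktree_path: str, branch: str, base_ref: str
) -> List[List[str]]:
    """Return git argv lists: resolve base_sha first, then create the worktree."""
    return [
        # Prints the commit sha for base_ref; fails if the ref is not a commit.
        ["git", "-C", repo_root, "rev-parse", "--verify", f"{base_ref}^{{commit}}"],
        # Creates a new branch at base_ref and checks it out in a dedicated worktree.
        ["git", "-C", repo_root, "worktree", "add", "-b", branch, worktree_path, base_ref],
    ]
```

A caller would run each list with `subprocess.run(argv, check=True, capture_output=True, text=True)` and store the first command's output as base_sha alongside base_ref in the run metadata.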
Deferrals to keep bootstrap fast:
- strict OA05 workflow-to-prompt compile validation as a bootstrap blocker,
- full OA06 fixture/vector coverage beyond the bootstrap decision path,
- non-script harness backends,
- stacked PR orchestration,
- multi-agent mutable collaboration inside one worktree,
- pre_step rollback and named checkpoints.
Rationale: The maintained runtime should become operational before the broader prompt-program surface is fully frozen. Once the system can safely create review-ready PRs inside isolated worktrees, OA05 and OA06 integration work can proceed against a live bootstrap loop rather than against abstract scaffolding.
Implementation Changes: Documentation only. Added ADR-OA07 and ADR-OA07.1 to freeze the missing workspace-safety contract.
References:
- ADR-OA04.2: Runner Contract
- ADR-OA04.3: Provenance and Run-Artifact Contract
- ADR-OA04.4: Policy Enforcement Contract
- ADR-OA07: Diff-Policy + Safety Rails
- ADR-OA07.1: Worktree Lifecycle and Rollback
Addendum 2026-04-06: OA04.1 to OA07.1 Handoff Clarification¶
Context: The remaining bootstrap blocker is no longer in the OA04.x contract layer itself. The maintained runtime now has substantial execution contracts, but still lacks the mutable workspace boundary needed for operational use.
Decision: OA04.1 is clarified as the MVP runtime build-order guide, not the sole home for every remaining bootstrap implementation detail.
The current handoff is:
- OA04.x freezes the runtime execution contracts,
- OA04.1 records build order and reprioritization,
- OA07/OA07.1 freeze the worktree, rollback, and bootstrap safety model required to make the runtime operational.
Rationale: This makes developer navigation simpler. Contributors should not look for worktree semantics in the OA04 decimal family when those decisions now live in OA07.x.
Implementation Changes: Documentation only. Clarified the family handoff and bootstrap dependency.
Addendum 2026-04-06: Runtime Bootstrap MVP Clarification¶
Context: The bootstrap-first addendum above described the end goal as safely creating review-ready PRs inside isolated worktrees. That remains the broader OA07 authority envelope, but it is too aggressive as the minimum bar for the first operational runtime milestone.
Decision: The first runtime bootstrap milestone is clarified as:
- real managed worktree creation from committed base_ref/base_sha,
- mutable-step execution against the worktree root,
- canonical artifacts and provenance written to the run directory,
- ROLLBACK(pre_run) by restoring the managed worktree to recorded base state,
- one maintained local/headless entry point.
Commit/push/PR automation is treated as the immediate OA07 follow-on after that runtime loop exists, not as a blocker for the first operational bootstrap slice.
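One plausible reading of "restoring the managed worktree to recorded base state" is a hard reset to base_sha followed by removal of untracked files. The sketch below builds those git argument lists under that assumption; the authoritative rollback semantics live in OA07.1, and `rollback_pre_run_commands` is a hypothetical name.

```python
import re
from typing import List


def rollback_pre_run_commands(worktree_path: str, base_sha: str) -> List[List[str]]:
    """Return git argv lists that reset a managed worktree to its recorded base commit."""
    # Accept only a hex object id so an arbitrary string is never passed to git as a ref.
    if not re.fullmatch(r"[0-9a-f]{7,64}", base_sha):
        raise ValueError(f"base_sha does not look like a commit id: {base_sha!r}")
    return [
        # Move the branch and working tree back to the recorded commit.
        ["git", "-C", worktree_path, "reset", "--hard", base_sha],
        # Remove untracked files and directories left behind by the run.
        ["git", "-C", worktree_path, "clean", "-fd"],
    ]
```

Because canonical artifacts are written to the run directory rather than the worktree, a rollback of this kind would leave provenance intact.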
Rationale: This preserves the bootstrap-fast path without weakening the structural target. The system becomes operational as soon as it can run a real isolated mutable workflow safely; review automation can then build directly on the same worktree and provenance boundary.
Implementation Changes: Documentation only. Clarified the minimum bootstrap milestone and its relationship to later review automation.