Skip to content

ADR-OA04.1: MVP Runtime Build-Out Sequence

Implementation-guide addendum for OA04, defining expected MVP runtime flow, build order, and the handoff from execution contracts to OA07 bootstrap workspace work.

  • Status: WIP
  • Type: Implementation Guide
  • Date: 2026-02-10
  • Owner: Aaron Solomon
  • Author: Aaron Solomon, GPT 5.3, GPT-5 Codex, Claude Code
  • Parent ADR: ADR-OA04

ADR Editing Policy

IMPORTANT: How you edit this ADR depends on its status.

  • proposed status: ADR is in the design loop. We may rewrite or edit the document as needed to refine the design.
  • accepted, wip, implemented status: Implementation has begun or completed. NEVER edit the original Context/Decision/Consequences sections. Only append addendums.

Context

ADR-OA04 defines workflow schema and opcode semantics. Implementation still needs a practical build order for rapid prototyping so the team can ship a walking skeleton quickly, without over committing to late-stage capabilities.

This ADR captures execution-focused notes for:

  1. Crisp role boundaries between tnh-gen, kernel, runners, and validators.
  2. MVP operational loops that can run safely inside a worktree.
  3. Ordered build-out milestones that keep risk bounded and maintain forward progress.

Decision

1. System Role Split

  • tnh-gen: control plane (workflow selection, context packaging, prompt routing, planner/evaluator calls via GenAIService, provenance/artifact collation, policy enforcement entrypoint).
  • Conductor kernel: deterministic OA04 step executor (opcodes only; no embedded domain judgment).
  • CLI runners (claude, codex): heavy edit performers invoked by RUN_AGENT.
  • Validators/harnesses: deterministic execution surfaces invoked by RUN_VALIDATION, returning structured reports + artifacts.

2. MVP Operational Loops

Builder loop:

  1. RUN_AGENT (plan/design)
  2. EVALUATE (plan quality/risk)
  3. RUN_AGENT (implement)
  4. RUN_VALIDATION
  5. EVALUATE (results)
  6. RUN_AGENT (fix) repeated as needed
  7. GATE (final review)
  8. STOP

Evaluator loop (generative):

  1. RUN_AGENT (synthesize harness)
  2. RUN_VALIDATION (run harness)
  3. EVALUATE (harness report)
  4. Route to fix loop or gate

3. Prompt Routing Contract

  • Heavy work prompts (design, code, refactor, fix) execute through CLI runners.
  • Planner/evaluator/summarizer prompts execute through tnh-gen + GenAIService under EVALUATE.
  • Harness execution remains deterministic and local under RUN_VALIDATION; planner consumes structured reports, not raw log streams.

4. MVP Scope (Walking Skeleton)

Implement first:

  • OA04 engine + minimal opcodes: RUN_AGENT, RUN_VALIDATION, EVALUATE, GATE, STOP.
  • ROLLBACK(pre_run) is REQUIRED — kernel MUST implement this in MVP. Workflows may omit rollback usage, but the capability must exist for unattended/overnight runs.
  • Script validator support in RUN_VALIDATION for generated harnesses.
  • Worktree sandbox + path/policy guardrails (no push/merge by default).

4a. Checkpoint Implementation (MVP)

Target MVP Support Description
pre_run Implicit, always available Snapshot at workflow start
pre_step Optional, implicit Snapshot before each step (if enabled)
checkpoint:<id> Reserved No explicit checkpoint authoring in MVP

Rationale: Minimizes surface area while preserving safety escape hatch.

4b. Harness Execution Sandboxing (MVP)

Script validators run with the following constraints:

Constraint MVP Implementation
Worktree isolation Harness executes in conductor worktree, not main repo
Sanitized environment No secrets in env vars; minimal PATH
No network by default Harness cannot make external requests (configurable override)
Entrypoint allowlist Scripts must be under .tnh/run/<run_id>/...

Deferred: Container/VM isolation. Worktree + constraints are adequate for Phase 1.

4c. Artifact Retention Policy

  • Decision: Retention is system-level, not workflow-authored.
  • Defaults: Configured in tnh-gen settings (e.g., last N runs or last N days).
  • Workflow role: Workflows MAY tag artifacts as important (advisory only).
  • Rationale: Avoids inconsistent cleanup semantics across workflows.

5. MVP Generative Evaluation Deliverables

  • Standard run artifact directory: .tnh/run/<run_id>/....
  • Minimal harness runner contract:
  • accepts suite spec path
  • executes declared cases (CLI/web)
  • emits harness_report.json plus referenced artifacts (stdout/stderr, screenshots)
  • Planner output contract (MVP minimum for routing correctness): status, next_step, optional fix_instructions, blockers, risk_flags.

5a. Schema Freeze (MVP)

Freeze now (cross-boundary contracts):

  • harness_report.json minimal schema (OA04 Section 9)
  • Planner output object for EVALUATE (MVP minimum): {status, next_step, fix_instructions?, blockers?, risk_flags?}

Defer (internal evolution):

  • UX flag taxonomy and severity levels
  • Rich observation typing
  • Visual diff conventions
  • Accessibility report schema

Rationale: Stabilize only the contracts that cross component boundaries; internal schemas can evolve.

5c. Planner Contract Layering

  • OA01.1 defines the full planner output contract (conceptual and future-facing).
  • OA04 defines the subset the kernel requires for deterministic routing.
  • OA04.1 defines the minimum planner fields required for MVP correctness.

5b. Component Hints (Operational Guidance)

Workflow defaults may include optional hints:

  • component_kind: docs, cli, web, vscode_ui, library
  • eval_profile: smoke, overnight, release_candidate

Usage: Guide harness synthesis prompts only. No kernel semantics — kernel does not interpret these values.

Rationale: Improves agent reliability without adding kernel complexity.

6. Incremental Build-Out Sequence

  1. Kernel + runners wired, simple workflow execution, provenance logging, worktree isolation.
  2. Validator baseline (type, lint, unit) via RUN_VALIDATION.
  3. Planner EVALUATE with bounded routing through allowed_next_steps.
  4. Script validators with artifact capture.
  5. Generative harness synthesis into .tnh/run/<run_id>/....
  6. CLI and web harness backends (including Playwright headless screenshot capture).
  7. Golden workflow with snapshot diffs + GATE approval before accepting updates.
  8. Advanced later: VS Code UI automation backend, richer capture, flake detection, perf budgets, named checkpoints.

7. Guardrail Constraints

  • Kernel does not embed domain-specific evaluation logic.
  • All evaluation must emit structured reports; planner reads report payloads and artifact references.
  • Golden updates are always gated; never auto-accepted.
  • Fix/adjust loop is primary recovery; rollback is emergency brake and determinism aid.

Consequences

Positive

  • Provides a clear implementation order for rapid prototype execution.
  • Preserves OA01.1 separation of concerns between kernel mechanics and planner intelligence.
  • Enables immediate value with a small opcode/runtime surface.

Negative

  • Defers richer capabilities (UI automation, flake handling, named checkpoints).
  • Requires discipline to keep prompt routing and runner responsibilities clean during fast iteration.

Open Questions

All original open questions have been resolved.

Original Question Resolution Location
ROLLBACK(pre_run) required? Yes, REQUIRED in MVP Section 4: MVP Scope
Split harness synth/execution? No; keep in one workflow for MVP Section 5 (unchanged)
Which schemas to freeze? harness_report.json + planner output now; defer internals Section 5a: Schema Freeze

Mermaid Artifacts

1) Role + Control-Plane Architecture

flowchart TB
  U[Human Operator] -->|invokes| T[tnh-gen<br/>Control Plane]
  T -->|loads workflow + context| K[Conductor Kernel<br/>OA04 opcode runtime]
  T -->|planner/evaluator calls| M[GenAIService<br/>GPT/Claude/Gemini]
  K -->|RUN_AGENT| RC[Claude CLI Runner]
  K -->|RUN_AGENT| RX[Codex CLI Runner]
  K -->|RUN_VALIDATION| V[Validators / Harnesses<br/>pytest/mypy/playwright/scripts]
  K -->|artifacts + events| L[Ledger / Provenance<br/>run dir + structured events]
  K -->|workspace ops| G[Git Worktree Sandbox]

2) Overnight Build Sequence

sequenceDiagram
  autonumber
  participant H as Human (Lead)
  participant T as tnh-gen (control plane)
  participant K as Conductor Kernel
  participant C as Claude CLI (docs/design)
  participant X as Codex CLI (code)
  participant V as Validators
  participant M as GenAIService (planner/evaluator)
  participant G as Git Worktree
  participant L as Ledger/Artifacts

  H->>T: tnh-gen run workflow overnight_build
  T->>G: create worktree + branch
  T->>K: start(run_id, workflow, context)

  K->>C: RUN_AGENT(plan/design prompt)
  C-->>K: edits + transcript
  K->>L: record(step artifacts)

  K->>M: EVALUATE(plan quality/risk)
  M-->>K: {status,next_step,risk_flags}
  K->>L: record(evaluate_result)

  alt needs_human
    K->>H: GATE(plan review)
    H-->>K: approve|reject
  end

  K->>X: RUN_AGENT(implement prompt)
  X-->>K: edits + transcript
  K->>L: record(step artifacts)

  K->>V: RUN_VALIDATION(type/lint/unit)
  V-->>K: logs + report
  K->>L: record(validation artifacts)

  K->>M: EVALUATE(results)
  M-->>K: {status,next_step,fix_instructions}
  K->>L: record(evaluate_result)

  alt partial
    loop fix/validate
      K->>X: RUN_AGENT(fix-from-logs)
      X-->>K: edits
      K->>V: RUN_VALIDATION
      V-->>K: logs
      K->>M: EVALUATE
      M-->>K: status
    end
  else unsafe
    K->>K: ROLLBACK(pre_run or pre_step)
    K->>H: GATE(unsafe review)
  end

  K->>H: GATE(final review)
  H-->>K: approve|reject
  K->>L: record(run_completed)
  K->>K: STOP

3) Generative Evaluation Loop

flowchart LR
  A["RUN_AGENT<br/>Synthesize harness + suite<br/>(Codex/Claude)"] --> B["RUN_VALIDATION<br/>Execute harness backend<br/>(script/playwright/cli)"]
  B --> C["EVALUATE<br/>Judge harness_report.json<br/>(GenAIService planner)"]
  C -->|success| D["GATE final / STOP"]
  C -->|partial| E["RUN_AGENT<br/>Fix from report"]
  E --> B
  C -->|unsafe| R["ROLLBACK or STOP"] --> D

4) Prompt Routing Map

flowchart TB
  subgraph "CLI Agents (heavy edits)"
    P1[PROMPT.DESIGN.*] -->|RUN_AGENT| CLAUDE[Claude CLI]
    P2[PROMPT.CODE.*] -->|RUN_AGENT| CODEX[Codex CLI]
  end

  subgraph "tnh-gen via GenAIService (structured decisions)"
    P3[PROMPT.PLANNER.EVAL_*] -->|EVALUATE| GENAI[GenAIService<br/>GPT/Claude/Gemini]
    P4[PROMPT.SUMMARY.*] -->|EVALUATE| GENAI
  end

  subgraph "Deterministic tools"
    V[RUN_VALIDATION<br/>pytest/mypy/playwright/harness] --> RPT[harness_report.json + artifacts]
  end


As-Built Notes & Addendums

Addendum 2026-03-27: OA04 Contract Family — Scaffolding Alignment Notes

Context: OA04.2–OA04.5 were accepted 2026-03-27 as the contract layer between OA07 runtime foundations and forward-path implementation. A code review against the existing OA07 scaffolding at acceptance time revealed the following alignment gaps. These are expected and pre-planned — the ADRs were written before the scaffolding was filled in. Recording them here for implementer context.

Gap 1 — RunEventRecord is too thin (OA04.3 §6)

run_artifacts/models.py RunEventRecord currently has only step_id and next_step_id. OA04.3 requires timestamp, run_id, and event_type as required fields. Addressed in PR-2.

Gap 2 — RunMetadata is too thin (OA04.3 §3)

run_artifacts/models.py RunMetadata currently has only run_id, workflow_id, started_at. OA04.3 requires workflow_version, artifacts_root, entry_step; recommended optionals include ended_at, last_step_id, termination, schema_versions. Addressed in PR-2.

Gap 3 — No StepManifest model exists (OA04.3 §4–5)

run_artifacts/ has no StepManifest, ArtifactRole, or StepArtifactEntry types. FilesystemRunArtifactStore creates the run directory and event log correctly, but has no manifest-writing capability. Addressed in PR-2.

Gap 4 — PromptInteractionPolicy stub conflicts with OA04.4 ExecutionPolicy (OA04.4 §1)

runners/models.py has a minimal PromptInteractionPolicy(auto_approve: bool) stub on RunnerTaskRequest. OA04.4 defines a richer five-dimension ExecutionPolicy model. These must not share a name or conflate roles. The stub should be retired and RunnerTaskRequest.prompt_interaction_policy re-pointed to the proper ExecutionPolicy reference. Addressed in PR-3.

Gap 5 — validation/ package is empty (OA04.5)

agent_orchestration/validation/__init__.py exists but contains no models, protocols, or backends. The harness backend contract (backend family enum, request/result boundary, script backend) is entirely greenfield. Addressed in PR-5.

Gap 6 — runners/ has no adapters subdirectory (OA04.2 §6)

runners/ has models.py and protocols.py but no adapters/ subpackage. Claude CLI and Codex CLI adapter implementations are fully absent. Addressed in PR-4.

Note on final_state_path naming: RunArtifactPaths.final_state_path maps to final_state.txt in the current store, but OA04.3 §2 uses final-state.txt (hyphen). Minor — align during PR-2 if it does not break existing tests; otherwise note as a follow-up.

Addendum 2026-04-05: Bootstrap-First Runtime Priority

Context: The near-term product goal is operational bootstrap, not contract completeness in isolation. The maintained OA04.x runtime foundation is real and tested, but the fastest path to value is one end-to-end mutable workflow that can safely create a worktree, run an agent, validate, commit, push, and open or update a PR.

Decision: Bootstrap implementation priority is clarified as follows:

  1. Land a real maintained workspace service that creates a dedicated git worktree and records base_ref / base_sha.
  2. Separate mutable worktree execution from the canonical run directory.
  3. Use that worktree boundary to implement ROLLBACK(pre_run) for unattended runs.
  4. Land one headless bootstrap entry point that can drive a single workflow end to end.
  5. Allow bootstrap agent authority to extend through commit, push, PR creation, PR update, and review-follow-up on the managed work branch, while keeping protected-branch merge human-only.

Bootstrap completion criteria:

  • a managed worktree is created from a committed base ref,
  • RUN_AGENT and RUN_VALIDATION execute against the worktree root,
  • canonical run artifacts and provenance are written under the run directory,
  • the run can commit and push its work branch,
  • the run can open or update a PR,
  • ROLLBACK(pre_run) restores the managed branch/worktree to the recorded base state.

Deferrals to keep bootstrap fast:

  • strict OA05 workflow-to-prompt compile validation as a bootstrap blocker,
  • full OA06 fixture/vector coverage beyond the bootstrap decision path,
  • non-script harness backends,
  • stacked PR orchestration,
  • multi-agent mutable collaboration inside one worktree,
  • pre_step rollback and named checkpoints.

Rationale: The maintained runtime should become operational before the broader prompt-program surface is fully frozen. Once the system can safely create review-ready PRs inside isolated worktrees, OA05 and OA06 integration work can proceed against a live bootstrap loop rather than against abstract scaffolding.

Implementation Changes: Documentation only. Added ADR-OA07 and ADR-OA07.1 to freeze the missing workspace-safety contract.

References:

Addendum 2026-04-06: OA04.1 to OA07.1 Handoff Clarification

Context: The remaining bootstrap blocker is no longer in the OA04.x contract layer itself. The maintained runtime now has substantial execution contracts, but still lacks the mutable workspace boundary needed for operational use.

Decision: OA04.1 is clarified as the MVP runtime build-order guide, not the sole home for every remaining bootstrap implementation detail.

The current handoff is:

  • OA04.x freezes the runtime execution contracts,
  • OA04.1 records build order and reprioritization,
  • OA07/OA07.1 freeze the worktree, rollback, and bootstrap safety model required to make the runtime operational.

Rationale: This makes developer navigation simpler. Contributors should not look for worktree semantics in the OA04 decimal family when those decisions now live in OA07.x.

Implementation Changes: Documentation only. Clarified the family handoff and bootstrap dependency.

Addendum 2026-04-06: Runtime Bootstrap MVP Clarification

Context: The bootstrap-first addendum above described the end goal as safely creating review-ready PRs inside isolated worktrees. That remains the broader OA07 authority envelope, but it is too aggressive as the minimum bar for the first operational runtime milestone.

Decision: The first runtime bootstrap milestone is clarified as:

  • real managed worktree creation from committed base_ref / base_sha,
  • mutable-step execution against the worktree root,
  • canonical artifacts and provenance written to the run directory,
  • ROLLBACK(pre_run) by restoring the managed worktree to recorded base state,
  • one maintained local/headless entry point.

Commit/push/PR automation is treated as the immediate OA07 follow-on after that runtime loop exists, not as a blocker for the first operational bootstrap slice.

Rationale: This preserves the bootstrap-fast path without weakening the structural target. The system becomes operational as soon as it can run a real isolated mutable workflow safely; review automation can then build directly on the same worktree and provenance boundary.

Implementation Changes: Documentation only. Clarified the minimum bootstrap milestone and its relationship to later review automation.