ADR-OA04.1: Implementation Notes - MVP Build-Out Sequence
Implementation-guide addendum for OA04, defining expected operational flow and the incremental MVP build sequence.
- Status: WIP
- Type: Implementation Guide
- Date: 2026-02-10
- Owner: Aaron Solomon
- Author: Aaron Solomon, GPT 5.3, GPT-5 Codex, Claude Code
- Parent ADR: ADR-OA04
ADR Editing Policy
IMPORTANT: How you edit this ADR depends on its status.
- proposed status: ADR is in the design loop. We may rewrite or edit the document as needed to refine the design.
- accepted, wip, implemented status: Implementation has begun or completed. NEVER edit the original Context/Decision/Consequences sections. Only append addendums.
Context
ADR-OA04 defines workflow schema and opcode semantics. Implementation still needs a practical build order for rapid prototyping so the team can ship a walking skeleton quickly, without overcommitting to late-stage capabilities.
This ADR captures execution-focused notes for:
- Crisp role boundaries between tnh-gen, kernel, runners, and validators.
- MVP operational loops that can run safely inside a worktree.
- Ordered build-out milestones that keep risk bounded and maintain forward progress.
Decision
1. System Role Split
- tnh-gen: control plane (workflow selection, context packaging, prompt routing, planner/evaluator calls via GenAIService, provenance/artifact collation, policy enforcement entrypoint).
- Conductor kernel: deterministic OA04 step executor (opcodes only; no embedded domain judgment).
- CLI runners (claude, codex): heavy edit performers invoked by RUN_AGENT.
- Validators/harnesses: deterministic execution surfaces invoked by RUN_VALIDATION, returning structured reports + artifacts.
2. MVP Operational Loops
Builder loop:
- RUN_AGENT (plan/design)
- EVALUATE (plan quality/risk)
- RUN_AGENT (implement)
- RUN_VALIDATION
- EVALUATE (results)
- RUN_AGENT (fix) repeated as needed
- GATE (final review)
- STOP
Evaluator loop (generative):
- RUN_AGENT (synthesize harness)
- RUN_VALIDATION (run harness)
- EVALUATE (harness report)
- Route to fix loop or gate
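The builder loop above can be sketched as a small deterministic step executor. This is an illustration only, not the OA04 kernel: the step ids, the `Step` dataclass, the handler signature, and the `max_steps` bound are assumptions made for the example; only the opcode names come from this ADR. GATE rejection and ROLLBACK are not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Step:
    id: str
    opcode: str  # RUN_AGENT, RUN_VALIDATION, EVALUATE, GATE, or STOP
    next_step: Optional[str] = None  # fixed successor for non-EVALUATE steps
    allowed_next_steps: List[str] = field(default_factory=list)  # EVALUATE only


# Hypothetical step ids; the opcode order follows the builder loop.
BUILDER_LOOP: Dict[str, Step] = {s.id: s for s in [
    Step("plan", "RUN_AGENT", next_step="eval_plan"),
    Step("eval_plan", "EVALUATE", allowed_next_steps=["plan", "implement"]),
    Step("implement", "RUN_AGENT", next_step="validate"),
    Step("validate", "RUN_VALIDATION", next_step="eval_results"),
    Step("eval_results", "EVALUATE", allowed_next_steps=["fix", "final_gate"]),
    Step("fix", "RUN_AGENT", next_step="validate"),
    Step("final_gate", "GATE", next_step="stop"),
    Step("stop", "STOP"),
]}

# A handler performs the step's side effects. Only EVALUATE handlers return
# a value: the step id chosen by the planner.
Handler = Callable[[Step], Optional[str]]


def run_workflow(
    workflow: Dict[str, Step],
    start: str,
    handlers: Dict[str, Handler],
    max_steps: int = 50,
) -> List[str]:
    """Execute steps until STOP and return the ids visited, in order."""
    trace: List[str] = []
    current: Optional[str] = start
    for _ in range(max_steps):
        if current is None:
            raise ValueError("non-STOP step has no successor")
        step = workflow[current]
        trace.append(step.id)
        if step.opcode == "STOP":
            return trace
        chosen = handlers[step.opcode](step)
        if step.opcode == "EVALUATE":
            # The kernel only checks that the choice is permitted; it does
            # not judge whether the choice is a good one.
            if chosen not in step.allowed_next_steps:
                raise ValueError(f"{chosen!r} is not allowed from {step.id!r}")
            current = chosen
        else:
            current = step.next_step
    raise RuntimeError("step budget exhausted before STOP")
```

Keeping the successor choice inside the EVALUATE handler, and the permission check inside the executor, mirrors the split between planner intelligence and kernel mechanics.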
3. Prompt Routing Contract
- Heavy work prompts (design, code, refactor, fix) execute through CLI runners.
- Planner/evaluator/summarizer prompts execute through tnh-gen + GenAIService under EVALUATE.
- Harness execution remains deterministic and local under RUN_VALIDATION; planner consumes structured reports, not raw log streams.
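A minimal sketch of the routing rule above, assuming prompts carry a simple lowercase kind string. The `Executor` enum, `route_prompt`, and the kind strings for the planner/evaluator/summarizer group are hypothetical names for this example, not part of tnh-gen.

```python
from enum import Enum
from typing import Tuple


class Executor(Enum):
    CLI_RUNNER = "cli_runner"        # claude / codex, reached via RUN_AGENT
    GENAI_SERVICE = "genai_service"  # tnh-gen + GenAIService, reached via EVALUATE


# Prompt kinds taken from the routing contract.
HEAVY_WORK_KINDS = frozenset({"design", "code", "refactor", "fix"})
# Assumed kind strings for the planner/evaluator/summarizer group.
DECISION_KINDS = frozenset({"planner", "evaluator", "summarizer"})


def route_prompt(kind: str) -> Tuple[str, Executor]:
    """Return the (opcode, executor) pair for a prompt kind."""
    normalized = kind.strip().lower()
    if normalized in HEAVY_WORK_KINDS:
        return ("RUN_AGENT", Executor.CLI_RUNNER)
    if normalized in DECISION_KINDS:
        return ("EVALUATE", Executor.GENAI_SERVICE)
    # Unknown kinds fail loudly instead of defaulting to either path.
    raise ValueError(f"no route for prompt kind {kind!r}")
```

Harness execution has no entry here on purpose: it runs under RUN_VALIDATION and is not a prompt.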
4. MVP Scope (Walking Skeleton)
Implement first:
- OA04 engine + minimal opcodes: RUN_AGENT, RUN_VALIDATION, EVALUATE, GATE, STOP.
- ROLLBACK(pre_run) is REQUIRED: kernel MUST implement this in MVP. Workflows may omit rollback usage, but the capability must exist for unattended/overnight runs.
- Script validator support in RUN_VALIDATION for generated harnesses.
- Worktree sandbox + path/policy guardrails (no push/merge by default).
4a. Checkpoint Implementation (MVP)
| Target | MVP Support | Description |
|---|---|---|
| pre_run | Implicit, always available | Snapshot at workflow start |
| pre_step | Optional, implicit | Snapshot before each step (if enabled) |
| checkpoint:<id> | Reserved | No explicit checkpoint authoring in MVP |

Rationale: Minimizes surface area while preserving a safety escape hatch.
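The table above can be read as a target-resolution rule. The sketch below assumes each snapshot is identified by an opaque reference string (for example a commit id in the worktree); `CheckpointStore`, `CheckpointError`, and the method names are hypothetical, and how a snapshot is actually taken or restored is out of scope.

```python
from typing import Optional


class CheckpointError(Exception):
    """Raised when a rollback target cannot be resolved in the MVP."""


class CheckpointStore:
    def __init__(self, pre_run_ref: str, pre_step_enabled: bool = False) -> None:
        self._pre_run_ref = pre_run_ref          # always captured at workflow start
        self._pre_step_enabled = pre_step_enabled
        self._pre_step_ref: Optional[str] = None

    def record_pre_step(self, ref: str) -> None:
        """Remember the snapshot taken before the current step, if enabled."""
        if self._pre_step_enabled:
            self._pre_step_ref = ref

    def resolve(self, target: str) -> str:
        """Map a rollback target to the snapshot reference to restore."""
        if target == "pre_run":
            return self._pre_run_ref
        if target == "pre_step":
            if not self._pre_step_enabled or self._pre_step_ref is None:
                raise CheckpointError("pre_step snapshots are not available")
            return self._pre_step_ref
        if target.startswith("checkpoint:"):
            # Named checkpoints are reserved; reject rather than guess.
            raise CheckpointError("named checkpoints are reserved in MVP")
        raise CheckpointError(f"unknown rollback target {target!r}")
```

Requiring `pre_run_ref` in the constructor makes it impossible to build a store without the one target that must always exist.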
4b. Harness Execution Sandboxing (MVP)
Script validators run with the following constraints:
| Constraint | MVP Implementation |
|---|---|
| Worktree isolation | Harness executes in conductor worktree, not main repo |
| Sanitized environment | No secrets in env vars; minimal PATH |
| No network by default | Harness cannot make external requests (configurable override) |
| Entrypoint allowlist | Scripts must be under .tnh/run/<run_id>/... |
Deferred: Container/VM isolation. Worktree + constraints are adequate for Phase 1.
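Two of the constraints above, the sanitized environment and the entrypoint allowlist, can be sketched with the standard library alone. The kept variable names, the `MINIMAL_PATH` value, and both function names are assumptions for this example; network blocking is not covered because it depends on the host platform.

```python
from pathlib import Path
from typing import Dict, Mapping

# Hypothetical allowlist: anything not named here is dropped, so secrets
# are excluded by default instead of by pattern matching.
SAFE_ENV_KEYS = ("LANG", "LC_ALL", "TZ")
MINIMAL_PATH = "/usr/bin:/bin"  # assumed minimal PATH for a POSIX host


def sanitized_env(parent: Mapping[str, str]) -> Dict[str, str]:
    """Build the harness environment from an allowlist of parent variables."""
    env = {key: parent[key] for key in SAFE_ENV_KEYS if key in parent}
    env["PATH"] = MINIMAL_PATH
    return env


def is_allowed_entrypoint(worktree: Path, run_id: str, script: Path) -> bool:
    """True only if script resolves to a path under .tnh/run/<run_id>/."""
    run_root = (worktree / ".tnh" / "run" / run_id).resolve()
    # resolve() collapses ".." segments and follows symlinks, so a path that
    # escapes the run directory is detected after normalization.
    candidate = (worktree / script).resolve()
    return run_root in candidate.parents
```

The result of `sanitized_env` can be passed as the `env` argument of `subprocess.run`, with `cwd` set to the conductor worktree.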
4c. Artifact Retention Policy
- Decision: Retention is system-level, not workflow-authored.
- Defaults: Configured in tnh-gen settings (e.g., last N runs or last N days).
- Workflow role: Workflows MAY tag artifacts as important (advisory only).
- Rationale: Avoids inconsistent cleanup semantics across workflows.
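One way a system-level retention rule like this could be expressed is sketched below. Everything here is an assumption: the `RetentionSettings` field names, the choice to keep a run if either limit covers it, and the `RunRecord` shape. The ADR does not say how the important tag is honored, so the sketch carries it on the record without letting it change the outcome.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class RunRecord:
    run_id: str
    finished_at: datetime
    important: bool = False  # advisory tag set by a workflow; not enforced here


@dataclass(frozen=True)
class RetentionSettings:
    keep_last_runs: Optional[int] = None
    keep_last_days: Optional[int] = None


def runs_to_prune(
    runs: Iterable[RunRecord], settings: RetentionSettings, now: datetime
) -> List[str]:
    """Return run ids eligible for cleanup, newest first."""
    if settings.keep_last_runs is None and settings.keep_last_days is None:
        return []  # no limits configured: keep everything
    ordered = sorted(runs, key=lambda r: r.finished_at, reverse=True)
    keep = set()
    if settings.keep_last_runs is not None:
        keep.update(r.run_id for r in ordered[: settings.keep_last_runs])
    if settings.keep_last_days is not None:
        cutoff = now - timedelta(days=settings.keep_last_days)
        keep.update(r.run_id for r in ordered if r.finished_at >= cutoff)
    return [r.run_id for r in ordered if r.run_id not in keep]
```

Because the function takes only system settings and run records, no workflow can alter cleanup semantics, which is the point of the decision above.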
5. MVP Generative Evaluation Deliverables
- Standard run artifact directory: .tnh/run/<run_id>/....
- Minimal harness runner contract:
  - accepts suite spec path
  - executes declared cases (CLI/web)
  - emits harness_report.json plus referenced artifacts (stdout/stderr, screenshots)
- Planner output contract (MVP minimum for routing correctness): status, next_step, optional fix_instructions, blockers, risk_flags.
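The harness runner contract above (accept a suite spec path, run the declared cases, emit harness_report.json plus artifacts) might look roughly like this for CLI cases. The suite spec layout (`cases`, `id`, `argv`, `timeout_s`) and every field written to the report are placeholders invented for this sketch; the real minimal report schema is defined in OA04 Section 9 and is not reproduced here. Sandboxing constraints are omitted for brevity.

```python
import json
import subprocess
from pathlib import Path


def run_suite(spec_path: Path, run_dir: Path) -> Path:
    """Run each declared CLI case and write harness_report.json into run_dir."""
    spec = json.loads(spec_path.read_text())
    artifacts = run_dir / "artifacts"
    artifacts.mkdir(parents=True, exist_ok=True)

    results = []
    for case in spec["cases"]:
        proc = subprocess.run(
            case["argv"],
            capture_output=True,
            text=True,
            timeout=case.get("timeout_s", 60),
        )
        # Raw output goes to artifact files; the report only references them,
        # so the planner reads structure, not log streams.
        stdout_file = artifacts / f"{case['id']}.stdout.txt"
        stderr_file = artifacts / f"{case['id']}.stderr.txt"
        stdout_file.write_text(proc.stdout)
        stderr_file.write_text(proc.stderr)
        results.append({
            "id": case["id"],
            "passed": proc.returncode == 0,
            "stdout": str(stdout_file.relative_to(run_dir)),
            "stderr": str(stderr_file.relative_to(run_dir)),
        })

    report_path = run_dir / "harness_report.json"
    report_path.write_text(json.dumps({"cases": results}, indent=2))
    return report_path
```

A web backend would follow the same shape, with screenshots written next to the stdout/stderr files and referenced from the report.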
5a. Schema Freeze (MVP)
Freeze now (cross-boundary contracts):
- harness_report.json minimal schema (OA04 Section 9)
- Planner output object for EVALUATE (MVP minimum): {status, next_step, fix_instructions?, blockers?, risk_flags?}
Defer (internal evolution):
- UX flag taxonomy and severity levels
- Rich observation typing
- Visual diff conventions
- Accessibility report schema
Rationale: Stabilize only the contracts that cross component boundaries; internal schemas can evolve.
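The frozen planner output object can be pinned down with a small parser at the boundary between GenAIService and the kernel. The field names come from this ADR; the Python types (a string for fix_instructions, lists of strings for blockers and risk_flags) and the names `PlannerOutput` and `parse_planner_output` are assumptions for this sketch.

```python
from dataclasses import dataclass, field
from typing import Any, List, Mapping, Optional


@dataclass(frozen=True)
class PlannerOutput:
    status: str
    next_step: str
    fix_instructions: Optional[str] = None               # assumed to be free text
    blockers: List[str] = field(default_factory=list)    # assumed list of strings
    risk_flags: List[str] = field(default_factory=list)  # assumed list of strings


def parse_planner_output(payload: Mapping[str, Any]) -> PlannerOutput:
    """Validate the MVP-minimum fields and ignore anything else."""
    missing = [
        key for key in ("status", "next_step")
        if not isinstance(payload.get(key), str) or not payload[key]
    ]
    if missing:
        raise ValueError(f"planner output missing required fields: {missing}")
    return PlannerOutput(
        status=payload["status"],
        next_step=payload["next_step"],
        fix_instructions=payload.get("fix_instructions"),
        blockers=list(payload.get("blockers") or []),
        risk_flags=list(payload.get("risk_flags") or []),
    )
```

Ignoring unknown keys lets the fuller planner contract grow without breaking the MVP reader.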
5c. Planner Contract Layering
- OA01.1 defines the full planner output contract (conceptual and future-facing).
- OA04 defines the subset the kernel requires for deterministic routing.
- OA04.1 defines the minimum planner fields required for MVP correctness.
5b. Component Hints (Operational Guidance)
Workflow defaults may include optional hints:
- component_kind: docs, cli, web, vscode_ui, library
- eval_profile: smoke, overnight, release_candidate

Usage: Guide harness synthesis prompts only. No kernel semantics: the kernel does not interpret these values.
Rationale: Improves agent reliability without adding kernel complexity.
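A minimal sketch of how the hints might be consumed on the tnh-gen side when building a harness synthesis prompt. The allowed values come from the list above; the function name and the choice to reject unknown values (rather than pass them through) are assumptions.

```python
from typing import Dict, Mapping

COMPONENT_KINDS = frozenset({"docs", "cli", "web", "vscode_ui", "library"})
EVAL_PROFILES = frozenset({"smoke", "overnight", "release_candidate"})


def harness_prompt_context(hints: Mapping[str, str]) -> Dict[str, str]:
    """Return the validated hints to place in a harness synthesis prompt."""
    context: Dict[str, str] = {}
    kind = hints.get("component_kind")
    if kind is not None:
        if kind not in COMPONENT_KINDS:
            raise ValueError(f"unknown component_kind {kind!r}")
        context["component_kind"] = kind
    profile = hints.get("eval_profile")
    if profile is not None:
        if profile not in EVAL_PROFILES:
            raise ValueError(f"unknown eval_profile {profile!r}")
        context["eval_profile"] = profile
    # Both hints are optional, so an empty context is a valid result.
    return context
```

Nothing here is passed to the kernel; the returned values only shape prompt text.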
6. Incremental Build-Out Sequence
- Kernel + runners wired, simple workflow execution, provenance logging, worktree isolation.
- Validator baseline (type, lint, unit) via RUN_VALIDATION.
- Planner EVALUATE with bounded routing through allowed_next_steps.
- Script validators with artifact capture.
- Generative harness synthesis into .tnh/run/<run_id>/....
- CLI and web harness backends (including Playwright headless screenshot capture).
- Golden workflow with snapshot diffs + GATE approval before accepting updates.
- Advanced later: VS Code UI automation backend, richer capture, flake detection, perf budgets, named checkpoints.
7. Guardrail Constraints
- Kernel does not embed domain-specific evaluation logic.
- All evaluation must emit structured reports; planner reads report payloads and artifact references.
- Golden updates are always gated; never auto-accepted.
- The fix/adjust loop is the primary recovery path; rollback is an emergency brake and a determinism aid.
Consequences
Positive
- Provides a clear implementation order for rapid prototype execution.
- Preserves OA01.1 separation of concerns between kernel mechanics and planner intelligence.
- Enables immediate value with a small opcode/runtime surface.
Negative
- Defers richer capabilities (UI automation, flake handling, named checkpoints).
- Requires discipline to keep prompt routing and runner responsibilities clean during fast iteration.
Open Questions
All original open questions have been resolved.
| Original Question | Resolution | Location |
|---|---|---|
| ROLLBACK(pre_run) required? | Yes, REQUIRED in MVP | Section 4: MVP Scope |
| Split harness synth/execution? | No; keep in one workflow for MVP | Section 5 (unchanged) |
| Which schemas to freeze? | harness_report.json + planner output now; defer internals | Section 5a: Schema Freeze |
Mermaid Artifacts
1) Role + Control-Plane Architecture
flowchart TB
U[Human Operator] -->|invokes| T[tnh-gen<br/>Control Plane]
T -->|loads workflow + context| K[Conductor Kernel<br/>OA04 opcode runtime]
T -->|planner/evaluator calls| M[GenAIService<br/>GPT/Claude/Gemini]
K -->|RUN_AGENT| RC[Claude CLI Runner]
K -->|RUN_AGENT| RX[Codex CLI Runner]
K -->|RUN_VALIDATION| V[Validators / Harnesses<br/>pytest/mypy/playwright/scripts]
K -->|artifacts + events| L[Ledger / Provenance<br/>run dir + structured events]
K -->|workspace ops| G[Git Worktree Sandbox]
2) Overnight Build Sequence
sequenceDiagram
autonumber
participant H as Human (Lead)
participant T as tnh-gen (control plane)
participant K as Conductor Kernel
participant C as Claude CLI (docs/design)
participant X as Codex CLI (code)
participant V as Validators
participant M as GenAIService (planner/evaluator)
participant G as Git Worktree
participant L as Ledger/Artifacts
H->>T: tnh-gen run workflow overnight_build
T->>G: create worktree + branch
T->>K: start(run_id, workflow, context)
K->>C: RUN_AGENT(plan/design prompt)
C-->>K: edits + transcript
K->>L: record(step artifacts)
K->>M: EVALUATE(plan quality/risk)
M-->>K: {status,next_step,risk_flags}
K->>L: record(evaluate_result)
alt needs_human
K->>H: GATE(plan review)
H-->>K: approve|reject
end
K->>X: RUN_AGENT(implement prompt)
X-->>K: edits + transcript
K->>L: record(step artifacts)
K->>V: RUN_VALIDATION(type/lint/unit)
V-->>K: logs + report
K->>L: record(validation artifacts)
K->>M: EVALUATE(results)
M-->>K: {status,next_step,fix_instructions}
K->>L: record(evaluate_result)
alt partial
loop fix/validate
K->>X: RUN_AGENT(fix-from-logs)
X-->>K: edits
K->>V: RUN_VALIDATION
V-->>K: logs
K->>M: EVALUATE
M-->>K: status
end
else unsafe
K->>K: ROLLBACK(pre_run or pre_step)
K->>H: GATE(unsafe review)
end
K->>H: GATE(final review)
H-->>K: approve|reject
K->>L: record(run_completed)
K->>K: STOP
3) Generative Evaluation Loop
flowchart LR
A["RUN_AGENT<br/>Synthesize harness + suite<br/>(Codex/Claude)"] --> B["RUN_VALIDATION<br/>Execute harness backend<br/>(script/playwright/cli)"]
B --> C["EVALUATE<br/>Judge harness_report.json<br/>(GenAIService planner)"]
C -->|success| D["GATE final / STOP"]
C -->|partial| E["RUN_AGENT<br/>Fix from report"]
E --> B
C -->|unsafe| R["ROLLBACK or STOP"] --> D
4) Prompt Routing Map
flowchart TB
subgraph "CLI Agents (heavy edits)"
P1[PROMPT.DESIGN.*] -->|RUN_AGENT| CLAUDE[Claude CLI]
P2[PROMPT.CODE.*] -->|RUN_AGENT| CODEX[Codex CLI]
end
subgraph "tnh-gen via GenAIService (structured decisions)"
P3[PROMPT.PLANNER.EVAL_*] -->|EVALUATE| GENAI[GenAIService<br/>GPT/Claude/Gemini]
P4[PROMPT.SUMMARY.*] -->|EVALUATE| GENAI
end
subgraph "Deterministic tools"
V[RUN_VALIDATION<br/>pytest/mypy/playwright/harness] --> RPT[harness_report.json + artifacts]
end