ADR-OA04.1: Implementation Notes - MVP Build-Out Sequence¶

Implementation-guide addendum for OA04, defining expected operational flow and the incremental MVP build sequence.

Status: WIP
Type: Implementation Guide
Date: 2026-02-10
Owner: Aaron Solomon
Author: Aaron Solomon, GPT 5.3, GPT-5 Codex, Claude Code
Parent ADR: ADR-OA04

ADR Editing Policy¶

IMPORTANT: How you edit this ADR depends on its status.

proposed status: ADR is in the design loop. We may rewrite or edit the document as needed to refine the design.
accepted, wip, implemented status: Implementation has begun or completed. NEVER edit the original Context/Decision/Consequences sections. Only append addendums.

Context¶

ADR-OA04 defines workflow schema and opcode semantics. Implementation still needs a practical build order for rapid prototyping so the team can ship a walking skeleton quickly, without over committing to late-stage capabilities.

This ADR captures execution-focused notes for:

Crisp role boundaries between tnh-gen, kernel, runners, and validators.
MVP operational loops that can run safely inside a worktree.
Ordered build-out milestones that keep risk bounded and maintain forward progress.

Decision¶

1. System Role Split¶

tnh-gen: control plane (workflow selection, context packaging, prompt routing, planner/evaluator calls via GenAIService, provenance/artifact collation, policy enforcement entrypoint).
Conductor kernel: deterministic OA04 step executor (opcodes only; no embedded domain judgment).
CLI runners (claude, codex): heavy edit performers invoked by RUN_AGENT.
Validators/harnesses: deterministic execution surfaces invoked by RUN_VALIDATION, returning structured reports + artifacts.

2. MVP Operational Loops¶

Builder loop:

RUN_AGENT (plan/design)
EVALUATE (plan quality/risk)
RUN_AGENT (implement)
RUN_VALIDATION
EVALUATE (results)
RUN_AGENT (fix) repeated as needed
GATE (final review)
STOP

Evaluator loop (generative):

RUN_AGENT (synthesize harness)
RUN_VALIDATION (run harness)
EVALUATE (harness report)
Route to fix loop or gate

3. Prompt Routing Contract¶

Heavy work prompts (design, code, refactor, fix) execute through CLI runners.
Planner/evaluator/summarizer prompts execute through tnh-gen + GenAIService under EVALUATE.
Harness execution remains deterministic and local under RUN_VALIDATION; planner consumes structured reports, not raw log streams.

4. MVP Scope (Walking Skeleton)¶

Implement first:

OA04 engine + minimal opcodes: RUN_AGENT, RUN_VALIDATION, EVALUATE, GATE, STOP.
ROLLBACK(pre_run) is REQUIRED — kernel MUST implement this in MVP. Workflows may omit rollback usage, but the capability must exist for unattended/overnight runs.
Script validator support in RUN_VALIDATION for generated harnesses.
Worktree sandbox + path/policy guardrails (no push/merge by default).

4a. Checkpoint Implementation (MVP)¶

Target	MVP Support	Description
`pre_run`	Implicit, always available	Snapshot at workflow start
`pre_step`	Optional, implicit	Snapshot before each step (if enabled)
`checkpoint:<id>`	Reserved	No explicit checkpoint authoring in MVP

Rationale: Minimizes surface area while preserving safety escape hatch.

4b. Harness Execution Sandboxing (MVP)¶

Script validators run with the following constraints:

Constraint	MVP Implementation
Worktree isolation	Harness executes in conductor worktree, not main repo
Sanitized environment	No secrets in env vars; minimal PATH
No network by default	Harness cannot make external requests (configurable override)
Entrypoint allowlist	Scripts must be under `.tnh/run/<run_id>/...`

Deferred: Container/VM isolation. Worktree + constraints are adequate for Phase 1.

4c. Artifact Retention Policy¶

Decision: Retention is system-level, not workflow-authored.
Defaults: Configured in tnh-gen settings (e.g., last N runs or last N days).
Workflow role: Workflows MAY tag artifacts as important (advisory only).
Rationale: Avoids inconsistent cleanup semantics across workflows.

5. MVP Generative Evaluation Deliverables¶

Standard run artifact directory: .tnh/run/<run_id>/....
Minimal harness runner contract:
accepts suite spec path
executes declared cases (CLI/web)
emits harness_report.json plus referenced artifacts (stdout/stderr, screenshots)
Planner output contract (MVP minimum for routing correctness): status, next_step, optional fix_instructions, blockers, risk_flags.

5a. Schema Freeze (MVP)¶

Freeze now (cross-boundary contracts):

harness_report.json minimal schema (OA04 Section 9)
Planner output object for EVALUATE (MVP minimum): {status, next_step, fix_instructions?, blockers?, risk_flags?}

Defer (internal evolution):

UX flag taxonomy and severity levels
Rich observation typing
Visual diff conventions
Accessibility report schema

Rationale: Stabilize only the contracts that cross component boundaries; internal schemas can evolve.

5c. Planner Contract Layering¶

OA01.1 defines the full planner output contract (conceptual and future-facing).
OA04 defines the subset the kernel requires for deterministic routing.
OA04.1 defines the minimum planner fields required for MVP correctness.

5b. Component Hints (Operational Guidance)¶

Workflow defaults may include optional hints:

component_kind: docs, cli, web, vscode_ui, library
eval_profile: smoke, overnight, release_candidate

Usage: Guide harness synthesis prompts only. No kernel semantics — kernel does not interpret these values.

Rationale: Improves agent reliability without adding kernel complexity.

6. Incremental Build-Out Sequence¶

Kernel + runners wired, simple workflow execution, provenance logging, worktree isolation.
Validator baseline (type, lint, unit) via RUN_VALIDATION.
Planner EVALUATE with bounded routing through allowed_next_steps.
Script validators with artifact capture.
Generative harness synthesis into .tnh/run/<run_id>/....
CLI and web harness backends (including Playwright headless screenshot capture).
Golden workflow with snapshot diffs + GATE approval before accepting updates.
Advanced later: VS Code UI automation backend, richer capture, flake detection, perf budgets, named checkpoints.

7. Guardrail Constraints¶

Kernel does not embed domain-specific evaluation logic.
All evaluation must emit structured reports; planner reads report payloads and artifact references.
Golden updates are always gated; never auto-accepted.
Fix/adjust loop is primary recovery; rollback is emergency brake and determinism aid.

Consequences¶

Positive¶

Provides a clear implementation order for rapid prototype execution.
Preserves OA01.1 separation of concerns between kernel mechanics and planner intelligence.
Enables immediate value with a small opcode/runtime surface.

Negative¶

Defers richer capabilities (UI automation, flake handling, named checkpoints).
Requires discipline to keep prompt routing and runner responsibilities clean during fast iteration.

Open Questions¶

All original open questions have been resolved.

Original Question	Resolution	Location
`ROLLBACK(pre_run)` required?	Yes, REQUIRED in MVP	Section 4: MVP Scope
Split harness synth/execution?	No; keep in one workflow for MVP	Section 5 (unchanged)
Which schemas to freeze?	`harness_report.json` + planner output now; defer internals	Section 5a: Schema Freeze

Mermaid Artifacts¶

1) Role + Control-Plane Architecture¶

flowchart TB
  U[Human Operator] -->|invokes| T[tnh-gen<br/>Control Plane]
  T -->|loads workflow + context| K[Conductor Kernel<br/>OA04 opcode runtime]
  T -->|planner/evaluator calls| M[GenAIService<br/>GPT/Claude/Gemini]
  K -->|RUN_AGENT| RC[Claude CLI Runner]
  K -->|RUN_AGENT| RX[Codex CLI Runner]
  K -->|RUN_VALIDATION| V[Validators / Harnesses<br/>pytest/mypy/playwright/scripts]
  K -->|artifacts + events| L[Ledger / Provenance<br/>run dir + structured events]
  K -->|workspace ops| G[Git Worktree Sandbox]

2) Overnight Build Sequence¶

sequenceDiagram
  autonumber
  participant H as Human (Lead)
  participant T as tnh-gen (control plane)
  participant K as Conductor Kernel
  participant C as Claude CLI (docs/design)
  participant X as Codex CLI (code)
  participant V as Validators
  participant M as GenAIService (planner/evaluator)
  participant G as Git Worktree
  participant L as Ledger/Artifacts

  H->>T: tnh-gen run workflow overnight_build
  T->>G: create worktree + branch
  T->>K: start(run_id, workflow, context)

  K->>C: RUN_AGENT(plan/design prompt)
  C-->>K: edits + transcript
  K->>L: record(step artifacts)

  K->>M: EVALUATE(plan quality/risk)
  M-->>K: {status,next_step,risk_flags}
  K->>L: record(evaluate_result)

  alt needs_human
    K->>H: GATE(plan review)
    H-->>K: approve|reject
  end

  K->>X: RUN_AGENT(implement prompt)
  X-->>K: edits + transcript
  K->>L: record(step artifacts)

  K->>V: RUN_VALIDATION(type/lint/unit)
  V-->>K: logs + report
  K->>L: record(validation artifacts)

  K->>M: EVALUATE(results)
  M-->>K: {status,next_step,fix_instructions}
  K->>L: record(evaluate_result)

  alt partial
    loop fix/validate
      K->>X: RUN_AGENT(fix-from-logs)
      X-->>K: edits
      K->>V: RUN_VALIDATION
      V-->>K: logs
      K->>M: EVALUATE
      M-->>K: status
    end
  else unsafe
    K->>K: ROLLBACK(pre_run or pre_step)
    K->>H: GATE(unsafe review)
  end

  K->>H: GATE(final review)
  H-->>K: approve|reject
  K->>L: record(run_completed)
  K->>K: STOP

3) Generative Evaluation Loop¶

flowchart LR
  A["RUN_AGENT<br/>Synthesize harness + suite<br/>(Codex/Claude)"] --> B["RUN_VALIDATION<br/>Execute harness backend<br/>(script/playwright/cli)"]
  B --> C["EVALUATE<br/>Judge harness_report.json<br/>(GenAIService planner)"]
  C -->|success| D["GATE final / STOP"]
  C -->|partial| E["RUN_AGENT<br/>Fix from report"]
  E --> B
  C -->|unsafe| R["ROLLBACK or STOP"] --> D

4) Prompt Routing Map¶

flowchart TB
  subgraph "CLI Agents (heavy edits)"
    P1[PROMPT.DESIGN.*] -->|RUN_AGENT| CLAUDE[Claude CLI]
    P2[PROMPT.CODE.*] -->|RUN_AGENT| CODEX[Codex CLI]
  end

  subgraph "tnh-gen via GenAIService (structured decisions)"
    P3[PROMPT.PLANNER.EVAL_*] -->|EVALUATE| GENAI[GenAIService<br/>GPT/Claude/Gemini]
    P4[PROMPT.SUMMARY.*] -->|EVALUATE| GENAI
  end

  subgraph "Deterministic tools"
    V[RUN_VALIDATION<br/>pytest/mypy/playwright/harness] --> RPT[harness_report.json + artifacts]
  end