TNH Scholar TODO List

Roadmap tracking the highest-priority TNH Scholar tasks and release blockers.

Last Updated: 2026-04-10 (PR-8 merged; bootstrap-proof slice next)
Version: 0.3.1 (Alpha)
Status: Active Development - Bootstrap path complete, production hardening phase

Style Note: Tasks use descriptive headers (not numbered items) to avoid renumbering churn when reorganizing. Use #### (h4) for task headers within priority sections.


Progress Summary

Bootstrap Path Status: ✅ COMPLETE - VS Code integration working, AI-assisted development enabled.

Next Steps:

  1. 🔮 JVB VS Code Parallel Viewer (P1, design phase) - ADR-JVB02 strategy + UI-UX design
  2. 🔮 Finish yt-dlp reliability suite + monthly ops trigger (P1, reliability)
  3. 🔮 Finish ytt-fetch robustness hardening (P1, reliability)
  4. 🔮 Add --prompt-dir Global Flag to tnh-gen (P1, minor)
  5. 🚧 GenAIService Final Polish - promote policy_applied typing (P1, minor)
  6. 🚧 Prompt Catalog Safety - error handling, validation (P2, critical infrastructure)
  7. 🚧 Knowledge Base Implementation (P2, design complete)
  8. 🚧 Expand Test Coverage to 50%+ (P2)

For completed items: See Archive section at end.


Priority Roadmap

This section organizes work into three priority levels based on criticality for production readiness.

Priority 1: VS Code Integration Enablement (Bootstrap Path)

Goal: Enable AI-assisted development of TNH Scholar itself via VS Code extension. Prioritizes foundational work for tnh-gen + extension integration.

Status: Foundation Complete (tnh-gen CLI ✅, Registry System ✅)

✅ tnh-gen CLI Implementation - See Archive

✅ File-Based Registry System (ADR-A14) - See Archive

✅ VS Code Extension Walking Skeleton - See Archive

✅ Pattern→Prompt Migration - See Archive

✅ Provenance Format Refactor (YAML Frontmatter) - See Archive

🚨 Agent-Orch OA07 Runtime Implementation Sequence

  • Status: IN PROGRESS - maintained execution/validation/kernel slice landed and tested
  • Priority: HIGH (foundation work for durable MVP)
  • Context: The accepted OA07 ADR set defines the maintained runtime architecture. The current conductor_mvp/ and spike/ code remains useful as migration source/reference, but should not receive forward-path feature growth.
  • Why This Matters:
  • current implementation readiness is medium, but in-place extension readiness is low
  • the highest-risk boundary is still subprocess execution and typed validation/runner contracts
  • coding should proceed by subsystem extraction, not by continuing prototype package growth
  • Implementation Order:
  • Build agent_orchestration/execution/
    • typed invocation families
    • cwd/env/timeout policy
    • termination/result taxonomy
    • final argv rendering boundary
  • Build agent_orchestration/validation/ on top of execution/
    • preserve OA04 external YAML compatibility by normalizing source shapes into typed internal models
    • migrate behavior out of conductor_mvp/providers/validation_runner.py
  • Extract agent_orchestration/kernel/
    • WorkflowCatalog
    • WorkflowValidator
    • KernelState
    • KernelRunService
  • Introduce agent_orchestration/workspace/ and agent_orchestration/run_artifacts/
    • move rollback/state capture and durable run record ownership out of prototype packages
  • Migrate maintained runner behavior into agent_orchestration/runners/
    • use reference/spike/ only as reference material
    • no new forward-path runner work in spike code
  • Current Slice Completed:
  • Added maintained execution/, validation/, kernel/, workspace/, run_artifacts/, and runners/ package scaffolding
  • Added focused OA07 regression coverage and validated the new slice plus legacy conductor_mvp kernel tests
  • Sourcery installed successfully via poetry install --with local, but the CLI currently hangs even for --help, so local Sourcery review remains blocked by Sourcery runtime behavior rather than repo config
  • Migration Rules:
  • Do not add substantive new feature work to conductor_mvp/
  • Do not add new forward-path implementation work to spike/
  • Treat conductor_mvp/ as a temporary migration-source package to be deleted after subsystem extraction
  • Treat codex_harness/ and spike/ as reference packages during OA07 migration
  • Initial Files in Scope:
  • src/tnh_scholar/agent_orchestration/conductor_mvp/
  • src/tnh_scholar/agent_orchestration/spike/
  • src/tnh_scholar/agent_orchestration/common/
  • tests/agent_orchestration/
  • docs/architecture/agent-orchestration/adr/adr-oa03-agent-runner-architecture.md
  • docs/architecture/agent-orchestration/adr/adr-oa03.1-claude-code-runner.md
  • docs/architecture/agent-orchestration/adr/adr-oa03.3-codex-cli-runner.md
  • docs/architecture/agent-orchestration/adr/adr-oa04-workflow-schema-opcode-semantics.md
  • docs/architecture/agent-orchestration/adr/adr-oa04.1-implementation-notes-mvp-buildout.md

🚨 OA07.1 Bootstrap Worktree Slice

  • Status: IN PROGRESS - PR-7 and PR-8 are merged on main; bootstrap-proof workflow slice is next
  • Priority: HIGHEST (prove real maintained bootstrap usefulness)
  • Context: The maintained OA04.x runtime contracts now include the real OA07.1 worktree runtime boundary and the maintained headless entry path. Bootstrap is no longer blocked on substrate. The next blocker is proving one useful repo-native workflow through the maintained path. Follow ADR-OA07 and ADR-OA07.1.
  • Bootstrap Goal:
  • create a managed git worktree from a committed base ref
  • run RUN_AGENT and RUN_VALIDATION against the worktree root
  • keep canonical run artifacts in the run directory
  • support ROLLBACK(pre_run) to recorded base state
  • establish the headless path needed for later commit/push/PR automation
  • Why This Is Next:
  • the worktree runtime boundary and maintained headless app-layer entry are now implemented on main
  • the system still needs one clean end-to-end proof that it can complete a useful repo task through the maintained path
  • OA05/OA06 depth work should follow a live bootstrap proof, not precede it
  • Recommended PR sizing:
  • Prefer 2 PRs to stay comfortably under diff-size guidance
  • A single PR is possible only if the implementation stays narrow and avoids CLI/app-layer work
  • PR Sequence:
  • PR-7 feat/oa07.1-worktree-workspace-service - Worktree runtime boundary (medium)
    • replace NullWorkspaceService as the forward-path maintained implementation with a real git-backed workspace service
    • add typed workspace context models: repo_root, worktree_path, branch_name, base_ref, base_sha
    • implement managed branch + worktree creation from committed base ref
    • update the workspace protocol so pre-run setup returns structured workspace context and does not rely on the run directory as the workspace handle
    • pass the worktree root as working_directory to runner and validation services for mutable steps
    • implement ROLLBACK(pre_run) by discarding and recreating the managed worktree from recorded base_sha
    • persist workspace context into canonical run artifacts or run metadata extension
    • tests for worktree creation, mutable-step execution in the worktree root, recorded base state, and ROLLBACK(pre_run) semantics
    • keep NullWorkspaceService only for tests or explicit non-operational contexts
  • PR-8 feat/oa07-bootstrap-headless-entry - Maintained headless bootstrap entry (small/medium, merged)
    • load one workflow
    • create worktree context
    • execute workflow end to end
    • write canonical artifacts and final state
    • keep the initial entry local/headless; no GitHub automation required
  • Bootstrap Proof feat/oa07-bootstrap-proof-workflow - Real repo-task bootstrap proof (small/medium, next)
    • add one maintained workflow definition for a narrow useful repository task
    • exercise the current maintained subset: RUN_AGENT, RUN_VALIDATION, STOP, with ROLLBACK(pre_run) available only as fallback
    • prove the run yields a reviewable repo diff plus canonical metadata, manifests, events, and final state
    • keep semantic-control depth and review automation out of scope unless they become true blockers
  • PR-9 feat/oa07-review-automation - Commit/push/PR automation (optional, small/medium)
    • create local commits on the managed branch
    • push the work branch
    • open or update a PR
    • keep protected-branch merge human-only
  • Explicit deferrals for this slice:
  • commit/push/PR automation if it causes PR-7 or PR-8 to exceed preferred diff size
  • strict OA05 compile-validation as a blocker for bootstrap
  • full OA06 planner fixture/vector suite beyond the bootstrap path
  • maintained EVALUATE / GATE support before the first useful bootstrap proof
  • non-script harness backends
  • stacked PR orchestration
  • multi-agent mutable collaboration inside one worktree
  • pre_step rollback and named checkpoints
  • Files likely in scope:
  • src/tnh_scholar/agent_orchestration/workspace/
  • src/tnh_scholar/agent_orchestration/kernel/service.py
  • src/tnh_scholar/agent_orchestration/run_artifacts/
  • tests/agent_orchestration/test_oa07_execution_validation_kernel.py
  • docs/architecture/agent-orchestration/adr/adr-oa07-diff-policy-safety-rails.md
  • docs/architecture/agent-orchestration/adr/adr-oa07.1-worktree-lifecycle-and-rollback.md
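
The bootstrap goal above (a managed worktree created from a committed base ref, with ROLLBACK(pre_run) implemented as discard-and-recreate from the recorded base_sha) can be sketched as plain argv construction. This is a minimal sketch under stated assumptions, not the maintained workspace service: WorkspaceContext reuses the field names listed under PR-7, while worktree_add_argv and rollback_pre_run_argvs are hypothetical helper names. The git subcommands themselves (git worktree add -b / -B, git worktree remove --force) are standard git.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class WorkspaceContext:
    """Typed workspace context; field names follow the PR-7 bullet list."""

    repo_root: Path
    worktree_path: Path
    branch_name: str
    base_ref: str
    base_sha: str


def worktree_add_argv(ctx: WorkspaceContext) -> list[str]:
    """Build the argv that creates a managed branch and worktree at base_sha."""
    return [
        "git", "-C", str(ctx.repo_root),
        "worktree", "add", "-b", ctx.branch_name,
        str(ctx.worktree_path), ctx.base_sha,
    ]


def rollback_pre_run_argvs(ctx: WorkspaceContext) -> list[list[str]]:
    """Build the two argvs for ROLLBACK(pre_run): discard, then recreate.

    Hypothetical helper: -B resets the existing managed branch to base_sha.
    """
    git = ["git", "-C", str(ctx.repo_root)]
    return [
        git + ["worktree", "remove", "--force", str(ctx.worktree_path)],
        git + ["worktree", "add", "-B", ctx.branch_name,
               str(ctx.worktree_path), ctx.base_sha],
    ]
```

Returning argv lists rather than running git directly keeps the sketch testable without a repository and leaves actual process execution to the execution/ subsystem.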

✅ OA04 Contract Family - PR Sequence (Complete)

  • Status: COMPLETE - contract ADRs implemented in maintained code; bootstrap remains blocked on OA07.1 worktree execution
  • Context: OA04.2–OA04.5 are the contract-layer ADRs between the OA07 runtime foundations and the maintained runner/policy/provenance implementations. That contract family is now landed in code and should no longer be treated as pending. See implementation notes in ADR-OA04.1 Addendum 2026-03-27 for the original scaffolding gaps and ADR-OA04.1 Addendum 2026-04-05 for the bootstrap-first reprioritization.
  • Dependency chain:
  • OA04.3 (run dir + manifests + evaluator evidence seam) → OA04.2 (runners normalize into canonical evidence)
  • OA04.4 (policy taxonomy + requested/effective split) → OA04.2 (runner request carries typed requested policy)
  • OA04.5 (harness backend) → validation/ subsystem (extends empty package)
  • OA04.2 (runner adapters) → milestone: first real agent invocations
  • Implementation Notes (default choices for implementers):
  • Apply OS01 pragmatically: add structure where it protects a real boundary or likely evolution seam, not just to mirror the taxonomy mechanically.
  • Prefer moving maintained code toward the ADR contracts when the migration path is clean; do not preserve stub shapes just for short-term compatibility inside maintained packages.
  • Treat run_artifacts/ as the canonical evidence boundary. If a choice arises between storing data in runner-local files versus canonical artifact roles + manifests, choose canonical artifact roles + manifests.
  • Keep evaluator assembly strict: evaluators read metadata.json, events.ndjson, manifest.json, and canonical artifact roles only. Do not add evaluator dependencies on adapter-local raw capture filenames.
  • Keep manifests thin and stable. Put compact cross-step evidence in evidence_summary; put detailed per-step policy data in canonical policy_summary.json.
  • Keep persistence ownership in run_artifacts/. Runner adapters and validation backends should return typed normalized outputs and artifact payloads; they should not own final manifest writing policy.
  • Evolve existing maintained code where it already matches the target shape. In particular, refactor validation/service.py toward the script backend/resolver seam rather than replacing it wholesale.
  • Expand kernel/service.py by extraction, not accretion. If per-step provenance writing starts to crowd the kernel, extract focused collaborators rather than growing one large procedural service.
  • Use explicit mapper/normalizer classes whenever native CLI or harness output is translated into maintained models. Do not hide parsing, normalization, termination mapping, and persistence decisions in one adapter class.
  • Keep policy taxonomy aligned with OS01: init-time settings/config, per-step requested policy, execution-time effective policy, persisted PolicySummary. Avoid "policy blob" models that mix those concerns.
  • Do not add ceremony without benefit: avoid speculative service/factory layers, unnecessary mappers for nearly identical shapes, or package splits that do not improve testability, replaceability, or clarity.
  • Existing thin models in run_artifacts/, runners/, and validation/ are scaffolding, not target architecture. It is acceptable to break those internal shapes in favor of cleaner maintained contracts during this implementation sequence.
  • PR Sequence:
  • PR-1 feat/oa04-contract-adrs - ADR acceptance (docs only)
    • Commit new OA04.2, OA04.3, OA04.4, OA04.5 files; later implementation has since moved those decimal ADRs to implemented
    • Carry in already-modified OA03.1/OA03.3 addendums + OA04 update + index.md
  • PR-2 feat/oa04.3-run-artifact-contract - Run-artifact domain contract + store (medium)
    • Expand run_artifacts/models.py: RunMetadata, RunEventRecord, ArtifactRole enum, StepArtifactEntry, StepManifest
    • Add manifest-level evidence_summary with compact canonical evidence references
    • Add canonical policy_summary artifact role for detailed requested/effective policy records
    • Expand run_artifacts/protocols.py: write_step_manifest, artifact_step_dir, canonical artifact persistence APIs
    • Update run_artifacts/filesystem_store.py to implement both
    • Keep filesystem concerns behind the store; no evaluator-facing filename dependencies
    • Tests for manifest writing, event stream fields, and canonical artifact-role lookup
  • PR-3 feat/oa04.3-kernel-provenance-integration - Kernel provenance integration (medium)
    • Update kernel/runtime services to write enriched run metadata, canonical events, and per-step manifests
    • Persist compact manifest summaries and canonical artifact references only; no adapter-local evidence lookup in evaluator assembly
    • Capture workspace diff/status and policy summary references through canonical artifact roles
    • Tests for manifest/event creation across RUN_AGENT, RUN_VALIDATION, EVALUATE, and GATE
    • Depends on PR-2
  • PR-4 feat/oa04.4-policy-contract - Execution policy package (medium)
    • New agent_orchestration/execution_policy/ package
    • models.py: ExecutionPolicySettings, RequestedExecutionPolicy, EffectiveExecutionPolicy, PolicyViolationClass, PolicyViolation, PolicySummary
    • assembly.py: ExecutionPolicyAssembler for system settings → workflow → step requested policy → runtime override/effective policy derivation
    • protocols.py: ExecutionPolicyAssemblerProtocol
    • Update runners/models.py: retire PromptInteractionPolicy stub; link RunnerTaskRequest to RequestedExecutionPolicy
    • Persist detailed policy_summary.json via canonical policy_summary artifact role; keep only compact summary data in manifests
    • Tests for assembly precedence, requested/effective policy derivation, and hard-fail behavior
    • Can run in parallel with PR-3
  • PR-5 feat/oa04.2-runner-adapters - Runner adapters (largest PR)
    • Expand runners/models.py: AdapterCapabilities (capability declaration per OA04.2 §3a)
    • Add explicit mapper/normalizer classes for native CLI output → maintained runner-domain models
    • Add runners/adapters/claude_cli.py: claude --print --output-format stream-json --permission-mode dontAsk, stream-json parsing, normalization, termination mapping
    • Add runners/adapters/codex_cli.py: codex exec --json --output-last-message, JSONL capture, normalization, termination mapping
    • Adapters return typed normalized artifact payloads; canonical persistence is owned by run_artifacts
    • Evaluators consume manifests and canonical artifact roles only, never runner-local raw capture files
    • Tests for both adapters (subprocess mocking, normalization, mapper behavior, termination paths)
    • Depends on PR-2, PR-3, and PR-4
  • PR-6 feat/oa04.5-harness-backend - Script harness backend (medium)
    • Build out agent_orchestration/validation/: BackendFamily enum, HarnessBackendRequest, HarnessBackendResult, HarnessBackendProtocol
    • backends/script.py: migrate from conductor_mvp/providers/validation_runner.py; normalize to validation_report/validation_stdout/validation_stderr artifact roles
    • Add backend resolver seam, but defer cli and web implementation until a concrete maintained consumer exists
    • Tests for script backend, resolver seam, and artifact role normalization
    • Depends on PR-2; independent of PR-4 and PR-5
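
The "Depends on" notes in the PR sequence above form a small dependency graph. As an illustration only, the sketch below encodes the stated dependencies and derives one valid merge order with the standard-library graphlib module. PR_DEPENDENCIES and merge_order are hypothetical names; PR-1 (docs only) is omitted because the list states no dependency on it.

```python
from graphlib import TopologicalSorter

# Each PR maps to the set of PRs it depends on, as stated in the list above.
# PR-4 has no stated dependency ("can run in parallel with PR-3").
PR_DEPENDENCIES: dict[str, set[str]] = {
    "PR-2": set(),
    "PR-3": {"PR-2"},
    "PR-4": set(),
    "PR-5": {"PR-2", "PR-3", "PR-4"},
    "PR-6": {"PR-2"},
}


def merge_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return one order in which every PR comes after all of its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())
```

Several orders satisfy the constraints (PR-4 and PR-6 can float), so the result should be read as one valid sequence, not the only one.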

🔮 JVB VS Code Parallel Viewer (ADR-JVB02)

  • Status: NOT STARTED (Design Phase)
  • Priority: HIGH (flagship feature, builds on VS Code integration foundation)
  • Context: The JVB (Journal of Vietnamese Buddhism) parallel viewer enables scholars to view scanned historical journal pages alongside OCR text and English translations. v1 was a bespoke browser-based prototype; v2 will integrate into the tnh-scholar VS Code extension.
  • Project Paused: This work was on hold while VS Code integration and tnh-gen were developed. Now that the walking skeleton is complete, we can resume with fresh design.

Related Documentation:

  • v1 As-Built: ADR-JVB01 - Browser-based prototype architecture
  • v2 Strategy (Draft): JVB Viewer V2 Strategy - Pre-ADR strategy note (good foundations, needs formalization)
  • VS Code Platform Strategy: VS Code as UI Platform - Overall UI-UX direction
  • VS Code Integration: ADR-VSC01 - CLI-first extension strategy (implemented)

Proposed ADR Structure:

docs/architecture/jvb-viewer/adr/
├── adr-jvb01_as-built_jvb_viewer_v1.md              # ✅ Exists
├── adr-jvb02-vscode-parallel-viewer-strategy.md     # 🆕 Main strategy ADR
├── adr-jvb02.1-ui-ux-design.md                      # 🆕 Mockups, pane layout, workflows
├── adr-jvb02.2-data-model-api-contract.md           # 🆕 JSON schema, extension↔backend API
└── adr-jvb02.3-implementation-guide.md              # 🆕 Phase-by-phase implementation

Key Design Decisions Needed:

  1. VS Code Pane Architecture: Which panes for scan overlay, text views, reconciliation controls, navigation?
  2. Webview vs Custom Editor: Custom editor for .jvb.json files or webview panel approach?
  3. Backend Integration: Python service via CLI (tnh-gen patterns) or dedicated HTTP service?
  4. Data Model: Refine per-page JSON schema from v2 strategy, define API contract
  5. Dual OCR Reconciliation UI: How users choose between Google OCR vs AI vision sources

Deliverables:

  • ADR-JVB02: Main strategy ADR (formalize v2 strategy, VS Code integration focus)
  • ADR-JVB02.1: UI-UX design with mockups/screen visualizations
  • ADR-JVB02.2: Data model and API contract specification
  • ADR-JVB02.3: Implementation guide with milestones
  • M0 Prototype: Static HTML mockup in VS Code webview (validate approach)

Implementation Milestones (from v2 strategy, to be refined):

  • M0: Static prototype - HTML showing page image, word bboxes, selectable sentences
  • M1: VS Code extension - load/save per-page JSON, overlay modes, section breadcrumb
  • M2: Dual-source UI - GOCR vs AI diff chooser, batch adoption, "reviewed" status
  • M3: Structure cues - columns, heading levels, emphasis flags captured and rendered
  • M4: Beta - section-level navigation, export HTML, light theming

🔮 Add --prompt-dir Global Flag to tnh-gen

  • Status: NOT STARTED
  • Priority: HIGH (improves tnh-gen UX for one-off operations and testing)
  • Estimate: 1-2 hours
  • Context: Users need a convenient way to override the prompt catalog directory for one-off CLI calls without setting environment variables or creating temp config files
  • ADR: ADR-TG01 Addendum 2026-01-02
  • Why Important: Enables clean one-off operations (tnh-gen --prompt-dir ./test-prompts list) for testing, CI/CD, and development workflows
  • Current Workarounds:
  • Environment variable: TNH_PROMPT_DIR=/path tnh-gen list (awkward)
  • Temp config file: tnh-gen --config /tmp/config.yaml list (verbose)
  • Deliverables:
  • Add --prompt-dir flag to cli_callback() in src/tnh_scholar/cli_tools/tnh_gen/tnh_gen.py:26
  • Update config_loader.py to handle prompt directory override at CLI precedence level
  • Update ConfigData type to accept prompt_catalog_dir override
  • Add unit tests for flag precedence (CLI flag > workspace > user > env)
  • Update help text and CLI reference documentation
  • Update docs/cli-reference/tnh-gen.md global flags section
  • Files to Modify:
  • src/tnh_scholar/cli_tools/tnh_gen/tnh_gen.py (add flag)
  • src/tnh_scholar/cli_tools/tnh_gen/config_loader.py (precedence handling)
  • src/tnh_scholar/cli_tools/tnh_gen/types.py (type definitions)
  • tests/cli_tools/test_tnh_gen.py (unit tests)
  • docs/cli-reference/tnh-gen.md (documentation)
  • Testing: Verify --prompt-dir flag overrides all other config sources (workspace, user, env)
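
The precedence described above (CLI flag > workspace > user > env) amounts to a first-match lookup. The sketch below illustrates that rule only; resolve_prompt_dir and its parameter names are hypothetical and do not reflect the actual config_loader.py API.

```python
from pathlib import Path
from typing import Optional


def resolve_prompt_dir(
    cli_flag: Optional[str] = None,
    workspace: Optional[str] = None,
    user: Optional[str] = None,
    env: Optional[str] = None,
) -> Optional[Path]:
    """Return the highest-precedence prompt directory that was provided.

    Assumed order, taken from the deliverables above:
    CLI flag > workspace config > user config > environment variable.
    """
    for candidate in (cli_flag, workspace, user, env):
        if candidate is not None:
            return Path(candidate)
    return None
```

Under this rule, tnh-gen --prompt-dir ./test-prompts list would resolve to ./test-prompts even when TNH_PROMPT_DIR is also set, which is the behavior the testing bullet asks to verify.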

🔮 Full-Coverage yt-dlp Test Suite + Monthly Ops Trigger

  • Status: IN PROGRESS
  • Priority: HIGH (external dependency instability)
  • Goal: Add full coverage for all yt-dlp usage modules (transcript, audio, metadata, video download), then run a scheduled monthly ops test to surface breakage early.
  • Scope (Code):
  • src/tnh_scholar/video_processing/video_processing.py
  • src/tnh_scholar/cli_tools/ytt_fetch/ytt_fetch.py
  • src/tnh_scholar/cli_tools/audio_transcribe/audio_transcribe.py
  • src/tnh_scholar/cli_tools/audio_transcribe/version_check.py
  • Testing Strategy:
  • Add integration tests that exercise live yt-dlp behavior (guarded, opt-in)
  • Add unit tests for runtime env inspection + yt-dlp option injection
  • Add offline unit tests with recorded fixtures for metadata + transcript parsing
  • Add failure-mode tests (missing captions, private video, geo-blocked)
  • Monthly Ops Trigger:
  • Add cron-ready ops check script + validation URL list
  • Document monthly cron usage and log locations
  • Add failure notification workflow (issue creation or alerting)
  • Acceptance Criteria:
  • Coverage for all yt-dlp entry points + error paths
  • Monthly ops check runs without manual intervention (cron)
  • Clear failure report includes test URL, date, yt-dlp version
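
The last acceptance criterion (a failure report that includes the test URL, date, and yt-dlp version) could look roughly like the sketch below. OpsFailure and format_failure_report are hypothetical names, and the version string is passed in by the caller instead of being read from yt-dlp, so the example stays self-contained.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class OpsFailure:
    """One failed ops check; the fields mirror the acceptance criteria above."""

    test_url: str
    run_date: date
    yt_dlp_version: str
    error: str


def format_failure_report(failure: OpsFailure) -> str:
    """Render a plain-text report suitable for a log file or an issue body."""
    return "\n".join(
        [
            "[yt-dlp ops check] FAILED",
            f"url: {failure.test_url}",
            f"date: {failure.run_date.isoformat()}",
            f"yt-dlp version: {failure.yt_dlp_version}",
            f"error: {failure.error}",
        ]
    )
```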

🔮 Patch ytt-fetch Robustness

  • Status: IN PROGRESS
  • Priority: HIGH (frequent breakage path)
  • Goal: Make ytt-fetch resilient to upstream changes and failures.
  • Test URL: https://youtu.be/iqNzfK4_meQ
  • Deliverables:
  • Add runtime preflight + yt-dlp runtime option injection
  • Verify transcript fetch on test URL (manual + test)
  • Add retries / improved error reporting
  • Ensure metadata embed + output path handling remain stable
  • Update docs and CLI reference if flags or behaviors change
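
For the "retries / improved error reporting" deliverable, one common shape is a bounded retry loop with exponential backoff. The sketch below is illustrative only: fetch_with_retries is a hypothetical helper, the fetch callable stands in for whatever ytt-fetch actually calls, and the sleep function is injectable so tests do not wait.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_retries(
    fetch: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() up to `attempts` times, doubling the delay between tries."""
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception as exc:  # a real tool would catch narrower error types
            last_error = exc
            if attempt < attempts:
                sleep(base_delay * 2 ** (attempt - 1))
    # Surface the attempt count and the final underlying error to the caller.
    raise RuntimeError(
        f"fetch failed after {attempts} attempts: {last_error}"
    ) from last_error
```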

🚧 GenAIService Core Components - Final Polish

  • Status: PRELIMINARY IMPLEMENTATION COMPLETE ✅ - Needs Polish & Registry Integration
  • Priority: MEDIUM (minor cleanup, not blocking)
  • What: Core GenAI service components (params_policy, model_router, safety_gate, completion_mapper) are implemented and working, need minor polish
  • Components Implemented:
  • params_policy.py - Policy precedence implemented ✅
    • ✅ Policy precedence: call hint → prompt metadata → defaults
    • ✅ Settings cached via @lru_cache (excellent optimization)
    • ✅ Strong typing with ResolvedParams Pydantic model
    • ✅ Routing diagnostics in routing_reason field
    • Score: 95/100 - Excellent implementation
  • model_router.py - Capability-based routing implemented ✅
    • ✅ Declarative routing table with _MODEL_CAPABILITIES
    • ✅ Structured output fallback (JSON mode capability switching)
    • ✅ Intent-aware architecture foundation
    • ⚠️ Intent routing currently placeholder (line 98-101)
    • Score: 92/100 - Strong implementation
  • safety_gate.py - Three-layer safety checks implemented ✅
    • ✅ Character limit, context window, budget estimation
    • ✅ Typed exceptions (SafetyBlocked)
    • ✅ Structured SafetyReport with actionable diagnostics
    • ✅ Content type handling (string/list with warnings)
    • ✅ Prompt metadata integration (safety_level)
    • ⚠️ Price constant hardcoded (line 30: _PRICE_PER_1K_TOKENS = 0.005)
    • ⚠️ Post-check currently stubbed
    • Score: 94/100 - Excellent implementation
  • completion_mapper.py - Bi-directional mapping implemented ✅
    • ✅ Clean transport → domain transformation
    • ✅ Error details surfaced in policy_applied
    • ✅ Status handling (OK/FAILED/INCOMPLETE)
    • ✅ Pure mapper functions (no side effects)
    • ⚠️ policy_applied uses Dict[str, object] (should be more specific)
    • Score: 91/100 - Strong implementation
  • High Priority (Before Merging):

  • Add Google-style docstrings to public functions (see style-guide.md)
    • apply_policy(), select_provider_and_model(), pre_check(), post_check(), provider_to_completion()
  • Move _PRICE_PER_1K_TOKENS constant to Settings or registry (blocks ADR-A14)
    • Moved to Settings.price_per_1k_tokens; safety gate now consumes setting.
  • Type tightening in completion_mapper

    • Added PolicyApplied alias (dict[str, str | int | float]).
  • Medium Priority (V1 Completion):

  • Promote policy_applied typing to a shared domain type (CompletionEnvelope) to avoid loose dict usage across the service.

  • Capability registry extraction (→ ADR-A14)

    • Create runtime_assets/registries/providers/openai.jsonc
    • Implement RegistryLoader with JSONC support
    • Refactor model_router.py to use registry
    • Refactor safety_gate.py to use registry pricing
    • See: ADR-A14: File-Based Registry System
  • Intent routing implementation
  • Post-check safety implementation

  • Low Priority (Future Work):

  • Warning enum system
    • Create typed warning codes instead of strings
    • Affects: safety_gate, completion_mapper, model_router
  • Enhanced diagnostics
    • More granular routing reasons
    • Detailed safety check diagnostics
  • Message.content Type Architecture Investigation (design quality, non-blocking)
    • Location: gen_ai_service/models/domain.py:92-96
    • Issue: Sourcery identifies Union[str, List[ChatCompletionContentPartParam]] as source of complexity
    • Context: Current design intentionally supports OpenAI's flexible content API (plain text OR structured parts with images/etc)
    • Investigation Areas:
    • Document current usage patterns across codebase
    • Assess downstream complexity: where are type checks needed?
    • Evaluate normalization strategies (always list? separate fields? utility methods?)
    • Consider provider compatibility (Anthropic, etc)
    • Draft ADR or addendum to existing GenAI ADRs if design change warranted
    • Impact: Affects message representation throughout GenAIService
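
The "Warning enum system" item above (typed warning codes instead of strings) could be prototyped along these lines. Everything in this sketch is hypothetical: the WarningCode members and the ServiceWarning shape are illustrative placeholders, not names from the codebase.

```python
from dataclasses import dataclass
from enum import Enum


class WarningCode(str, Enum):
    """Hypothetical typed warning codes replacing free-form strings."""

    CONTENT_TYPE_LIST = "content-type-list"
    POST_CHECK_STUBBED = "post-check-stubbed"
    INTENT_ROUTING_PLACEHOLDER = "intent-routing-placeholder"


@dataclass(frozen=True)
class ServiceWarning:
    """A warning tagged with its code and the component that raised it."""

    code: WarningCode
    source: str  # e.g. "safety_gate", "completion_mapper", "model_router"
    detail: str = ""
```

Subclassing str keeps the codes JSON-serializable as plain strings, while WarningCode("...") rejects unknown values at the boundary.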

⏸️ GenAIService Thread Safety and Rate Limiting (ADR-A15)

  • Status: DEFERRED - Not needed for VS Code integration (process isolation)
  • Priority: MEDIUM (revisit when building Python batch pipelines)
  • Issue: #22
  • ADR: ADR-A15: Thread Safety and Rate Limiting
  • Why Deferred: VS Code extension uses process isolation (each tnh-gen call = separate GenAIService instance). Thread safety only matters for Python-native batch pipelines.
  • When to Revisit: When implementing concurrent corpus processing loops or batch translation pipelines
  • Estimate: 3-6 hours (Phase 1: 1-2 hours, Phase 2: 2-4 hours)
  • Quick Summary: Add thread-safe retry state, locked cache, and optional rate limiting for high-throughput scenarios

Priority 2: Production Hardening (Post-Bootstrap)

Goal: Harden TNH Scholar for production use after VS Code integration enables AI-assisted development. Focuses on reliability, test coverage, and type safety.

🚧 OpenAI SDK 2.15.0 Validation (High Priority)

  • Status: NOT STARTED
  • Why: SDK bump impacts OpenAI adapter. (Codex harness suspended; see ADR-OA03.2 addendum)
  • Tasks:
  • Revalidate OpenAI adapter request/response mappings against 2.15.0
  • Update compatibility notes/docs if schema drift is found

🚧 Audio-Transcribe Service-Layer Refactor (P2)

  • Status: NOT STARTED
  • Goal: Align audio-transcribe with object-service pattern and ytt-fetch robustness.
  • Tasks:
  • Introduce typed service orchestrator + protocols (CLI becomes thin wrapper)
  • Extract audio source resolution into a typed resolver (yt_url/CSV/local file)
  • Replace dict options with Pydantic models (transcription + diarization params)
  • Move logging bootstrap out of module import path so audio-transcribe modules are import-safe in tests and sandboxed environments
  • Add runtime preflight (yt-dlp inspector + ffmpeg availability); keep version checks ops-only
  • Migrate CLI to Typer with minimal surface (smoke tests only)
  • Add service-layer tests for all audio-transcribe use cases

⏸️ Agent Orchestration - Codex Runner (ADR-OA03.2)

  • Status: TABLED (2026-01-25)
  • ADR: ADR-OA03.2
  • Why Tabled:
  • Scope: Spike revealed that a proper Codex harness requires implementing full client-side agent orchestration (the VS Code extension uses a proprietary app server, not raw API calls)
  • Cost-benefit: Current human-in-the-loop workflow with Claude Code + VS Code Codex extension is effective and cost-efficient for project needs
  • No compelling need: Investment not justified when manual workflow works well
  • Findings: Codex Harness Spike Findings
  • Preserved Artifacts: src/tnh_scholar/agent_orchestration/codex_harness/, src/tnh_scholar/cli_tools/tnh_codex_harness/
  • Conditions for Resumption: Further insight or clear business need that justifies full agent orchestration investment

🚧 Expand Test Coverage

  • Status: NOT STARTED
  • Current Coverage: ~5% (4 test modules)
  • Target: 50%+ for gen_ai_service
  • Tasks:
  • GenAI service flows: prompt rendering, policy resolution, provider adapters
  • CLI integration tests (option parsing, environment validation)
  • Configuration loading edge cases
  • Error handling scenarios
  • Pattern catalog validation
  • Full CLI test suite with 100% coverage (HIGH PRIORITY - include all CLI tools, not just tnh-gen)
  • tnh-gen CLI comprehensive coverage (HIGH PRIORITY - Missing basic command tests):
    • Add tests for all tnh-gen config commands (show, get, set, list)
    • Add tests for all tnh-gen list commands (simple, query)
    • Add tests for tnh-gen gen command with various options
    • Test Path serialization in config commands (regression test for model_dump)
    • Test config precedence: defaults → user → workspace → CLI flags
    • Test error handling for all commands
    • Integration tests for full workflows
    • Context: Basic command tnh-gen config show failed with Path serialization bug that should have been caught by tests

🚧 Consolidate Environment Loading

🚧 Configuration Tech Debt - Migrate to ADR-CF01/CF02 Three-Layer Model

Migration Phases:

  1. Phase 1: Extend TNHContext for Prompts ✅ COMPLETE
    • Add PromptPathBuilder analogous to RegistryPathBuilder - src/tnh_scholar/configuration/context.py:165-191
    • Define three-layer prompt discovery: workspace → user → built-in
    • Create runtime_assets/prompts/ with minimal built-in set (3 prompts + _catalog.yaml)
    • Unit tests for prompt path resolution - tests/configuration/test_prompt_discovery.py
  2. Phase 2: Migrate GenAISettings ✅ COMPLETE
    • Update GenAISettings.prompt_dir to use lazy TNHContext resolution - settings.py:89-102
    • Legacy TNH_DEFAULT_PROMPT_DIR constant removed from __init__.py
    • tnh-gen config_loader works with new resolution
  3. Phase 3: Eliminate Module-Level Constants ✅ COMPLETE
    • TNH_CONFIG_DIR, TNH_LOG_DIR, TNH_DEFAULT_PROMPT_DIR removed from __init__.py
    • Only structural constants remain (TNH_ROOT_SRC_DIR, TNH_PROJECT_ROOT_DIR, TNH_CLI_TOOLS_DIR)
    • No FileNotFoundError raises at import time for config paths
  4. Phase 4: Unify Subsystem Settings (Medium Priority) - NOT STARTED
    • Audit all BaseSettings classes across subsystems
    • Deprecate PromptSystemSettings.tnh_prompt_dir in favor of unified approach
    • Standardize env var prefixes (e.g., TNH_GENAI_*, TNH_AUDIO_*)
  5. Phase 5: Propagate tnh-gen Config Pattern (Low Priority) - NOT STARTED
    • Create shared CLIConfigLoader base for all CLI tools
    • Add config show/get/set subcommands to major CLI tools
    • Standardize workspace config file format

Success Criteria:

  • [x] No module-level config Path constants in __init__.py
  • [x] Prompt path discovery flows through TNHContext
  • [x] Prompt directories follow three-layer precedence (workspace → user → built-in)
  • [ ] At least tnh-gen and audio-transcribe share config loader pattern
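The three-layer prompt precedence (workspace, then user, then built-in) amounts to a first-match search over an ordered list of directories. The sketch below is illustrative only: `resolve_prompt` is a hypothetical helper, and the `.md` extension is an assumption. The actual lookup lives behind TNHContext and PromptPathBuilder.

```python
from pathlib import Path
from typing import Optional, Sequence


def resolve_prompt(name: str, search_dirs: Sequence[Path]) -> Optional[Path]:
    """Return the first matching prompt file, or None if no layer has it.

    search_dirs must be ordered from highest to lowest precedence,
    e.g. [workspace_dir, user_dir, builtin_dir].
    """
    for directory in search_dirs:
        candidate = directory / f"{name}.md"  # file extension is an assumption
        if candidate.is_file():
            return candidate
    return None
```

Returning `None` rather than raising leaves the caller free to decide whether a missing prompt is an error, which fits the goal of avoiding failures at import time.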

🚧 Clean Up CLI Tool Versions

  • Status: PARTIAL (old versions removed, utilities pending)
  • Location: cli_tools/audio_transcribe/
  • Tasks:
  • Remove legacy audio_transcribe0.py
  • Remove audio_transcribe1.py
  • Remove audio_transcribe2.py
  • Keep only current version
  • Create shared utilities (argument parsing, environment validation, logging)

✅ Documentation Reorganization (ADR-DD01 & ADR-DD02) - See Archive

Phase 1 COMPLETE - Remaining Phase 2 tasks:

  • Doc metadata validation script (check_doc_metadata.py) - validate front matter
  • Docstring coverage (interrogate) - threshold on src/tnh_scholar
  • Archive index + legacy ADR migration to docs/archive/**
  • Backlog: populate docs/docs-ops/roadmap.md with missing topics
  • User guides for new features, architecture component diagrams

🚧 Type System Improvements

  • Status: PARTIAL
  • Current: 58 errors across 16 files
  • High Priority: Fix audio processing boundary types, core text processing types, function redefinitions
  • Medium Priority: Add missing type annotations, fix Pattern class type issues
  • Low Priority: Clean up Any return types, standardize type usage

🚧 Prompt Catalog Safety

  • Status: NOT STARTED
  • Priority: HIGH (critical infrastructure)
  • Problem: Adapter doesn't handle missing keys or invalid front-matter gracefully
  • Tasks:
  • Add manifest validation
  • Implement caching
  • Better error messages (unknown prompt, hash mismatch)
  • Front-matter validation
  • Document prompt schema
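One possible shape for the "better error messages" task is a small exception hierarchy that separates an unknown key from a content hash mismatch. Everything here is a hypothetical sketch: the exception names, `load_prompt`, and the use of SHA-256 over the prompt body are assumptions, not the existing adapter's API.

```python
import hashlib
from typing import Dict


class PromptCatalogError(Exception):
    """Base class for catalog lookup failures (hypothetical name)."""


class UnknownPromptError(PromptCatalogError):
    """Raised when the requested key is not in the manifest."""


class PromptHashMismatchError(PromptCatalogError):
    """Raised when a prompt body does not match its manifest hash."""


def load_prompt(key: str, manifest: Dict[str, str], bodies: Dict[str, str]) -> str:
    """Return the prompt body for key after validating it against the manifest.

    manifest maps prompt key -> expected SHA-256 hex digest (assumed scheme).
    bodies maps prompt key -> prompt text.
    """
    if key not in manifest or key not in bodies:
        known = ", ".join(sorted(manifest)) or "<none>"
        raise UnknownPromptError(f"Unknown prompt '{key}'. Known prompts: {known}")
    body = bodies[key]
    actual = hashlib.sha256(body.encode("utf-8")).hexdigest()
    if actual != manifest[key]:
        raise PromptHashMismatchError(
            f"Prompt '{key}' hash mismatch: expected {manifest[key][:12]}, got {actual[:12]}"
        )
    return body
```

Listing the known keys in the error message gives CLI users an immediate hint when they mistype a prompt name.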

🚧 Knowledge Base Implementation

  • Status: DESIGN COMPLETE
  • ADR: ADR-K01
  • Tasks:
  • Implement Supabase integration
  • Vector search functionality
  • Query capabilities
  • Semantic similarity search

🚧 Configuration & Data Layout

  • Status: NOT STARTED
  • Priority: HIGH (blocks pip install)
  • Problem: src/tnh_scholar/__init__.py raises FileNotFoundError when repo layout missing
  • Tasks:
  • Package pattern assets as resources
  • Make patterns directory optional
  • Move directory checks to CLI entry points only
  • Ensure installed wheels work without patterns/ directory
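Moving directory checks out of import time could look like the sketch below: the package imports cleanly, and only a CLI entry point that actually needs a directory calls the check. `require_dir` and `optional_dir` are hypothetical helpers, and the `ConfigurationError` class here is a local stand-in for the project's own exception.

```python
from pathlib import Path
from typing import Optional


class ConfigurationError(Exception):
    """Local stand-in for the project's ConfigurationError."""


def optional_dir(path: Path) -> Optional[Path]:
    """Return path if it is an existing directory, else None (never raises)."""
    return path if path.is_dir() else None


def require_dir(path: Path, purpose: str) -> Path:
    """Validate a directory on demand, e.g. from a CLI entry point.

    Raising here, instead of in __init__.py, keeps `import tnh_scholar`
    safe for installed wheels that lack the repo layout.
    """
    if not path.is_dir():
        raise ConfigurationError(f"{purpose} directory not found: {path}")
    return path
```

Library code would call `optional_dir` and degrade gracefully, while a command that cannot proceed without the directory would call `require_dir` and report the error to the user.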

🚧 Logging System Scope

  • Location: src/tnh_scholar/logging_config.py
  • Problem: Modules call setup_logging individually
  • Tasks:
  • Define single application bootstrap
  • Document logger acquisition pattern (get_logger only)
  • Create shared CLI bootstrap helper
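A single-bootstrap pattern can be sketched with the standard `logging` module: the application entry point calls `setup_logging` once, and library modules only ever call `get_logger`. The function names come from the task list above, but the signatures, the `tnh_scholar` root logger name, and the format string are assumptions rather than the contents of logging_config.py.

```python
import logging

_CONFIGURED = False  # module-level guard so repeated calls are harmless


def setup_logging(level: int = logging.INFO) -> None:
    """Configure the package root logger once; later calls are no-ops."""
    global _CONFIGURED
    if _CONFIGURED:
        return
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    root = logging.getLogger("tnh_scholar")  # assumed package logger name
    root.addHandler(handler)
    root.setLevel(level)
    _CONFIGURED = True


def get_logger(name: str) -> logging.Logger:
    """Return a logger without touching handlers; safe to call at import time."""
    return logging.getLogger(name)
```

With this split, a module that still calls `setup_logging` by habit does no harm, which allows the migration to proceed one module at a time.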

🚧 Comprehensive CLI Reference Documentation

  • Status: IN PROGRESS (tnh-gen complete ✅, other CLIs pending)
  • Tasks:
  • Update user-guide examples to use tnh-gen
  • Document other CLI tools (audio-transcribe, ytt-fetch, nfmt, etc.)
  • Consider automation for CLI reference generation

🔮 Shared CLI UI Module (tnh_cli_ui)

  • Status: NOT STARTED (Research/Exploration)
  • Priority: MEDIUM (UX consistency across CLI tools)
  • ADR: ADR-ST01.1: tnh-setup UI Design
  • Context: The tnh-setup UI redesign (Rich library) could be extracted into a shared module for consistent styling across all tnh-scholar CLI tools.
  • Research Questions:
  • Survey CLI tools for shared UI patterns (headers, status indicators, progress, tables)
  • Evaluate Rich vs alternatives (click-extra, questionary, etc.)
  • Design minimal API surface for common operations
  • Consider Typer + Rich integration patterns
  • Potential Scope:
  • Styled section headers with step progress
  • Standardized status indicators (✓/⚠/✗/○/•) with color vocabulary
  • Spinner wrappers for async operations
  • Summary table generators
  • Banner/header utilities
  • Affected Tools: tnh-setup, tnh-gen, ytt-fetch, audio-transcribe, nfmt, token-count, tnh-tree

🚧 Document Success Cases

  • Status: NOT STARTED
  • Goal: Document TNH Scholar's successful real-world applications
  • Cases: Deer Park Cooking Course (SRTs), 1950s JVB Translation (OCR), Dharma Talk Transcriptions, Sr. Dang Nhiem's talks
  • Tasks:
  • Create docs/case-studies/ directory structure
  • Document each case with context, tools, challenges, outcomes

🚧 Notebook System Overhaul

  • Status: NOT STARTED
  • Priority: HIGH
  • Goal: Transform notebooks from exploratory/testing to production-quality examples
  • Tasks:
  • Audit & categorize all notebooks
  • Polish core example notebooks
  • Convert testing notebooks to pytest
  • Archive legacy notebooks with context notes

Priority 3: Future Work & Advanced Features

Goal: Long-term sustainability, advanced features, and nice-to-have improvements. Address after bootstrap loop is working.

🚧 Refactor Monolithic Modules

🚧 Complete Provider Abstraction

  • Status: NOT STARTED
  • Tasks:
  • Implement Anthropic adapter
  • Add provider-specific error handling
  • Test fallback/retry across providers
  • Provider capability discovery
  • Multi-provider cost optimization
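The "fallback/retry across providers" task can be exercised against a small harness like the one below, which tries each provider in order and retries before moving on. This is a sketch under stated assumptions: `ProviderError`, `complete_with_fallback`, and the callable-based provider shape are hypothetical and do not reflect the gen_ai_service adapter interface.

```python
from typing import Callable, List, Sequence, Tuple


class ProviderError(Exception):
    """Hypothetical error raised by a provider call that may be retried."""


def complete_with_fallback(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
    retries_per_provider: int = 2,
) -> Tuple[str, str]:
    """Try providers in order; return (provider_name, completion_text).

    Each provider gets retries_per_provider attempts before the next
    one is tried. Raises ProviderError if every attempt fails.
    """
    errors: List[str] = []
    for name, call in providers:
        for attempt in range(1, retries_per_provider + 1):
            try:
                return name, call(prompt)
            except ProviderError as exc:
                errors.append(f"{name} attempt {attempt}: {exc}")
    raise ProviderError("All providers failed: " + "; ".join(errors))
```

In tests, the providers can be plain functions that raise or return fixed strings, so the fallback order is verifiable without network access.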

🚧 Developer Experience Improvements

  • Status: PARTIAL (hooks and Makefile exist, automation pending)
  • Tasks:
  • Add pre-commit hooks (Ruff, notebook prep)
  • Create Makefile for common tasks (lint, test, docs, format, setup)
  • Add MyPy to pre-commit hooks
  • Add contribution templates (issue/PR templates)
  • CONTRIBUTING.md exists and is documented
  • Release automation
  • Changelog automation

🚧 Historical ADR Status Audit

  • Status: NOT STARTED
  • Context: 25 ADRs marked with status: current from pre-markdown-standards migration
  • Tasks:
  • Review each ADR to determine actual status (implemented/superseded/rejected)
  • Update status field in YAML frontmatter
  • Cross-reference with newer ADRs for superseded decisions

🚧 Package API Definition

  • Status: Deferred during prototyping
  • Tasks:
  • Review and document all intended public exports
  • Implement __all__ in key __init__.py files
  • Verify exports match documentation
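The "verify exports match documentation" step can be automated by comparing a module's `__all__` against the documented names. The helper below is a hypothetical sketch: `check_public_api` and its report keys are illustrative, and the documented-name list would come from wherever the project records its public API.

```python
from types import ModuleType
from typing import Dict, Iterable, List


def check_public_api(module: ModuleType, documented: Iterable[str]) -> Dict[str, List[str]]:
    """Compare a module's __all__ with the names the docs claim are public."""
    declared = set(getattr(module, "__all__", ()))
    documented_set = set(documented)
    return {
        # Names listed in __all__ that the module does not actually define.
        "missing_attributes": sorted(n for n in declared if not hasattr(module, n)),
        # Names exported but absent from the documentation.
        "undocumented": sorted(declared - documented_set),
        # Names documented but not exported via __all__.
        "documented_but_not_exported": sorted(documented_set - declared),
    }
```

A test can assert that all three lists are empty for each key `__init__.py`, which turns API drift into a CI failure.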

🚧 Repo Hygiene

  • Problem: Generated artifacts in repo (build/, dist/, site/, *.txt)
  • Tasks:
  • Add to .gitignore
  • Document regeneration process
  • Rely on release pipelines for builds

🚧 Notebook & Research Management

  • Location: notebooks/, docs/research/
  • Problem: Valuable but not curated exploratory work
  • Tasks:
  • Adopt naming/linting convention
  • Publish vetted analyses to docs/research via nbconvert
  • Archive obsolete notebooks

Recently Completed Tasks (Archive)

tnh-gen CLI Implementation ✅

  • Completed: 2025-12-27
  • ADR: ADR-TG01, ADR-TG01.1
  • What: Protocol-driven CLI replacing tnh-fab, dual modes (human-friendly default, --api for machine consumption)
  • Documentation: tnh-gen CLI Reference (661 lines)

File-Based Registry System (ADR-A14) ✅

  • Completed: 2026-01-01 (PR #24)
  • ADR: ADR-A14, ADR-A14.1
  • What: JSONC-based registry with multi-tier pricing, TNHContext path resolution, staleness detection
  • Key Deliverables: openai.jsonc registry, RegistryLoader, Pydantic schemas, JSON Schema for VS Code, refactored model_router.py and safety_gate.py, 264 tests passing

VS Code Extension Walking Skeleton ✅

  • Completed: 2026-01-07
  • ADR: ADR-VSC01, ADR-VSC02
  • What: TypeScript extension enabling "Run Prompt on Active File" workflow
  • Capabilities: QuickPick prompt selector, dynamic variable input, tnh-gen run subprocess execution, split-pane output, unit/integration tests
  • Validation: Proves bootstrapping concept - extension ready to accelerate TNH Scholar development

Pattern→Prompt Migration ✅

  • Completed: 2026-01-19
  • ADR: ADR-PT04
  • What: Pattern→Prompt terminology migration and directory restructuring
  • Key Changes: patterns/ → prompts/ (standalone tnh-prompts repo), TNH_PATTERN_DIR → TNH_PROMPT_DIR, removed legacy tnh-fab CLI
  • Breaking: TNH_PATTERN_DIR env var removed, tnh-fab CLI removed

Provenance Format Refactor ✅

  • Completed: 2026-01-19
  • ADR: ADR-TG01 Addendum 2025-12-28
  • What: Switched tnh-gen from HTML comments to YAML frontmatter for provenance metadata
  • Files Modified: provenance.py, test_tnh_gen.py, tnh-gen.md

OpenAI Client Unification ✅

  • Completed: 2025-12-10
  • ADR: ADR-A13
  • What: Migrated from legacy openai_interface/ to modern gen_ai_service/providers/ architecture (6 phases)

Core Stubs Implementation ✅

  • Completed: 2025-12-10
  • What: Implemented params_policy, model_router, safety_gate, completion_mapper with strong typing
  • Grade: A- (92/100) - Production ready with minor polish

Documentation Reorganization Phase 1 ✅

  • Completed: 2025-12-05 (PR #14)
  • ADR: ADR-DD01, ADR-DD02
  • What: Converted 964 links to absolute paths, enabled MkDocs strict mode with strict link validation, filesystem-driven nav, lychee link checking and integrated link verification

Packaging & CI Infrastructure ✅

  • Completed: 2025-11-20
  • What: pytest in CI, runtime dependencies declared, pre-commit hooks, Makefile targets

Remove Library sys.exit() Calls ✅

  • Completed: 2025-11-15
  • What: Library code raises ConfigurationError instead of exiting process

NumberedText Section Boundary Validation ✅

  • Completed: 2025-12-12
  • ADR: ADR-AT03.2 (status: accepted → should be implemented)
  • What: Implemented validate_section_boundaries() and get_coverage_report() methods for robust section management
  • Commits: cf99375 (docs), 798a552 (refactor unused methods)

TextObject Robustness Improvements ✅

  • Completed: 2025-12-14
  • ADR: ADR-AT03.3 (status: accepted → should be implemented)
  • What: Implemented merge_metadata() with MergeStrategy enum, validate_sections() with fail-fast, converted to Pydantic v2, added structured exception hierarchy
  • Commits: 096e528 (implementation), 03654fe (docstrings)