TNH Scholar TODO List
Roadmap tracking the highest-priority TNH Scholar tasks and release blockers.
Last Updated: 2026-04-10 (PR-8 merged; bootstrap-proof slice next)
Version: 0.3.1 (Alpha)
Status: Active Development - Bootstrap path complete, production hardening phase
Style Note: Tasks use descriptive headers (not numbered items) to avoid renumbering churn when reorganizing. Use `####` (h4) for task headers within priority sections.
Progress Summary
Bootstrap Path Status: ✅ COMPLETE - VS Code integration working, AI-assisted development enabled.
Next Steps:
- 🔮 JVB VS Code Parallel Viewer (P1, design phase) - ADR-JVB02 strategy + UI-UX design
- 🔮 Finish yt-dlp reliability suite + monthly ops trigger (P1, reliability)
- 🔮 Finish ytt-fetch robustness hardening (P1, reliability)
- 🔮 Add `--prompt-dir` Global Flag to tnh-gen (P1, minor)
- 🚧 GenAIService Final Polish - promote `policy_applied` typing (P1, minor)
- 🚧 Prompt Catalog Safety - error handling, validation (P2, critical infrastructure)
- 🚧 Knowledge Base Implementation (P2, design complete)
- 🚧 Expand Test Coverage to 50%+ (P2)
For completed items: See Archive section at end.
Priority Roadmap
This section organizes work into three priority levels based on criticality for production readiness.
Priority 1: VS Code Integration Enablement (Bootstrap Path)
Goal: Enable AI-assisted development of TNH Scholar itself via VS Code extension. Prioritizes foundational work for tnh-gen + extension integration.
Status: Foundation Complete (tnh-gen CLI ✅, Registry System ✅)
✅ tnh-gen CLI Implementation - See Archive
✅ File-Based Registry System (ADR-A14) - See Archive
✅ VS Code Extension Walking Skeleton - See Archive
✅ Pattern→Prompt Migration - See Archive
✅ Provenance Format Refactor (YAML Frontmatter) - See Archive
🔨 Agent-Orch OA07 Runtime Implementation Sequence
- Status: IN PROGRESS - maintained execution/validation/kernel slice landed and tested
- Priority: HIGH (foundation work for durable MVP)
- Context: The accepted OA07 ADR set defines the maintained runtime architecture. The current `conductor_mvp/` and `spike/` code remains useful as migration source/reference, but should not receive forward-path feature growth.
- Why This Matters:
  - current implementation readiness is medium, but in-place extension readiness is low
  - the highest-risk boundary is still subprocess execution and typed validation/runner contracts
  - coding should proceed by subsystem extraction, not by continuing prototype package growth
- Implementation Order:
  - Build `agent_orchestration/execution/`
    - typed invocation families
    - cwd/env/timeout policy
    - termination/result taxonomy
    - final argv rendering boundary
  - Build `agent_orchestration/validation/` on top of `execution/`
    - preserve OA04 external YAML compatibility by normalizing source shapes into typed internal models
    - migrate behavior out of `conductor_mvp/providers/validation_runner.py`
  - Extract `agent_orchestration/kernel/`
    - `WorkflowCatalog`
    - `WorkflowValidator`
    - `KernelState`
    - `KernelRunService`
  - Introduce `agent_orchestration/workspace/` and `agent_orchestration/run_artifacts/`
    - move rollback/state capture and durable run record ownership out of prototype packages
  - Migrate maintained runner behavior into `agent_orchestration/runners/`
    - use `reference/spike/` only as reference material
    - no new forward-path runner work in spike code
- Current Slice Completed:
  - Added maintained `execution/`, `validation/`, `kernel/`, `workspace/`, `run_artifacts/`, and `runners/` package scaffolding
  - Added focused OA07 regression coverage and validated the new slice plus legacy `conductor_mvp` kernel tests
  - Sourcery installed successfully via `poetry install --with local`, but the CLI currently hangs even for `--help`, so local Sourcery review remains blocked by Sourcery runtime behavior rather than repo config
- Migration Rules:
  - Do not add substantive new feature work to `conductor_mvp/`
  - Do not add new forward-path implementation work to `spike/`
  - Treat `conductor_mvp/` as a temporary migration-source package to be deleted after subsystem extraction
  - Treat `codex_harness/` and `spike/` as reference packages during OA07 migration
- Initial Files in Scope:
  - `src/tnh_scholar/agent_orchestration/conductor_mvp/`
  - `src/tnh_scholar/agent_orchestration/spike/`
  - `src/tnh_scholar/agent_orchestration/common/`
  - `tests/agent_orchestration/`
  - `docs/architecture/agent-orchestration/adr/adr-oa03-agent-runner-architecture.md`
  - `docs/architecture/agent-orchestration/adr/adr-oa03.1-claude-code-runner.md`
  - `docs/architecture/agent-orchestration/adr/adr-oa03.3-codex-cli-runner.md`
  - `docs/architecture/agent-orchestration/adr/adr-oa04-workflow-schema-opcode-semantics.md`
  - `docs/architecture/agent-orchestration/adr/adr-oa04.1-implementation-notes-mvp-buildout.md`
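The `execution/` boundary described in this task (typed invocations, cwd/env/timeout policy, and a termination taxonomy) could look roughly like the sketch below. It is only an illustration under stated assumptions: `Invocation`, `InvocationResult`, `Termination`, and `run_invocation` are hypothetical names, not the maintained package's actual types.

```python
from __future__ import annotations

import subprocess
import sys
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Mapping, Sequence


class Termination(Enum):
    """Hypothetical result taxonomy for one finished invocation."""

    COMPLETED = "completed"
    NONZERO_EXIT = "nonzero_exit"
    TIMED_OUT = "timed_out"


@dataclass(frozen=True)
class Invocation:
    """Typed description of one subprocess call (illustrative fields)."""

    argv: Sequence[str]  # final rendered argv, never a shell string
    cwd: Path
    env: Mapping[str, str]
    timeout_seconds: float


@dataclass(frozen=True)
class InvocationResult:
    termination: Termination
    exit_code: int | None
    stdout: str
    stderr: str


def run_invocation(invocation: Invocation) -> InvocationResult:
    """Run the command and map the outcome onto the termination taxonomy."""
    try:
        completed = subprocess.run(
            list(invocation.argv),
            cwd=invocation.cwd,
            env=dict(invocation.env),
            timeout=invocation.timeout_seconds,
            capture_output=True,
            text=True,
            check=False,
        )
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run; partial output is dropped here.
        return InvocationResult(Termination.TIMED_OUT, None, "", "")
    termination = (
        Termination.COMPLETED if completed.returncode == 0 else Termination.NONZERO_EXIT
    )
    return InvocationResult(
        termination, completed.returncode, completed.stdout, completed.stderr
    )
```

Keeping argv rendering outside `run_invocation` means the runner and validation layers can share one execution path while owning their own command construction.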
🔨 OA07.1 Bootstrap Worktree Slice
- Status: IN PROGRESS - PR-7 and PR-8 are merged on `main`; bootstrap-proof workflow slice is next
- Priority: HIGHEST (prove real maintained bootstrap usefulness)
- Context: The maintained OA04.x runtime contracts now include the real OA07.1 worktree runtime boundary and the maintained headless entry path. Bootstrap is no longer blocked on substrate. The next blocker is proving one useful repo-native workflow through the maintained path. Follow ADR-OA07 and ADR-OA07.1.
- Bootstrap Goal:
  - create a managed git worktree from a committed base ref
  - run `RUN_AGENT` and `RUN_VALIDATION` against the worktree root
  - keep canonical run artifacts in the run directory
  - support `ROLLBACK(pre_run)` to recorded base state
  - establish the headless path needed for later commit/push/PR automation
- Why This Is Next:
  - the worktree runtime boundary and maintained headless app-layer entry are now implemented on `main`
  - the system still needs one clean end-to-end proof that it can complete a useful repo task through the maintained path
  - OA05/OA06 depth work should follow a live bootstrap proof, not precede it
- Recommended PR sizing:
  - Prefer 2 PRs to stay comfortably under diff-size guidance
  - A single PR is possible only if the implementation stays narrow and avoids CLI/app-layer work
- PR Sequence:
  - PR-7 `feat/oa07.1-worktree-workspace-service` - Worktree runtime boundary (medium)
    - replace `NullWorkspaceService` as the forward-path maintained implementation with a real git-backed workspace service
    - add typed workspace context models: `repo_root`, `worktree_path`, `branch_name`, `base_ref`, `base_sha`
    - implement managed branch + worktree creation from committed base ref
    - update the workspace protocol so pre-run setup returns structured workspace context and does not rely on the run directory as the workspace handle
    - pass the worktree root as `working_directory` to runner and validation services for mutable steps
    - implement `ROLLBACK(pre_run)` by discarding and recreating the managed worktree from recorded `base_sha`
    - persist workspace context into canonical run artifacts or run metadata extension
    - tests for worktree creation, mutable-step execution in the worktree root, recorded base state, and `ROLLBACK(pre_run)` semantics
    - keep `NullWorkspaceService` only for tests or explicit non-operational contexts
  - PR-8 `feat/oa07-bootstrap-headless-entry` - Maintained headless bootstrap entry (small/medium, merged)
    - load one workflow
    - create worktree context
    - execute workflow end to end
    - write canonical artifacts and final state
    - keep the initial entry local/headless; no GitHub automation required
  - Bootstrap Proof `feat/oa07-bootstrap-proof-workflow` - Real repo-task bootstrap proof (small/medium, next)
    - add one maintained workflow definition for a narrow useful repository task
    - exercise the current maintained subset: `RUN_AGENT`, `RUN_VALIDATION`, `STOP`, with `ROLLBACK(pre_run)` available only as fallback
    - prove the run yields a reviewable repo diff plus canonical metadata, manifests, events, and final state
    - keep semantic-control depth and review automation out of scope unless they become true blockers
  - PR-9 `feat/oa07-review-automation` - Commit/push/PR automation (optional, small/medium)
    - create local commits on the managed branch
    - push the work branch
    - open or update a PR
    - keep protected-branch merge human-only
- Explicit deferrals for this slice:
  - commit/push/PR automation if it causes PR-7 or PR-8 to exceed preferred diff size
  - strict OA05 compile-validation as a blocker for bootstrap
  - full OA06 planner fixture/vector suite beyond the bootstrap path
  - maintained `EVALUATE`/`GATE` support before the first useful bootstrap proof
  - non-script harness backends
  - stacked PR orchestration
  - multi-agent mutable collaboration inside one worktree
  - `pre_step` rollback and named checkpoints
- Files likely in scope:
  - `src/tnh_scholar/agent_orchestration/workspace/`
  - `src/tnh_scholar/agent_orchestration/kernel/service.py`
  - `src/tnh_scholar/agent_orchestration/run_artifacts/`
  - `tests/agent_orchestration/test_oa07_execution_validation_kernel.py`
  - `docs/architecture/agent-orchestration/adr/adr-oa07-diff-policy-safety-rails.md`
  - `docs/architecture/agent-orchestration/adr/adr-oa07.1-worktree-lifecycle-and-rollback.md`
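A minimal sketch of the typed workspace context and the git commands behind worktree creation and `ROLLBACK(pre_run)` might look like this. The field names come from the PR-7 task list; the class name `WorkspaceContext` and the helper functions are hypothetical, and the sketch only builds argv lists (it does not run git), so the real service may differ.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class WorkspaceContext:
    """Structured workspace handle returned by pre-run setup (illustrative)."""

    repo_root: Path
    worktree_path: Path
    branch_name: str
    base_ref: str
    base_sha: str


def worktree_add_argv(context: WorkspaceContext, reset_branch: bool = False) -> list[str]:
    """Build `git worktree add` for the managed branch, pinned to base_sha.

    `-b` creates a new branch; `-B` also resets an existing branch, which is
    what a rollback needs after the old worktree has been removed.
    """
    branch_flag = "-B" if reset_branch else "-b"
    return [
        "git", "-C", str(context.repo_root),
        "worktree", "add", branch_flag, context.branch_name,
        str(context.worktree_path), context.base_sha,
    ]


def rollback_pre_run_argvs(context: WorkspaceContext) -> list[list[str]]:
    """Discard the managed worktree, then recreate it from the recorded base_sha."""
    remove = [
        "git", "-C", str(context.repo_root),
        "worktree", "remove", "--force", str(context.worktree_path),
    ]
    return [remove, worktree_add_argv(context, reset_branch=True)]
```

Pinning to `base_sha` rather than `base_ref` keeps the rollback target stable even if the base branch moves during the run.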
✅ OA04 Contract Family - PR Sequence (Complete)
- Status: COMPLETE - contract ADRs implemented in maintained code; remaining bootstrap work is tracked under the OA07.1 Bootstrap Worktree Slice
- Context: OA04.2–OA04.5 are the contract-layer ADRs between the OA07 runtime foundations and the maintained runner/policy/provenance implementations. That contract family is now landed in code and should no longer be treated as pending. See implementation notes in ADR-OA04.1 Addendum 2026-03-27 for the original scaffolding gaps and ADR-OA04.1 Addendum 2026-04-05 for the bootstrap-first reprioritization.
- Dependency chain:
  - OA04.3 (run dir + manifests + evaluator evidence seam) → OA04.2 (runners normalize into canonical evidence)
  - OA04.4 (policy taxonomy + requested/effective split) → OA04.2 (runner request carries typed requested policy)
  - OA04.5 (harness backend) → `validation/` subsystem (extends empty package)
  - OA04.2 (runner adapters) → milestone: first real agent invocations
- Implementation Notes (default choices for implementers):
  - Apply OS01 pragmatically: add structure where it protects a real boundary or likely evolution seam, not just to mirror the taxonomy mechanically.
  - Prefer moving maintained code toward the ADR contracts when the migration path is clean; do not preserve stub shapes just for short-term compatibility inside maintained packages.
  - Treat `run_artifacts/` as the canonical evidence boundary. If a choice arises between storing data in runner-local files versus canonical artifact roles + manifests, choose canonical artifact roles + manifests.
  - Keep evaluator assembly strict: evaluators read `metadata.json`, `events.ndjson`, `manifest.json`, and canonical artifact roles only. Do not add evaluator dependencies on adapter-local raw capture filenames.
  - Keep manifests thin and stable. Put compact cross-step evidence in `evidence_summary`; put detailed per-step policy data in canonical `policy_summary.json`.
  - Keep persistence ownership in `run_artifacts/`. Runner adapters and validation backends should return typed normalized outputs and artifact payloads; they should not own final manifest writing policy.
  - Evolve existing maintained code where it already matches the target shape. In particular, refactor `validation/service.py` toward the script backend/resolver seam rather than replacing it wholesale.
  - Expand `kernel/service.py` by extraction, not accretion. If per-step provenance writing starts to crowd the kernel, extract focused collaborators rather than growing one large procedural service.
  - Use explicit mapper/normalizer classes whenever native CLI or harness output is translated into maintained models. Do not hide parsing, normalization, termination mapping, and persistence decisions in one adapter class.
  - Keep policy taxonomy aligned with OS01: init-time settings/config, per-step requested policy, execution-time effective policy, persisted `PolicySummary`. Avoid "policy blob" models that mix those concerns.
  - Do not add ceremony without benefit: avoid speculative service/factory layers, unnecessary mappers for nearly identical shapes, or package splits that do not improve testability, replaceability, or clarity.
  - Existing thin models in `run_artifacts/`, `runners/`, and `validation/` are scaffolding, not target architecture. It is acceptable to break those internal shapes in favor of cleaner maintained contracts during this implementation sequence.
- PR Sequence:
  - PR-1 `feat/oa04-contract-adrs` - ADR acceptance (docs only)
    - Commit new OA04.2, OA04.3, OA04.4, OA04.5 files; later implementation has since moved those decimal ADRs to `implemented`
    - Carry in already-modified OA03.1/OA03.3 addendums + OA04 update + index.md
  - PR-2 `feat/oa04.3-run-artifact-contract` - Run-artifact domain contract + store (medium)
    - Expand `run_artifacts/models.py`: `RunMetadata`, `RunEventRecord`, `ArtifactRole` enum, `StepArtifactEntry`, `StepManifest`
    - Add manifest-level `evidence_summary` with compact canonical evidence references
    - Add canonical `policy_summary` artifact role for detailed requested/effective policy records
    - Expand `run_artifacts/protocols.py`: `write_step_manifest`, `artifact_step_dir`, canonical artifact persistence APIs
    - Update `run_artifacts/filesystem_store.py` to implement both
    - Keep filesystem concerns behind the store; no evaluator-facing filename dependencies
    - Tests for manifest writing, event stream fields, and canonical artifact-role lookup
  - PR-3 `feat/oa04.3-kernel-provenance-integration` - Kernel provenance integration (medium)
    - Update kernel/runtime services to write enriched run metadata, canonical events, and per-step manifests
    - Persist compact manifest summaries and canonical artifact references only; no adapter-local evidence lookup in evaluator assembly
    - Capture workspace diff/status and policy summary references through canonical artifact roles
    - Tests for manifest/event creation across `RUN_AGENT`, `RUN_VALIDATION`, `EVALUATE`, and `GATE`
    - Depends on PR-2
  - PR-4 `feat/oa04.4-policy-contract` - Execution policy package (medium)
    - New `agent_orchestration/execution_policy/` package
      - `models.py`: `ExecutionPolicySettings`, `RequestedExecutionPolicy`, `EffectiveExecutionPolicy`, `PolicyViolationClass`, `PolicyViolation`, `PolicySummary`
      - `assembly.py`: `ExecutionPolicyAssembler` for system settings → workflow → step requested policy → runtime override/effective policy derivation
      - `protocols.py`: `ExecutionPolicyAssemblerProtocol`
    - Update `runners/models.py`: retire `PromptInteractionPolicy` stub; link `RunnerTaskRequest` to `RequestedExecutionPolicy`
    - Persist detailed `policy_summary.json` via canonical `policy_summary` artifact role; keep only compact summary data in manifests
    - Tests for assembly precedence, requested/effective policy derivation, and hard-fail behavior
    - Can run in parallel with PR-3
  - PR-5 `feat/oa04.2-runner-adapters` - Runner adapters (largest PR)
    - Expand `runners/models.py`: `AdapterCapabilities` (capability declaration per OA04.2 §3a)
    - Add explicit mapper/normalizer classes for native CLI output → maintained runner-domain models
    - Add `runners/adapters/claude_cli.py`: `claude --print --output-format stream-json --permission-mode dontAsk`, stream-json parsing, normalization, termination mapping
    - Add `runners/adapters/codex_cli.py`: `codex exec --json --output-last-message`, JSONL capture, normalization, termination mapping
    - Adapters return typed normalized artifact payloads; canonical persistence is owned by `run_artifacts`
    - Evaluators consume manifests and canonical artifact roles only, never runner-local raw capture files
    - Tests for both adapters (subprocess mocking, normalization, mapper behavior, termination paths)
    - Depends on PR-2, PR-3, and PR-4
  - PR-6 `feat/oa04.5-harness-backend` - Script harness backend (medium)
    - Build out `agent_orchestration/validation/`: `BackendFamily` enum, `HarnessBackendRequest`, `HarnessBackendResult`, `HarnessBackendProtocol`
    - `backends/script.py`: migrate from `conductor_mvp/providers/validation_runner.py`; normalize to `validation_report`/`validation_stdout`/`validation_stderr` artifact roles
    - Add backend resolver seam, but defer `cli` and `web` implementation until a concrete maintained consumer exists
    - Tests for script backend, resolver seam, and artifact role normalization
    - Depends on PR-2; independent of PR-4 and PR-5
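The stated PR dependencies can be checked mechanically. The sketch below encodes only the "Depends on" notes from the sequence above; treating PR-1, PR-2, and PR-4 as having no prerequisites is an assumption, since the list states none for them. `PR_DEPENDENCIES` and `ready_waves` are illustrative names.

```python
from graphlib import TopologicalSorter

# Each key maps to the PRs it depends on, as stated in the PR sequence.
# Empty sets are an assumption: no prerequisite is listed for those PRs.
PR_DEPENDENCIES = {
    "PR-1": set(),
    "PR-2": set(),
    "PR-3": {"PR-2"},
    "PR-4": set(),
    "PR-5": {"PR-2", "PR-3", "PR-4"},
    "PR-6": {"PR-2"},
}


def ready_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group PRs into waves; everything in one wave can proceed in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(ready_waves(PR_DEPENDENCIES), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

Under these assumptions PR-3 and PR-6 become available together once PR-2 lands, and PR-5 is the only PR in the final wave.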
🔮 JVB VS Code Parallel Viewer (ADR-JVB02)
- Status: NOT STARTED (Design Phase)
- Priority: HIGH (flagship feature, builds on VS Code integration foundation)
- Context: The JVB (Journal of Vietnamese Buddhism) parallel viewer enables scholars to view scanned historical journal pages alongside OCR text and English translations. v1 was a bespoke browser-based prototype; v2 will integrate into the tnh-scholar VS Code extension.
- Project Paused: This work was on hold while VS Code integration and tnh-gen were developed. Now that the walking skeleton is complete, we can resume with a fresh design.
Related Documentation:
- v1 As-Built: ADR-JVB01 - Browser-based prototype architecture
- v2 Strategy (Draft): JVB Viewer V2 Strategy - Pre-ADR strategy note (good foundations, needs formalization)
- VS Code Platform Strategy: VS Code as UI Platform - Overall UI-UX direction
- VS Code Integration: ADR-VSC01 - CLI-first extension strategy (implemented)
Proposed ADR Structure:
docs/architecture/jvb-viewer/adr/
├── adr-jvb01_as-built_jvb_viewer_v1.md             # ✅ Exists
├── adr-jvb02-vscode-parallel-viewer-strategy.md    # (planned) Main strategy ADR
├── adr-jvb02.1-ui-ux-design.md                     # (planned) Mockups, pane layout, workflows
├── adr-jvb02.2-data-model-api-contract.md          # (planned) JSON schema, extension↔backend API
└── adr-jvb02.3-implementation-guide.md             # (planned) Phase-by-phase implementation
Key Design Decisions Needed:
- VS Code Pane Architecture: Which panes for scan overlay, text views, reconciliation controls, navigation?
- Webview vs Custom Editor: Custom editor for `.jvb.json` files or webview panel approach?
- Backend Integration: Python service via CLI (`tnh-gen` patterns) or dedicated HTTP service?
- Data Model: Refine per-page JSON schema from v2 strategy, define API contract
- Dual OCR Reconciliation UI: How users choose between Google OCR vs AI vision sources
Deliverables:
- ADR-JVB02: Main strategy ADR (formalize v2 strategy, VS Code integration focus)
- ADR-JVB02.1: UI-UX design with mockups/screen visualizations
- ADR-JVB02.2: Data model and API contract specification
- ADR-JVB02.3: Implementation guide with milestones
- M0 Prototype: Static HTML mockup in VS Code webview (validate approach)
Implementation Milestones (from v2 strategy, to be refined):
- M0: Static prototype - HTML showing page image, word bboxes, selectable sentences
- M1: VS Code extension - load/save per-page JSON, overlay modes, section breadcrumb
- M2: Dual-source UI - GOCR vs AI diff chooser, batch adoption, "reviewed" status
- M3: Structure cues - columns, heading levels, emphasis flags captured and rendered
- M4: Beta - section-level navigation, export HTML, light theming
🔮 Add --prompt-dir Global Flag to tnh-gen
- Status: NOT STARTED
- Priority: HIGH (improves tnh-gen UX for one-off operations and testing)
- Estimate: 1-2 hours
- Context: Users need a convenient way to override the prompt catalog directory for one-off CLI calls without setting environment variables or creating temp config files
- ADR: ADR-TG01 Addendum 2026-01-02
- Why Important: Enables clean one-off operations (`tnh-gen --prompt-dir ./test-prompts list`) for testing, CI/CD, and development workflows
- Current Workarounds:
  - Environment variable: `TNH_PROMPT_DIR=/path tnh-gen list` (awkward)
  - Temp config file: `tnh-gen --config /tmp/config.yaml list` (verbose)
- Deliverables:
  - Add `--prompt-dir` flag to `cli_callback()` in `src/tnh_scholar/cli_tools/tnh_gen/tnh_gen.py:26`
  - Update `config_loader.py` to handle prompt directory override at CLI precedence level
  - Update `ConfigData` type to accept `prompt_catalog_dir` override
  - Add unit tests for flag precedence (CLI flag > workspace > user > env)
  - Update help text and CLI reference documentation
  - Update `docs/cli-reference/tnh-gen.md` global flags section
- Files to Modify:
  - `src/tnh_scholar/cli_tools/tnh_gen/tnh_gen.py` (add flag)
  - `src/tnh_scholar/cli_tools/tnh_gen/config_loader.py` (precedence handling)
  - `src/tnh_scholar/cli_tools/tnh_gen/types.py` (type definitions)
  - `tests/cli_tools/test_tnh_gen.py` (unit tests)
  - `docs/cli-reference/tnh-gen.md` (documentation)
- Testing: Verify `--prompt-dir` flag overrides all other config sources (workspace, user, env)
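The precedence rule this task needs to test (CLI flag > workspace > user > env) can be expressed as a small pure function. This is a sketch only: `resolve_prompt_dir` and its parameters are hypothetical and do not reflect the actual `config_loader.py` API; `TNH_PROMPT_DIR` is the environment variable named in the workarounds above.

```python
from __future__ import annotations

from pathlib import Path
from typing import Mapping, Optional


def resolve_prompt_dir(
    cli_flag: Optional[Path],
    workspace: Optional[Path],
    user: Optional[Path],
    env: Mapping[str, str],
) -> Optional[Path]:
    """Return the first configured prompt directory in precedence order.

    Order: CLI flag > workspace config > user config > TNH_PROMPT_DIR.
    The environment is passed in explicitly so tests need no monkeypatching.
    """
    env_value = env.get("TNH_PROMPT_DIR")
    candidates = (
        cli_flag,
        workspace,
        user,
        Path(env_value) if env_value else None,
    )
    return next((candidate for candidate in candidates if candidate is not None), None)
```

A unit test for the flag then reduces to calling the function with every source set and asserting that the CLI value wins.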
🔮 Full-Coverage yt-dlp Test Suite + Monthly Ops Trigger
- Status: IN PROGRESS
- Priority: HIGH (external dependency instability)
- Goal: Add full coverage for all yt-dlp usage modules (transcript, audio, metadata, video download), then run a scheduled monthly ops test to surface breakage early.
- Scope (Code):
  - `src/tnh_scholar/video_processing/video_processing.py`
  - `src/tnh_scholar/cli_tools/ytt_fetch/ytt_fetch.py`
  - `src/tnh_scholar/cli_tools/audio_transcribe/audio_transcribe.py`
  - `src/tnh_scholar/cli_tools/audio_transcribe/version_check.py`
- Testing Strategy:
  - Add integration tests that exercise live yt-dlp behavior (guarded, opt-in)
  - Add unit tests for runtime env inspection + yt-dlp option injection
  - Add offline unit tests with recorded fixtures for metadata + transcript parsing
  - Add failure-mode tests (missing captions, private video, geo-blocked)
- Monthly Ops Trigger:
  - Add cron-ready ops check script + validation URL list
  - Document monthly cron usage and log locations
  - Add failure notification workflow (issue creation or alerting)
- Acceptance Criteria:
  - Coverage for all yt-dlp entry points + error paths
  - Monthly ops check runs without manual intervention (cron)
  - Clear failure report includes test URL, date, yt-dlp version
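The last acceptance criterion (a failure report with test URL, date, and yt-dlp version) could be met by something as small as the sketch below. `OpsFailure` and `format_failure_report` are hypothetical names, and the report layout is an assumption rather than an agreed format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class OpsFailure:
    """One failed monthly ops check (illustrative fields)."""

    test_url: str
    checked_on: date
    yt_dlp_version: str
    error: str


def format_failure_report(failure: OpsFailure) -> str:
    """Render a plain-text report suitable for an issue body or alert."""
    return "\n".join(
        [
            "yt-dlp monthly ops check FAILED",
            f"Test URL: {failure.test_url}",
            f"Date: {failure.checked_on.isoformat()}",
            f"yt-dlp version: {failure.yt_dlp_version}",
            f"Error: {failure.error}",
        ]
    )
```

Keeping the report a pure function of a small record makes it easy to reuse for either issue creation or alerting.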
🔮 Patch ytt-fetch Robustness
- Status: IN PROGRESS
- Priority: HIGH (frequent breakage path)
- Goal: Make ytt-fetch resilient to upstream changes and failures.
- Test URL: `https://youtu.be/iqNzfK4_meQ`
- Deliverables:
  - Add runtime preflight + yt-dlp runtime option injection
  - Verify transcript fetch on test URL (manual + test)
  - Add retries / improved error reporting
  - Ensure metadata embed + output path handling remain stable
  - Update docs and CLI reference if flags or behaviors change
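For the "retries / improved error reporting" deliverable, one possible shape is a small retry helper with exponential backoff. This is a generic sketch, not ytt-fetch's actual code: `with_retries` and `FetchError` are hypothetical names, and catching every `Exception` is a simplification that a real implementation would likely narrow to transient errors.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class FetchError(RuntimeError):
    """Raised when every attempt failed; chains the last underlying error."""


def with_retries(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying with exponential backoff between failures."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:  # simplification: retries on any error
            last_error = exc
            if attempt < attempts:
                # Delays grow as base_delay * 1, 2, 4, ...
                sleep(base_delay * 2 ** (attempt - 1))
    raise FetchError(f"failed after {attempts} attempts: {last_error}") from last_error
```

Injecting `sleep` keeps tests fast and lets them assert the backoff schedule directly.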
🚧 GenAIService Core Components - Final Polish
- Status: PRELIMINARY IMPLEMENTATION COMPLETE ✅ - Needs Polish & Registry Integration
- Priority: MEDIUM (minor cleanup, not blocking)
- What: Core GenAI service components (params_policy, model_router, safety_gate, completion_mapper) are implemented and working, need minor polish
- Components Implemented:
  - params_policy.py - Policy precedence implemented ✅
    - ✅ Policy precedence: call hint → prompt metadata → defaults
    - ✅ Settings cached via `@lru_cache` (excellent optimization)
    - ✅ Strong typing with `ResolvedParams` Pydantic model
    - ✅ Routing diagnostics in `routing_reason` field
    - Score: 95/100 - Excellent implementation
  - model_router.py - Capability-based routing implemented ✅
    - ✅ Declarative routing table with `_MODEL_CAPABILITIES`
    - ✅ Structured output fallback (JSON mode capability switching)
    - ✅ Intent-aware architecture foundation
    - ⚠️ Intent routing currently placeholder (line 98-101)
    - Score: 92/100 - Strong implementation
  - safety_gate.py - Three-layer safety checks implemented ✅
    - ✅ Character limit, context window, budget estimation
    - ✅ Typed exceptions (`SafetyBlocked`)
    - ✅ Structured `SafetyReport` with actionable diagnostics
    - ✅ Content type handling (string/list with warnings)
    - ✅ Prompt metadata integration (`safety_level`)
    - ⚠️ Price constant hardcoded (line 30: `_PRICE_PER_1K_TOKENS = 0.005`)
    - ⚠️ Post-check currently stubbed
    - Score: 94/100 - Excellent implementation
  - completion_mapper.py - Bi-directional mapping implemented ✅
    - ✅ Clean transport → domain transformation
    - ✅ Error details surfaced in `policy_applied`
    - ✅ Status handling (OK/FAILED/INCOMPLETE)
    - ✅ Pure mapper functions (no side effects)
    - ⚠️ `policy_applied` uses `Dict[str, object]` (should be more specific)
    - Score: 91/100 - Strong implementation
- High Priority (Before Merging):
  - Add Google-style docstrings to public functions (see style-guide.md)
    - `apply_policy()`, `select_provider_and_model()`, `pre_check()`, `post_check()`, `provider_to_completion()`
  - Move `_PRICE_PER_1K_TOKENS` constant to Settings or registry (blocks ADR-A14)
    - Moved to `Settings.price_per_1k_tokens`; safety gate now consumes setting.
  - Type tightening in completion_mapper
    - Added `PolicyApplied` alias (`dict[str, str | int | float]`).
- Medium Priority (V1 Completion):
  - Promote `policy_applied` typing to a shared domain type (`CompletionEnvelope`) to avoid loose `dict` usage across the service.
  - Capability registry extraction (→ ADR-A14)
    - Create `runtime_assets/registries/providers/openai.jsonc`
    - Implement `RegistryLoader` with JSONC support
    - Refactor `model_router.py` to use registry
    - Refactor `safety_gate.py` to use registry pricing
    - See: ADR-A14: File-Based Registry System
  - Intent routing implementation
    - Document planned approach or create follow-up issue
    - Current: placeholder at model_router.py:98-101
  - Post-check safety implementation
    - Add content validation logic to `safety_gate.post_check()`
    - Current: stubbed at safety_gate.py:124-133
- Low Priority (Future Work):
  - Warning enum system
    - Create typed warning codes instead of strings
    - Affects: safety_gate, completion_mapper, model_router
  - Enhanced diagnostics
    - More granular routing reasons
    - Detailed safety check diagnostics
  - Message.content Type Architecture Investigation (design quality, non-blocking)
    - Location: gen_ai_service/models/domain.py:92-96
    - Issue: Sourcery identifies `Union[str, List[ChatCompletionContentPartParam]]` as source of complexity
    - Context: Current design intentionally supports OpenAI's flexible content API (plain text OR structured parts with images/etc)
    - Investigation Areas:
      - Document current usage patterns across codebase
      - Assess downstream complexity: where are type checks needed?
      - Evaluate normalization strategies (always list? separate fields? utility methods?)
      - Consider provider compatibility (Anthropic, etc)
      - Draft ADR or addendum to existing GenAI ADRs if design change warranted
    - Impact: Affects message representation throughout GenAIService
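To make the three safety layers concrete (character limit, context window, budget estimation) with the price read from settings instead of a hardcoded constant, here is a minimal sketch. It is not the code in `safety_gate.py`: `SafetyLimits`, the report fields, and the four-characters-per-token estimate are all assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class SafetyLimits:
    """Hypothetical limits; price comes from settings, not a module constant."""

    max_chars: int
    context_window_tokens: int
    max_cost_dollars: float
    price_per_1k_tokens: float


@dataclass
class SafetyReport:
    estimated_tokens: int
    estimated_cost: float
    problems: list[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.problems


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about four characters per token."""
    return max(1, len(text) // 4)


def pre_check(text: str, limits: SafetyLimits) -> SafetyReport:
    """Run the three checks and collect every problem instead of stopping early."""
    tokens = estimate_tokens(text)
    cost = tokens / 1000 * limits.price_per_1k_tokens
    report = SafetyReport(estimated_tokens=tokens, estimated_cost=cost)
    if len(text) > limits.max_chars:
        report.problems.append("character limit exceeded")
    if tokens > limits.context_window_tokens:
        report.problems.append("context window exceeded")
    if cost > limits.max_cost_dollars:
        report.problems.append("estimated cost exceeds budget")
    return report
```

Collecting all problems in one report, rather than raising on the first, is one way to keep the diagnostics actionable.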
⏸️ GenAIService Thread Safety and Rate Limiting (ADR-A15)
- Status: DEFERRED - Not needed for VS Code integration (process isolation)
- Priority: MEDIUM (revisit when building Python batch pipelines)
- Issue: #22
- ADR: ADR-A15: Thread Safety and Rate Limiting
- Why Deferred: VS Code extension uses process isolation (each `tnh-gen` call = separate GenAIService instance). Thread safety only matters for Python-native batch pipelines.
- Estimate: 3-6 hours (Phase 1: 1-2 hours, Phase 2: 2-4 hours)
- Quick Summary: Add thread-safe retry state, locked cache, and optional rate limiting for high-throughput scenarios
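As a rough illustration of the "locked cache" part of that summary, the sketch below guards a dictionary with a single lock. `LockedCache` is a hypothetical name and this is not the ADR-A15 design; holding the lock while the factory runs is a deliberate simplification that trades concurrency for a guarantee that each value is built once.

```python
import threading
from typing import Callable, Dict, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class LockedCache(Generic[K, V]):
    """Dictionary cache whose reads and writes share one lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._values: Dict[K, V] = {}

    def get_or_create(self, key: K, factory: Callable[[], V]) -> V:
        """Return the cached value, building it at most once per key."""
        with self._lock:
            if key not in self._values:
                # The factory runs under the lock, so concurrent callers for
                # the same key wait here instead of building duplicates.
                self._values[key] = factory()
            return self._values[key]
```

For slow factories a per-key lock would reduce contention, at the cost of more bookkeeping.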
Priority 2: Production Hardening (Post-Bootstrap)
Goal: Harden TNH Scholar for production use after VS Code integration enables AI-assisted development. Focuses on reliability, test coverage, and type safety.
🚧 OpenAI SDK 2.15.0 Validation (High Priority)
- Status: NOT STARTED
- Why: SDK bump impacts OpenAI adapter. (Codex harness suspended - see ADR-OA03.2 addendum)
- Tasks:
- Revalidate OpenAI adapter request/response mappings against 2.15.0
- Update compatibility notes/docs if schema drift is found
🚧 Audio-Transcribe Service-Layer Refactor (P2)
- Status: NOT STARTED
- Goal: Align audio-transcribe with object-service pattern and ytt-fetch robustness.
- Tasks:
  - Introduce typed service orchestrator + protocols (CLI becomes thin wrapper)
  - Extract audio source resolution into a typed resolver (yt_url/CSV/local file)
  - Replace dict options with Pydantic models (transcription + diarization params)
  - Move logging bootstrap out of module import path so `audio-transcribe` modules are import-safe in tests and sandboxed environments
  - Add runtime preflight (yt-dlp inspector + ffmpeg availability); keep version checks ops-only
  - Migrate CLI to Typer with minimal surface (smoke tests only)
  - Add service-layer tests for all audio-transcribe use cases
⏸️ Agent Orchestration - Codex Runner (ADR-OA03.2)
- Status: TABLED (2026-01-25)
- ADR: ADR-OA03.2
- Why Tabled:
  - Scope: Spike revealed that a proper Codex harness requires implementing full client-side agent orchestration (the VS Code extension uses a proprietary app server, not raw API calls)
  - Cost-benefit: Current human-in-the-loop workflow with Claude Code + VS Code Codex extension is effective and cost-efficient for project needs
  - No compelling need: Investment not justified when manual workflow works well
- Findings: Codex Harness Spike Findings
- Preserved Artifacts: `src/tnh_scholar/agent_orchestration/codex_harness/`, `src/tnh_scholar/cli_tools/tnh_codex_harness/`
- Conditions for Resumption: Further insight or clear business need that justifies full agent orchestration investment
🚧 Expand Test Coverage
- Status: NOT STARTED
- Current Coverage: ~5% (4 test modules)
- Target: 50%+ for gen_ai_service
- Tasks:
  - GenAI service flows: prompt rendering, policy resolution, provider adapters
  - CLI integration tests (option parsing, environment validation)
  - Configuration loading edge cases
  - Error handling scenarios
  - Pattern catalog validation
  - Full CLI test suite with 100% coverage (HIGH PRIORITY - include all CLI tools, not just tnh-gen)
  - tnh-gen CLI comprehensive coverage (HIGH PRIORITY - Missing basic command tests):
    - Add tests for all `tnh-gen config` commands (show, get, set, list)
    - Add tests for all `tnh-gen list` commands (simple, query)
    - Add tests for `tnh-gen gen` command with various options
    - Test Path serialization in config commands (regression test for model_dump)
    - Test config precedence: defaults → user → workspace → CLI flags
    - Test error handling for all commands
    - Integration tests for full workflows
    - Context: Basic command `tnh-gen config show` failed with Path serialization bug that should have been caught by tests
🚧 Consolidate Environment Loading
- Status: NOT STARTED
- Problem: Multiple modules call `load_dotenv()` at import time
  - https://github.com/aaronksolomon/tnh-scholar/blob/main/src/tnh_scholar/ai_text_processing/prompts.py
  - https://github.com/aaronksolomon/tnh-scholar/blob/main/src/tnh_scholar/audio_processing/diarization/pyannote_client.py
- Tasks:
  - Create single startup hook for dotenv loading
  - Use Pydantic Settings consistently
  - Pass configuration objects instead of `os.getenv()` calls
  - Remove import-time side effects
🚧 Configuration Tech Debt - Migrate to ADR-CF01/CF02 Three-Layer Model
- Status: PHASES 1-3 COMPLETE, PHASES 4-5 NOT STARTED
- Priority: MEDIUM (foundational, not blocking current work)
- ADRs:
  - ADR-CF01: Runtime Context & Configuration Strategy
  - ADR-CF02: Prompt Catalog Discovery Strategy (status: accepted)
  - Related: ADR-A08: Config/Params/Policy Taxonomy
Migration Phases:
- Phase 1: Extend TNHContext for Prompts - COMPLETE
  - Add `PromptPathBuilder` analogous to `RegistryPathBuilder` - `src/tnh_scholar/configuration/context.py:165-191`
  - Define three-layer prompt discovery: workspace → user → built-in
  - Create `runtime_assets/prompts/` with minimal built-in set (3 prompts + `_catalog.yaml`)
  - Unit tests for prompt path resolution - `tests/configuration/test_prompt_discovery.py`
- Phase 2: Migrate GenAISettings - COMPLETE
  - Update `GenAISettings.prompt_dir` to use lazy TNHContext resolution - `settings.py:89-102`
  - Legacy `TNH_DEFAULT_PROMPT_DIR` constant removed from `__init__.py`
  - tnh-gen config_loader works with new resolution
- Phase 3: Eliminate Module-Level Constants - COMPLETE
  - `TNH_CONFIG_DIR`, `TNH_LOG_DIR`, `TNH_DEFAULT_PROMPT_DIR` removed from `__init__.py`
  - Only structural constants remain (`TNH_ROOT_SRC_DIR`, `TNH_PROJECT_ROOT_DIR`, `TNH_CLI_TOOLS_DIR`)
  - No `FileNotFoundError` raises at import time for config paths
- Phase 4: Unify Subsystem Settings (Medium Priority) - NOT STARTED
  - Audit all `BaseSettings` classes across subsystems
  - Deprecate `PromptSystemSettings.tnh_prompt_dir` in favor of unified approach
  - Standardize env var prefixes (e.g., `TNH_GENAI_*`, `TNH_AUDIO_*`)
- Phase 5: Propagate tnh-gen Config Pattern (Low Priority) - NOT STARTED
  - Create shared `CLIConfigLoader` base for all CLI tools
  - Add `config show/get/set` subcommands to major CLI tools
  - Standardize workspace config file format
Success Criteria:
- [x] No module-level config Path constants in __init__.py
- [x] Prompt path discovery flows through TNHContext
- [x] Prompt directories follow three-layer precedence (workspace → user → built-in)
- [ ] At least tnh-gen and audio-transcribe share config loader pattern
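The three-layer precedence (workspace, then user, then built-in) can be illustrated with a minimal sketch. This is not the `PromptPathBuilder` implementation; `prompt_search_path`, `resolve_prompt`, and the `.md` file extension are assumptions made for the example.

```python
from pathlib import Path
from typing import Iterable, List, Optional


def prompt_search_path(
    workspace: Optional[Path], user: Optional[Path], builtin: Path
) -> List[Path]:
    """Return the prompt directories that exist, highest precedence first."""
    candidates = [workspace, user, builtin]
    return [p for p in candidates if p is not None and p.is_dir()]


def resolve_prompt(name: str, search_path: Iterable[Path]) -> Optional[Path]:
    """Return the first match for `name`, so earlier layers shadow later ones."""
    for directory in search_path:
        candidate = directory / f"{name}.md"  # assumed file extension
        if candidate.is_file():
            return candidate
    return None
```

With this ordering, a prompt placed in the workspace directory overrides a built-in prompt of the same name, while prompts that exist only in the built-in set remain reachable.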
🚧 Clean Up CLI Tool Versions
- Status: PARTIAL (old versions removed, utilities pending)
- Location: cli_tools/audio_transcribe/
- Tasks:
- Remove legacy `audio_transcribe0.py`
- Remove audio_transcribe1.py
- Remove audio_transcribe2.py
- Keep only current version
- Create shared utilities (argument parsing, environment validation, logging)
✅ Documentation Reorganization (ADR-DD01 & ADR-DD02): See Archive
Phase 1 COMPLETE - Remaining Phase 2 tasks:
- Doc metadata validation script (`check_doc_metadata.py`) - validate front matter
- Docstring coverage (`interrogate`) - threshold on `src/tnh_scholar`
- Archive index + legacy ADR migration to `docs/archive/**`
- Backlog: populate `docs/docs-ops/roadmap.md` with missing topics
- User guides for new features, architecture component diagrams
🚧 Type System Improvements
- Status: PARTIAL
- Current: 58 errors across 16 files
- High Priority: Fix audio processing boundary types, core text processing types, function redefinitions
- Medium Priority: Add missing type annotations, fix Pattern class type issues
- Low Priority: Clean up Any return types, standardize type usage
🚧 Prompt Catalog Safety
- Status: NOT STARTED
- Priority: HIGH (critical infrastructure)
- Problem: Adapter doesn't handle missing keys or invalid front-matter gracefully
- Tasks:
- Add manifest validation
- Implement caching
- Better error messages (unknown prompt, hash mismatch)
- Front-matter validation
- Document prompt schema
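One possible shape for the error handling described above is sketched here. All names (`PromptCatalogError` and its subclasses, `REQUIRED_FIELDS`, `validate_front_matter`, `load_prompt`) are hypothetical, and the manifest is assumed to map prompt keys to SHA-256 hex digests; the actual adapter and prompt schema may differ.

```python
import hashlib
from typing import Any, Mapping


class PromptCatalogError(Exception):
    """Base class so callers can catch every catalog failure at once."""


class UnknownPromptError(PromptCatalogError):
    pass


class HashMismatchError(PromptCatalogError):
    pass


class InvalidFrontMatterError(PromptCatalogError):
    pass


# Hypothetical required fields; the real prompt schema is not yet documented.
REQUIRED_FIELDS = ("key", "version")


def validate_front_matter(meta: Mapping[str, Any]) -> None:
    """Reject front matter that lacks any required field."""
    missing = [name for name in REQUIRED_FIELDS if name not in meta]
    if missing:
        raise InvalidFrontMatterError(
            "front matter is missing required field(s): " + ", ".join(missing)
        )


def load_prompt(key: str, manifest: Mapping[str, str], bodies: Mapping[str, str]) -> str:
    """Return the prompt body after checking it against the manifest hash."""
    if key not in manifest or key not in bodies:
        known = ", ".join(sorted(manifest)) or "<none>"
        raise UnknownPromptError(f"unknown prompt {key!r}; known prompts: {known}")
    body = bodies[key]
    actual = hashlib.sha256(body.encode("utf-8")).hexdigest()
    if actual != manifest[key]:
        raise HashMismatchError(
            f"prompt {key!r} hash mismatch: manifest {manifest[key][:8]}, file {actual[:8]}"
        )
    return body
```

A shared base exception lets a CLI report any catalog failure with one handler, while the specific subclasses keep the messages actionable.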
🚧 Knowledge Base Implementation
- Status: DESIGN COMPLETE
- ADR: ADR-K01
- Tasks:
- Implement Supabase integration
- Vector search functionality
- Query capabilities
- Semantic similarity search
🚧 Configuration & Data Layout
- Status: NOT STARTED
- Priority: HIGH (blocks pip install)
- Problem: `src/tnh_scholar/__init__.py` raises FileNotFoundError when repo layout missing
- Tasks:
- Package pattern assets as resources
- Make patterns directory optional
- Move directory checks to CLI entry points only
- Ensure installed wheels work without patterns/ directory
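A rough sketch of moving the directory check out of import time: the lookup returns `None` when the directory is absent, and only a CLI entry point decides whether that is an error. `find_patterns_dir` and `require_patterns_dir` are hypothetical helper names.

```python
from pathlib import Path
from typing import Optional


def find_patterns_dir(root: Path) -> Optional[Path]:
    """Return the optional patterns directory, or None when it is absent."""
    candidate = root / "patterns"
    return candidate if candidate.is_dir() else None


def require_patterns_dir(root: Path) -> Path:
    """Intended to be called from a CLI entry point, never at import time."""
    found = find_patterns_dir(root)
    if found is None:
        raise FileNotFoundError(f"patterns directory not found under {root}")
    return found
```

Importing a module that defines these functions touches no files, so an installed wheel without a patterns/ directory would still import cleanly.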
🚧 Logging System Scope
- Location: `src/tnh_scholar/logging_config.py`
- Problem: Modules call setup_logging individually
- Tasks:
- Define single application bootstrap
- Document logger acquisition pattern (get_logger only)
- Create shared CLI bootstrap helper
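The single-bootstrap pattern might be sketched as follows, using only the standard `logging` module. `bootstrap_logging` is a hypothetical name, the `"tnh_scholar"` logger namespace is an assumption, and this `get_logger` is an illustrative stand-in rather than the function in `logging_config.py`.

```python
import logging

_configured = False


def bootstrap_logging(level: int = logging.INFO) -> None:
    """Configure the package logger once; later calls are no-ops."""
    global _configured
    if _configured:
        return
    package_logger = logging.getLogger("tnh_scholar")  # assumed namespace
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    package_logger.addHandler(handler)
    package_logger.setLevel(level)
    _configured = True


def get_logger(name: str) -> logging.Logger:
    """Library modules only acquire loggers; they never configure handlers."""
    return logging.getLogger(f"tnh_scholar.{name}")
```

Under this split, an application or CLI calls `bootstrap_logging()` once at startup, and every module uses `get_logger(...)` without installing handlers of its own.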
🚧 Comprehensive CLI Reference Documentation
- Status: IN PROGRESS (tnh-gen complete ✅, other CLIs pending)
- Tasks:
- Update user-guide examples to use tnh-gen
- Document other CLI tools (audio-transcribe, ytt-fetch, nfmt, etc.)
- Consider automation for CLI reference generation
🔮 Shared CLI UI Module (tnh_cli_ui)
- Status: NOT STARTED (Research/Exploration)
- Priority: MEDIUM (UX consistency across CLI tools)
- ADR: ADR-ST01.1: tnh-setup UI Design
- Context: The tnh-setup UI redesign (Rich library) could be extracted into a shared module for consistent styling across all tnh-scholar CLI tools.
- Research Questions:
- Survey CLI tools for shared UI patterns (headers, status indicators, progress, tables)
- Evaluate Rich vs alternatives (click-extra, questionary, etc.)
- Design minimal API surface for common operations
- Consider Typer + Rich integration patterns
- Potential Scope:
- Styled section headers with step progress
- Standardized status-indicator glyphs with color vocabulary
- Spinner wrappers for async operations
- Summary table generators
- Banner/header utilities
- Affected Tools: tnh-setup, tnh-gen, ytt-fetch, audio-transcribe, nfmt, token-count, tnh-tree
🚧 Document Success Cases
- Status: NOT STARTED
- Goal: Document TNH Scholar's successful real-world applications
- Cases: Deer Park Cooking Course (SRTs), 1950s JVB Translation (OCR), Dharma Talk Transcriptions, Sr. Dang Nhiem's talks
- Tasks:
- Create `docs/case-studies/` directory structure
- Document each case with context, tools, challenges, outcomes
🚧 Notebook System Overhaul
- Status: NOT STARTED
- Priority: HIGH
- Goal: Transform notebooks from exploratory/testing to production-quality examples
- Tasks:
- Audit & categorize all notebooks
- Polish core example notebooks
- Convert testing notebooks to pytest
- Archive legacy notebooks with context notes
Priority 3: Future Work & Advanced Features
Goal: Long-term sustainability, advanced features, and nice-to-have improvements. Address after bootstrap loop is working.
🚧 Refactor Monolithic Modules
- Status: NOT STARTED
- Targets:
- https://github.com/aaronksolomon/tnh-scholar/blob/main/src/tnh_scholar/ai_text_processing/prompts.py (34KB)
- Break into: prompt model, repository manager, git helpers, lock helpers
- Add docstrings and tests for each unit
- Document front-matter schema
- https://github.com/aaronksolomon/tnh-scholar/blob/main/src/tnh_scholar/journal_processing/journal_process.py (28KB)
- Identify focused units
- Extract reusable components
🚧 Complete Provider Abstraction
- Status: NOT STARTED
- Tasks:
- Implement Anthropic adapter
- Add provider-specific error handling
- Test fallback/retry across providers
- Provider capability discovery
- Multi-provider cost optimization
🚧 Developer Experience Improvements
- Status: PARTIAL (hooks and Makefile exist, automation pending)
- Tasks:
- Add pre-commit hooks (Ruff, notebook prep)
- Create Makefile for common tasks (lint, test, docs, format, setup)
- Add MyPy to pre-commit hooks
- Add contribution templates (issue/PR templates)
- CONTRIBUTING.md exists and is documented
- Release automation
- Changelog automation
🚧 Historical ADR Status Audit
- Status: NOT STARTED
- Context: 25 ADRs marked with `status: current` from pre-markdown-standards migration
- Tasks:
- Review each ADR to determine actual status (implemented/superseded/rejected)
- Update status field in YAML frontmatter
- Cross-reference with newer ADRs for superseded decisions
🚧 Package API Definition
- Status: Deferred during prototyping
- Tasks:
- Review and document all intended public exports
- Implement `__all__` in key `__init__.py` files
- Verify exports match documentation
🚧 Repo Hygiene
- Problem: Generated artifacts in repo (build/, dist/, site/, *.txt)
- Tasks:
- Add to .gitignore
- Document regeneration process
- Rely on release pipelines for builds
🚧 Notebook & Research Management
- Location: notebooks/, docs/research/
- Problem: Valuable but not curated exploratory work
- Tasks:
- Adopt naming/linting convention
- Publish vetted analyses to docs/research via nbconvert
- Archive obsolete notebooks
Recently Completed Tasks (Archive)
tnh-gen CLI Implementation ✅
- Completed: 2025-12-27
- ADR: ADR-TG01, ADR-TG01.1
- What: Protocol-driven CLI replacing tnh-fab, dual modes (human-friendly default, `--api` for machine consumption)
- Documentation: tnh-gen CLI Reference (661 lines)
File-Based Registry System (ADR-A14) ✅
- Completed: 2026-01-01 (PR #24)
- ADR: ADR-A14, ADR-A14.1
- What: JSONC-based registry with multi-tier pricing, TNHContext path resolution, staleness detection
- Key Deliverables: `openai.jsonc` registry, `RegistryLoader`, Pydantic schemas, JSON Schema for VS Code, refactored `model_router.py` and `safety_gate.py`, 264 tests passing
VS Code Extension Walking Skeleton ✅
- Completed: 2026-01-07
- ADR: ADR-VSC01, ADR-VSC02
- What: TypeScript extension enabling "Run Prompt on Active File" workflow
- Capabilities: QuickPick prompt selector, dynamic variable input, `tnh-gen run` subprocess execution, split-pane output, unit/integration tests
- Validation: Proves bootstrapping concept - extension ready to accelerate TNH Scholar development
Pattern→Prompt Migration ✅
- Completed: 2026-01-19
- ADR: ADR-PT04
- What: Pattern→Prompt terminology migration and directory restructuring
- Key Changes: `patterns/` → `prompts/` (standalone `tnh-prompts` repo), `TNH_PATTERN_DIR` → `TNH_PROMPT_DIR`, removed legacy `tnh-fab` CLI
- Breaking: `TNH_PATTERN_DIR` env var removed, `tnh-fab` CLI removed
Provenance Format Refactor ✅
- Completed: 2026-01-19
- ADR: ADR-TG01 Addendum 2025-12-28
- What: Switched tnh-gen from HTML comments to YAML frontmatter for provenance metadata
- Files Modified: `provenance.py`, `test_tnh_gen.py`, `tnh-gen.md`
OpenAI Client Unification ✅
- Completed: 2025-12-10
- ADR: ADR-A13
- What: Migrated from legacy `openai_interface/` to modern `gen_ai_service/providers/` architecture (6 phases)
Core Stubs Implementation ✅
- Completed: 2025-12-10
- What: Implemented params_policy, model_router, safety_gate, completion_mapper with strong typing
- Grade: A- (92/100) - Production ready with minor polish
Documentation Reorganization Phase 1 ✅
- Completed: 2025-12-05
- ADR: ADR-DD01, ADR-DD02
- What: Absolute links, MkDocs strict mode, filesystem-driven nav, lychee link checking
Packaging & CI Infrastructure ✅
- Completed: 2025-11-20
- What: pytest in CI, runtime dependencies declared, pre-commit hooks, Makefile targets
Remove Library sys.exit() Calls ✅
- Completed: 2025-11-15
- What: Library code raises ConfigurationError instead of exiting process
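The pattern can be illustrated with a short sketch: library code raises, and only the CLI entry point converts the failure into an exit code. The `ConfigurationError` class below is a stand-in for the project's own exception, and `require_setting`, `main`, and the exit code 2 are assumptions made for the example.

```python
import sys
from typing import Mapping


class ConfigurationError(Exception):
    """Stand-in for the library's error raised when configuration is missing."""


def require_setting(env: Mapping[str, str], name: str) -> str:
    """Library code: report the problem by raising, never by exiting."""
    value = env.get(name)
    if not value:
        raise ConfigurationError(f"missing required setting: {name}")
    return value


def main(env: Mapping[str, str]) -> int:
    """CLI entry point: the only layer that turns errors into exit codes."""
    try:
        require_setting(env, "OPENAI_API_KEY")
    except ConfigurationError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 2  # assumed exit code for configuration failures
    return 0
```

A script would then end with `sys.exit(main(os.environ))`, which keeps the library importable and testable without terminating the host process.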
Convert Documentation Links to Absolute Paths ✅
- Completed: 2025-12-05 (PR #14)
- What: Converted 964 links to absolute paths, enabled MkDocs strict link validation, integrated link verification
NumberedText Section Boundary Validation ✅
- Completed: 2025-12-12
- ADR: ADR-AT03.2 (status: accepted → should be implemented)
- What: Implemented `validate_section_boundaries()` and `get_coverage_report()` methods for robust section management
- Commits: cf99375 (docs), 798a552 (refactor unused methods)
TextObject Robustness Improvements ✅
- Completed: 2025-12-14
- ADR: ADR-AT03.3 (status: accepted → should be implemented)
- What: Implemented merge_metadata() with MergeStrategy enum, validate_sections() with fail-fast, converted to Pydantic v2, added structured exception hierarchy
- Commits: 096e528 (implementation), 03654fe (docstrings)