TNH Scholar TODO List

Roadmap tracking the highest-priority TNH Scholar tasks and release blockers.

Last Updated: 2026-04-26 (CI workflow cleanup and docs validation split landed; notebook repo cleanup and release validation next)
Version: 0.4.2 (Alpha)
Status: Active Development - bootstrap viable, release validation and packaging phase

Style Note: Tasks use descriptive headers (not numbered items) to avoid renumbering churn when reorganizing. Use #### (h4) for task headers within priority sections.

Progress Summary

Bootstrap Path Status: ✅ COMPLETE — VS Code integration working, AI-assisted development enabled.

Agent-Orch Bootstrap Status: ✅ USABLE PROTOTYPE — the maintained tnh-conductor path has now produced and landed a bounded repo-native implementation through the worktree-backed headless path.

Next Steps:

🚧 Prepare 0.4.0 bootstrap release framing for tnh-conductor prototype alpha
🚧 Run 0.4.0 release validation pass and finalize version-bump scope/release notes
🚧 Reduce notebook/test clutter before release: archive junk notebooks and move real test coverage into pytest where practical
🔮 JVB VS Code Parallel Viewer (P1, design phase) — ADR-JVB02 strategy + UI-UX design
🔮 Finish yt-dlp reliability suite + monthly ops trigger (P1, reliability)
🔮 Finish ytt-fetch robustness hardening (P1, reliability)
🚧 GenAIService Final Polish - promote policy_applied typing (P1, minor)
🚧 Prompt Catalog Safety - manifest validation + schema docs (P2, critical infrastructure)
🚧 Knowledge Base Implementation (P2, design complete)
🚧 Expand Test Coverage with refreshed baseline and current gaps (P2)
🚧 OpenAI registry-driven request profiles for model-specific controls (P2, follow-up hardening)

Recent release hardening completed:

✅ CI workflow cleanup split GitHub validation into PR, main, scheduled full-test, and docs-specific flows; docs PR validation is now read-only
✅ Legacy directory-tree generation/drift checks were removed from routine CI and local validation; tnh-tree remains available only as a manual developer utility
✅ Added make branch-preflight so new work can start from a clean branch based on current origin/main
✅ Docs tooling aligned to restore stable API docs builds under the current CI/read-only MkDocs path
✅ Final Dependabot remediation slice removed prototype-only LangChain packages from the root Poetry graph, patched yt_dlp in the standalone audio env, and refreshed the remaining vulnerable manifests
✅ Removed dead query/v2_cleaning_scripts.py artifact and dropped its unused transformers dependency
✅ Dependabot Stage 6 retired the legacy local Whisper / torch audio path so maintained audio support is limited to current API-backed surfaces
✅ Dependabot Stage 5 optional GUI graph refresh landed for Poetry-managed langchain, langchain-community, langchain-core, langchain-text-splitters, and aiohttp
✅ pattern_share app explicitly marked as an exploratory legacy prompt-sharing prototype with prompt terminology restored in the UI
✅ Dependabot Stage 4 pattern_share manifest refresh landed for streamlit, langchain, and langchain-community
✅ Dependabot Stage 3 dev/tooling refresh landed for flask, jinja2, black, pytest, and patched notebook/tooling transitive packages
✅ Repo-wide Ruff backlog reduced to zero; make lint now passes
✅ Repo-wide mypy backlog reduced to zero; make type-check now passes

For completed items, see the Archive section at the end.

Priority Roadmap

This section organizes work into three priority levels based on criticality for production readiness.

Priority 1: VS Code Integration Enablement (Bootstrap Path)

Goal: Enable AI-assisted development of TNH Scholar itself via VS Code extension. Prioritizes foundational work for tnh-gen + extension integration.

Status: Foundation Complete (tnh-gen CLI ✅, Registry System ✅)

✅ tnh-gen CLI Implementation — See Archive

✅ File-Based Registry System (ADR-A14) — See Archive

✅ VS Code Extension Walking Skeleton — See Archive

✅ Pattern→Prompt Migration — See Archive

✅ Provenance Format Refactor (YAML Frontmatter) — See Archive

🚨 Agent-Orch OA07 Runtime Implementation Sequence

Status: IN PROGRESS - maintained execution/validation/kernel slice landed and tested
Priority: HIGH (foundation work for durable MVP)
Context: The accepted OA07 ADR set defines the maintained runtime architecture. The current conductor_mvp/ and spike/ code remains useful as migration source/reference, but should not receive forward-path feature growth.
Why This Matters:
Current implementation readiness is medium, but in-place extension readiness is low
The highest-risk boundary is still subprocess execution and typed validation/runner contracts
Coding should proceed by subsystem extraction, not by continuing prototype package growth
Implementation Order:
Build agent_orchestration/execution/
- typed invocation families
- cwd/env/timeout policy
- termination/result taxonomy
- final argv rendering boundary
Build agent_orchestration/validation/ on top of execution/
- preserve OA04 external YAML compatibility by normalizing source shapes into typed internal models
- migrate behavior out of conductor_mvp/providers/validation_runner.py
Extract agent_orchestration/kernel/
- WorkflowCatalog
- WorkflowValidator
- KernelState
- KernelRunService
Introduce agent_orchestration/workspace/ and agent_orchestration/run_artifacts/
- move rollback/state capture and durable run record ownership out of prototype packages
Migrate maintained runner behavior into agent_orchestration/runners/
- use reference/spike/ only as reference material
- no new forward-path runner work in spike code
Current Slice Completed:
Added maintained execution/, validation/, kernel/, workspace/, run_artifacts/, and runners/ package scaffolding
Added focused OA07 regression coverage and validated the new slice plus legacy conductor_mvp kernel tests
Sourcery installed successfully via poetry install --with local, but the CLI currently hangs even for --help, so local Sourcery review remains blocked by Sourcery runtime behavior rather than repo config
Migration Rules:
Do not add substantive new feature work to conductor_mvp/
Do not add new forward-path implementation work to spike/
Treat conductor_mvp/ as a temporary migration-source package to be deleted after subsystem extraction
Treat codex_harness/ and spike/ as reference packages during OA07 migration
Initial Files in Scope:
src/tnh_scholar/agent_orchestration/conductor_mvp/
src/tnh_scholar/agent_orchestration/spike/
src/tnh_scholar/agent_orchestration/common/
tests/agent_orchestration/
docs/architecture/agent-orchestration/adr/adr-oa03-agent-runner-architecture.md
docs/architecture/agent-orchestration/adr/adr-oa03.1-claude-code-runner.md
docs/architecture/agent-orchestration/adr/adr-oa03.3-codex-cli-runner.md
docs/architecture/agent-orchestration/adr/adr-oa04-workflow-schema-opcode-semantics.md
docs/architecture/agent-orchestration/adr/adr-oa04.1-implementation-notes-mvp-buildout.md
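
The execution/ responsibilities listed in the implementation order above (cwd/env/timeout policy, a termination/result taxonomy, and a final argv boundary) can be illustrated with a minimal sketch. Every name here (Termination, ExecutionRequest, ExecutionResult, execute) is hypothetical and chosen for illustration; the maintained package may use different types and a richer taxonomy.

```python
import subprocess
import sys
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Mapping, Optional, Tuple


class Termination(Enum):
    """Hypothetical result taxonomy; the real OA07 taxonomy may differ."""
    COMPLETED = "completed"
    NONZERO_EXIT = "nonzero_exit"
    TIMED_OUT = "timed_out"
    LAUNCH_FAILED = "launch_failed"


@dataclass(frozen=True)
class ExecutionRequest:
    argv: Tuple[str, ...]                    # final argv, rendered before this boundary
    cwd: Path                                # working directory policy
    env: Optional[Mapping[str, str]] = None  # None inherits the parent environment
    timeout_s: float = 30.0                  # timeout policy


@dataclass(frozen=True)
class ExecutionResult:
    termination: Termination
    exit_code: Optional[int]
    stdout: str
    stderr: str


def execute(request: ExecutionRequest) -> ExecutionResult:
    """Run one subprocess and map every outcome onto the taxonomy."""
    try:
        proc = subprocess.run(
            list(request.argv),
            cwd=request.cwd,
            env=request.env,
            capture_output=True,
            text=True,
            timeout=request.timeout_s,
            check=False,
        )
    except subprocess.TimeoutExpired:
        # Partial output is dropped in this sketch for simplicity.
        return ExecutionResult(Termination.TIMED_OUT, None, "", "")
    except OSError as exc:
        # Covers a missing executable or an invalid working directory.
        return ExecutionResult(Termination.LAUNCH_FAILED, None, "", str(exc))
    termination = (
        Termination.COMPLETED if proc.returncode == 0 else Termination.NONZERO_EXIT
    )
    return ExecutionResult(termination, proc.returncode, proc.stdout, proc.stderr)


result = execute(
    ExecutionRequest(argv=(sys.executable, "-c", "print('ok')"), cwd=Path.cwd())
)
print(result.termination.value, result.exit_code)
```

Returning a typed result for every outcome, instead of letting subprocess exceptions escape, is one way to keep callers such as validation/ from depending on subprocess details.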

🚨 OA07.1 Bootstrap Worktree Slice

✅ OA04 Contract Family — PR Sequence (Complete)

Status: COMPLETE — contract ADRs implemented in maintained code; bootstrap remains blocked on OA07.1 worktree execution
Context: OA04.2–OA04.5 are the contract-layer ADRs between the OA07 runtime foundations and the maintained runner/policy/provenance implementations. That contract family is now landed in code and should no longer be treated as pending. See implementation notes in ADR-OA04.1 Addendum 2026-03-27 for the original scaffolding gaps and ADR-OA04.1 Addendum 2026-04-05 for the bootstrap-first reprioritization.
Dependency chain:
OA04.3 (run dir + manifests + evaluator evidence seam) → OA04.2 (runners normalize into canonical evidence)
OA04.4 (policy taxonomy + requested/effective split) → OA04.2 (runner request carries typed requested policy)
OA04.5 (harness backend) → validation/ subsystem (extends empty package)
OA04.2 (runner adapters) → milestone: first real agent invocations
Implementation Notes (default choices for implementers):
Apply OS01 pragmatically: add structure where it protects a real boundary or likely evolution seam, not just to mirror the taxonomy mechanically.
Prefer moving maintained code toward the ADR contracts when the migration path is clean; do not preserve stub shapes just for short-term compatibility inside maintained packages.
Treat run_artifacts/ as the canonical evidence boundary. If a choice arises between storing data in runner-local files versus canonical artifact roles + manifests, choose canonical artifact roles + manifests.
Keep evaluator assembly strict: evaluators read metadata.json, events.ndjson, manifest.json, and canonical artifact roles only. Do not add evaluator dependencies on adapter-local raw capture filenames.
Keep manifests thin and stable. Put compact cross-step evidence in evidence_summary; put detailed per-step policy data in canonical policy_summary.json.
Keep persistence ownership in run_artifacts/. Runner adapters and validation backends should return typed normalized outputs and artifact payloads; they should not own final manifest writing policy.
Evolve existing maintained code where it already matches the target shape. In particular, refactor validation/service.py toward the script backend/resolver seam rather than replacing it wholesale.
Expand kernel/service.py by extraction, not accretion. If per-step provenance writing starts to crowd the kernel, extract focused collaborators rather than growing one large procedural service.
Use explicit mapper/normalizer classes whenever native CLI or harness output is translated into maintained models. Do not hide parsing, normalization, termination mapping, and persistence decisions in one adapter class.
Keep policy taxonomy aligned with OS01: init-time settings/config, per-step requested policy, execution-time effective policy, persisted PolicySummary. Avoid “policy blob” models that mix those concerns.
Do not add ceremony without benefit: avoid speculative service/factory layers, unnecessary mappers for nearly identical shapes, or package splits that do not improve testability, replaceability, or clarity.
Existing thin models in run_artifacts/, runners/, and validation/ are scaffolding, not target architecture. It is acceptable to break those internal shapes in favor of cleaner maintained contracts during this implementation sequence.
PR Sequence:
PR-1 feat/oa04-contract-adrs — ADR acceptance (docs only)
- Commit new OA04.2, OA04.3, OA04.4, OA04.5 files; later implementation has since moved those decimal ADRs to implemented
- Carry in already-modified OA03.1/OA03.3 addendums + OA04 update + index.md
PR-2 feat/oa04.3-run-artifact-contract — Run-artifact domain contract + store (medium)
- Expand run_artifacts/models.py: RunMetadata, RunEventRecord, ArtifactRole enum, StepArtifactEntry, StepManifest
- Add manifest-level evidence_summary with compact canonical evidence references
- Add canonical policy_summary artifact role for detailed requested/effective policy records
- Expand run_artifacts/protocols.py: write_step_manifest, artifact_step_dir, canonical artifact persistence APIs
- Update run_artifacts/filesystem_store.py to implement both
- Keep filesystem concerns behind the store; no evaluator-facing filename dependencies
- Tests for manifest writing, event stream fields, and canonical artifact-role lookup
PR-3 feat/oa04.3-kernel-provenance-integration — Kernel provenance integration (medium)
- Update kernel/runtime services to write enriched run metadata, canonical events, and per-step manifests
- Persist compact manifest summaries and canonical artifact references only; no adapter-local evidence lookup in evaluator assembly
- Capture workspace diff/status and policy summary references through canonical artifact roles
- Tests for manifest/event creation across RUN_AGENT, RUN_VALIDATION, EVALUATE, and GATE
- Depends on PR-2
PR-4 feat/oa04.4-policy-contract — Execution policy package (medium)
- New agent_orchestration/execution_policy/ package
- models.py: ExecutionPolicySettings, RequestedExecutionPolicy, EffectiveExecutionPolicy, PolicyViolationClass, PolicyViolation, PolicySummary
- assembly.py: ExecutionPolicyAssembler for system settings → workflow → step requested policy → runtime override/effective policy derivation
- protocols.py: ExecutionPolicyAssemblerProtocol
- Update runners/models.py: retire PromptInteractionPolicy stub; link RunnerTaskRequest to RequestedExecutionPolicy
- Persist detailed policy_summary.json via canonical policy_summary artifact role; keep only compact summary data in manifests
- Tests for assembly precedence, requested/effective policy derivation, and hard-fail behavior
- Can run in parallel with PR-3
PR-5 feat/oa04.2-runner-adapters — Runner adapters (largest PR)
- Expand runners/models.py: AdapterCapabilities (capability declaration per OA04.2 §3a)
- Add explicit mapper/normalizer classes for native CLI output → maintained runner-domain models
- Add runners/adapters/claude_cli.py: claude --print --output-format stream-json --permission-mode dontAsk, stream-json parsing, normalization, termination mapping
- Add runners/adapters/codex_cli.py: codex exec --json --output-last-message, JSONL capture, normalization, termination mapping
- Adapters return typed normalized artifact payloads; canonical persistence is owned by run_artifacts
- Evaluators consume manifests and canonical artifact roles only, never runner-local raw capture files
- Tests for both adapters (subprocess mocking, normalization, mapper behavior, termination paths)
- Depends on PR-2, PR-3, and PR-4
PR-6 feat/oa04.5-harness-backend — Script harness backend (medium)
- Build out agent_orchestration/validation/: BackendFamily enum, HarnessBackendRequest, HarnessBackendResult, HarnessBackendProtocol
- backends/script.py: migrate from conductor_mvp/providers/validation_runner.py; normalize to validation_report/validation_stdout/validation_stderr artifact roles
- Add backend resolver seam, but defer cli and web implementation until a concrete maintained consumer exists
- Tests for script backend, resolver seam, and artifact role normalization
- Depends on PR-2; independent of PR-4 and PR-5
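
The assembly order described for PR-4 (system settings → workflow → step requested policy → runtime override) can be sketched as a layered merge. This is a minimal illustration only: PolicyLayer, EffectivePolicy, assemble, and all field names are hypothetical, and the "later layer wins" rule is an assumption rather than the documented behavior of ExecutionPolicyAssembler.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class PolicyLayer:
    """One layer of requested policy; None means 'no opinion at this layer'.

    Field names are hypothetical and exist only for this sketch.
    """
    timeout_s: Optional[float] = None
    allow_network: Optional[bool] = None
    max_turns: Optional[int] = None


@dataclass(frozen=True)
class EffectivePolicy:
    """Fully resolved policy; every field must have a concrete value."""
    timeout_s: float
    allow_network: bool
    max_turns: int


def assemble(
    system: PolicyLayer,
    workflow: PolicyLayer,
    step: PolicyLayer,
    runtime_override: PolicyLayer,
) -> EffectivePolicy:
    """Merge layers in precedence order; later layers override earlier ones."""
    merged = {}
    for layer in (system, workflow, step, runtime_override):
        for f in fields(layer):
            value = getattr(layer, f.name)
            if value is not None:
                merged[f.name] = value
    unresolved = [f.name for f in fields(EffectivePolicy) if f.name not in merged]
    if unresolved:
        # Hard-fail instead of guessing defaults at execution time.
        raise ValueError(f"policy fields unresolved: {unresolved}")
    return EffectivePolicy(**merged)


effective = assemble(
    system=PolicyLayer(timeout_s=600.0, allow_network=False, max_turns=20),
    workflow=PolicyLayer(timeout_s=300.0),
    step=PolicyLayer(max_turns=5),
    runtime_override=PolicyLayer(),
)
print(effective)
```

Keeping the requested layers and the effective result as separate types mirrors the requested/effective split named in OA04.4 and avoids a single "policy blob" model.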

🔮 JVB VS Code Parallel Viewer (ADR-JVB02)

Status: NOT STARTED (Design Phase)
Priority: HIGH (flagship feature, builds on VS Code integration foundation)
Context: The JVB (Journal of Vietnamese Buddhism) parallel viewer enables scholars to view scanned historical journal pages alongside OCR text and English translations. v1 was a bespoke browser-based prototype; v2 will integrate into the tnh-scholar VS Code extension.
Project Paused: This work was on hold while VS Code integration and tnh-gen were developed. Now that the walking skeleton is complete, we can resume with a fresh design.

Related Documentation:

v1 As-Built: ADR-JVB01 — Browser-based prototype architecture
v2 Strategy (Draft): JVB Viewer V2 Strategy — Pre-ADR strategy note (good foundations, needs formalization)
VS Code Platform Strategy: VS Code as UI Platform — Overall UI-UX direction
VS Code Integration: ADR-VSC01 — CLI-first extension strategy (implemented)

Proposed ADR Structure:

docs/architecture/jvb-viewer/adr/
├── adr-jvb01_as-built_jvb_viewer_v1.md              # ✅ Exists
├── adr-jvb02-vscode-parallel-viewer-strategy.md     # 🆕 Main strategy ADR
├── adr-jvb02.1-ui-ux-design.md                      # 🆕 Mockups, pane layout, workflows
├── adr-jvb02.2-data-model-api-contract.md           # 🆕 JSON schema, extension↔backend API
└── adr-jvb02.3-implementation-guide.md              # 🆕 Phase-by-phase implementation

Key Design Decisions Needed:

VS Code Pane Architecture: Which panes for scan overlay, text views, reconciliation controls, navigation?
Webview vs Custom Editor: Custom editor for .jvb.json files or webview panel approach?
Backend Integration: Python service via CLI (tnh-gen patterns) or dedicated HTTP service?
Data Model: Refine per-page JSON schema from v2 strategy, define API contract
Dual OCR Reconciliation UI: How users choose between Google OCR vs AI vision sources

Deliverables:

ADR-JVB02: Main strategy ADR (formalize v2 strategy, VS Code integration focus)
ADR-JVB02.1: UI-UX design with mockups/screen visualizations
ADR-JVB02.2: Data model and API contract specification
ADR-JVB02.3: Implementation guide with milestones
M0 Prototype: Static HTML mockup in VS Code webview (validate approach)

Implementation Milestones (from v2 strategy, to be refined):

M0: Static prototype — HTML showing page image, word bboxes, selectable sentences
M1: VS Code extension — load/save per-page JSON, overlay modes, section breadcrumb
M2: Dual-source UI — GOCR vs AI diff chooser, batch adoption, "reviewed" status
M3: Structure cues — columns, heading levels, emphasis flags captured and rendered
M4: Beta — section-level navigation, export HTML, light theming

✅ Add `--prompt-dir` Global Flag to tnh-gen — Completed 2026-04-18

Status: COMPLETE
Priority: HIGH (improves tnh-gen UX for one-off operations and testing)
Estimate: 1-2 hours
Context: Users need a convenient way to override the prompt catalog directory for one-off CLI calls without setting environment variables or creating temp config files
ADR: ADR-TG01 Addendum 2026-01-02
Why Important: Enables clean one-off operations (tnh-gen --prompt-dir ./test-prompts list) for testing, CI/CD, and development workflows
Current Workarounds:
Environment variable: TNH_PROMPT_DIR=/path tnh-gen list (awkward)
Temp config file: tnh-gen --config /tmp/config.yaml list (verbose)
Deliverables:
Add --prompt-dir flag to cli_callback() in src/tnh_scholar/cli_tools/tnh_gen/tnh_gen.py:26
Update config_loader.py to handle prompt directory override at CLI precedence level
Update ConfigData type to accept prompt_catalog_dir override
Add unit tests for flag precedence (CLI flag > workspace > user > env)
Update help text and CLI reference documentation
Update docs/cli-reference/tnh-gen.md global flags section
Files to Modify:
src/tnh_scholar/cli_tools/tnh_gen/tnh_gen.py (add flag)
src/tnh_scholar/cli_tools/tnh_gen/config_loader.py (precedence handling)
src/tnh_scholar/cli_tools/tnh_gen/types.py (type definitions)
tests/cli_tools/test_tnh_gen.py (unit tests)
docs/cli-reference/tnh-gen.md (documentation)
Testing: Verify --prompt-dir flag overrides all other config sources (workspace, user, env)
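
The flag precedence stated above (CLI flag > workspace > user > env) can be sketched as a first-match lookup. This is an illustrative sketch, not the code in config_loader.py: resolve_prompt_dir and its parameters are hypothetical names, and only the TNH_PROMPT_DIR variable comes from the workarounds listed above.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_prompt_dir(
    cli_flag: Optional[str],
    workspace_value: Optional[str],
    user_value: Optional[str],
    environ: Mapping[str, str],
) -> Optional[Path]:
    """Return the first configured prompt directory in precedence order.

    Order follows the TODO item: CLI flag > workspace > user > env.
    """
    candidates = (
        cli_flag,
        workspace_value,
        user_value,
        environ.get("TNH_PROMPT_DIR"),
    )
    for candidate in candidates:
        if candidate:  # skip None and empty strings
            return Path(candidate)
    return None


chosen = resolve_prompt_dir(
    cli_flag="./test-prompts",
    workspace_value="/workspace/prompts",
    user_value=None,
    environ={"TNH_PROMPT_DIR": "/env/prompts"},
)
print(chosen)
```

Passing the environment in as a mapping, instead of reading os.environ inside the function, keeps the precedence logic easy to unit test.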

🔮 Full-Coverage yt-dlp Test Suite + Monthly Ops Trigger

🔮 Patch ytt-fetch Robustness

Status: IN PROGRESS
Priority: HIGH (frequent breakage path)
Goal: Make ytt-fetch resilient to upstream changes and failures.
Test URL: https://youtu.be/iqNzfK4_meQ
Deliverables:
Add runtime preflight + yt-dlp runtime option injection
Verify transcript fetch on test URL (manual + test)
Add retries / improved error reporting
Ensure metadata embed + output path handling remain stable
Update docs and CLI reference if flags or behaviors change
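
The "retries / improved error reporting" deliverable above could take the shape of a small retry helper with exponential backoff. This is a generic sketch, not ytt-fetch code: with_retries and its parameters are hypothetical, and it does not call yt-dlp.

```python
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    *,
    attempts: int = 3,
    base_delay_s: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying on the given exceptions with doubling delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on as exc:
            if attempt == attempts:
                # Report how many attempts were made and keep the cause chained.
                raise RuntimeError(
                    f"operation failed after {attempts} attempts: {exc}"
                ) from exc
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Usage: a stand-in for a flaky transcript fetch that succeeds on the third call.
delays: List[float] = []
outcomes = iter([ConnectionError("reset"), ConnectionError("reset"), "transcript"])


def flaky_fetch() -> str:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


text = with_retries(flaky_fetch, retry_on=(ConnectionError,), sleep=delays.append)
print(text, delays)
```

The injectable sleep parameter lets tests record the delays without actually waiting.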

🚧 GenAIService Core Components - Final Polish

⏸️ GenAIService Thread Safety and Rate Limiting (ADR-A15)

Status: DEFERRED - Not needed for VS Code integration (process isolation)
Priority: MEDIUM (revisit when building Python batch pipelines)
Issue: #22
ADR: ADR-A15: Thread Safety and Rate Limiting
Why Deferred: VS Code extension uses process isolation (each tnh-gen call = separate GenAIService instance). Thread safety only matters for Python-native batch pipelines.
When to Revisit: When implementing concurrent corpus processing loops or batch translation pipelines
Estimate: 3-6 hours (Phase 1: 1-2 hours, Phase 2: 2-4 hours)
Quick Summary: Add thread-safe retry state, locked cache, and optional rate limiting for high-throughput scenarios
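
For the "locked cache" part of the summary above, a minimal sketch is shown below. LockedCache is a hypothetical name and this is not the ADR-A15 design; it only shows how a lock can guarantee that concurrent callers compute a cached value once.

```python
import threading
from typing import Callable, Dict, Hashable, List


class LockedCache:
    """Cache whose reads and writes are guarded by a single lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: Dict[Hashable, object] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], object]) -> object:
        # Holding the lock while computing serializes callers. That is simple
        # and safe, but a real implementation might use per-key locks instead.
        with self._lock:
            if key not in self._data:
                self._data[key] = compute()
            return self._data[key]


# Usage: eight threads request the same key; compute runs once.
cache = LockedCache()
calls: List[int] = []


def expensive() -> str:
    calls.append(1)
    return "value"


threads = [
    threading.Thread(target=cache.get_or_compute, args=("k", expensive))
    for _ in range(8)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(len(calls))
```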

Priority 2: Production Hardening (Post-Bootstrap)

Goal: Harden TNH Scholar for production use after VS Code integration enables AI-assisted development. Focuses on reliability, test coverage, and type safety.

🚧 OpenAI SDK 2.15.0 Validation (High Priority)

Status: NOT STARTED
Why: SDK bump impacts OpenAI adapter. (Codex harness suspended — see ADR-OA03.2 addendum)
Tasks:
Revalidate OpenAI adapter request/response mappings against 2.15.0
Update compatibility notes/docs if schema drift is found

🚧 Audio-Transcribe Service-Layer Refactor (P2)

Status: NOT STARTED
Goal: Align audio-transcribe with object-service pattern and ytt-fetch robustness.
Tasks:
Introduce typed service orchestrator + protocols (CLI becomes thin wrapper)
Extract audio source resolution into a typed resolver (yt_url/CSV/local file)
Replace dict options with Pydantic models (transcription + diarization params)
Move logging bootstrap out of module import path so audio-transcribe modules are import-safe in tests and sandboxed environments
Add runtime preflight (yt-dlp inspector + ffmpeg availability); keep version checks ops-only
Migrate CLI to Typer with minimal surface (smoke tests only)
Add service-layer tests for all audio-transcribe use cases

🚧 Fully Usable audio-transcribe Hardening Path (P2)

Status: NOT STARTED
Goal: Make audio-transcribe safe for routine real-world use by eliminating stack dumps from expected fault paths and hardening the tool against edge-case and provider-shape failures.
Tasks:
Build a full-spectrum regression matrix covering local files, YouTube URLs, CSV input, diarization on/off, Whisper, and AssemblyAI
Add end-to-end CLI regression coverage for normal runs plus representative operator workflows used in practice
Add fault-injection coverage for provider shape drift, empty/partial transcript results, chunk-level failures, file-write failures, temp-dir cleanup failures, and missing dependency/runtime preconditions
Ensure expected runtime faults exit with user-facing errors instead of Python tracebacks
Capture and preserve real-world reproductions as named regression fixtures/cases whenever a live failure is found
Run a post-hardening soak pass against representative real inputs before the next minor release
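
The task "expected runtime faults exit with user-facing errors instead of Python tracebacks" can be sketched as a thin wrapper around the CLI entry point. The exception classes and run_cli below are hypothetical names for illustration and are not part of audio-transcribe.

```python
import io
import sys
from typing import Callable, Optional, TextIO


class ExpectedFault(Exception):
    """Hypothetical base class for anticipated, user-explainable failures."""


class MissingDependencyError(ExpectedFault):
    """Example expected fault: a required runtime tool is unavailable."""


def run_cli(main: Callable[[], None], stderr: Optional[TextIO] = None) -> int:
    """Run main and turn expected faults into a one-line error and exit code.

    Unexpected exceptions are deliberately not caught, so genuine bugs still
    surface with a full traceback.
    """
    out = stderr if stderr is not None else sys.stderr
    try:
        main()
    except ExpectedFault as exc:
        print(f"error: {exc}", file=out)
        return 1
    return 0


# Usage: an expected fault becomes a short message and a nonzero exit code.
def failing_main() -> None:
    raise MissingDependencyError("ffmpeg not found on PATH")


captured = io.StringIO()
exit_code = run_cli(failing_main, stderr=captured)
print(exit_code, captured.getvalue().strip())
```

Limiting the except clause to one exception family keeps the boundary between "operator can fix this" and "this is a bug" explicit, which also makes fault-injection tests straightforward.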

⏸️ Agent Orchestration - Codex Runner (ADR-OA03.2)

Status: TABLED (2026-01-25)
ADR: ADR-OA03.2
Why Tabled:
Scope: Spike revealed that a proper Codex harness requires implementing full client-side agent orchestration (the VS Code extension uses a proprietary app server, not raw API calls)
Cost-benefit: Current human-in-the-loop workflow with Claude Code + VS Code Codex extension is effective and cost-efficient for project needs
No compelling need: Investment not justified when manual workflow works well
Findings: Codex Harness Spike Findings
Preserved Artifacts: src/tnh_scholar/agent_orchestration/codex_harness/, src/tnh_scholar/cli_tools/tnh_codex_harness/
Conditions for Resumption: Further insight or clear business need that justifies full agent orchestration investment

🚧 Expand Test Coverage

Status: NOT STARTED
Current Coverage: Refresh baseline before planning. The old "~5% / 4 test modules" snapshot is obsolete.
Target: 50%+ for gen_ai_service
Tasks:
GenAI service flows: prompt rendering, policy resolution, provider adapters
CLI integration tests (option parsing, environment validation)
Configuration loading edge cases
Error handling scenarios
Pattern catalog validation
Full CLI test suite with refreshed priorities
- focus on active CLI surfaces still lacking strong coverage, not already-hardened tnh-gen basics
- prioritize tnh-conductor, audio-transcribe, ytt-fetch, and edge-case regression coverage

🚧 Consolidate Environment Loading

Status: NOT STARTED
Problem: Multiple modules call load_dotenv() at import time
https://github.com/aaronksolomon/tnh-scholar/blob/main/src/tnh_scholar/ai_text_processing/prompts.py
https://github.com/aaronksolomon/tnh-scholar/blob/main/src/tnh_scholar/audio_processing/diarization/pyannote_client.py
Tasks:
Create single startup hook for dotenv loading
Use Pydantic Settings consistently
Pass configuration objects instead of os.getenv() calls
Remove import-time side effects
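
The tasks above amount to reading the environment once at startup and passing a configuration object onward. The sketch below uses a plain frozen dataclass as a stand-in for a Pydantic Settings model so that it stays dependency-free; AppSettings, load_settings, describe_run, and the LOG_LEVEL variable are hypothetical names. In a real entry point, the single load_dotenv() call would run once before load_settings().

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AppSettings:
    """Stand-in for a Pydantic Settings model (plain dataclass in this sketch)."""
    openai_api_key: Optional[str]
    log_level: str


def load_settings(environ: Optional[Mapping[str, str]] = None) -> AppSettings:
    """Read configuration once, at application startup, never at import time."""
    source = os.environ if environ is None else environ
    return AppSettings(
        openai_api_key=source.get("OPENAI_API_KEY"),
        log_level=source.get("LOG_LEVEL", "INFO"),
    )


def describe_run(settings: AppSettings) -> str:
    """Library code receives settings explicitly instead of calling os.getenv()."""
    key_state = "set" if settings.openai_api_key else "missing"
    return f"log_level={settings.log_level} api_key={key_state}"


# Usage: the entry point builds settings once and hands them down.
settings = load_settings({"OPENAI_API_KEY": "sk-example", "LOG_LEVEL": "DEBUG"})
print(describe_run(settings))
```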

🚧 Configuration Tech Debt — Migrate to ADR-CF01/CF02 Three-Layer Model

Status: PHASES 1-3 COMPLETE, Phase 4-5 NOT STARTED
Priority: MEDIUM (foundational, not blocking current work)
ADRs:
ADR-CF01: Runtime Context & Configuration Strategy
ADR-CF02: Prompt Catalog Discovery Strategy (status: accepted)
Related: ADR-A08: Config/Params/Policy Taxonomy

Migration Phases:

Success Criteria:
- [x] No module-level config Path constants in __init__.py
- [x] Prompt path discovery flows through TNHContext
- [x] Prompt directories follow three-layer precedence (workspace → user → built-in)
- [ ] At least tnh-gen and audio-transcribe share config loader pattern

🚧 Clean Up CLI Tool Versions

Status: PARTIAL (old versions removed, utilities pending)
Location: cli_tools/audio_transcribe/
Tasks:
Remove legacy audio_transcribe0.py
Remove audio_transcribe1.py
Remove audio_transcribe2.py
Keep only current version
Create shared utilities (argument parsing, environment validation, logging)

✅ Documentation Reorganization (ADR-DD01 & ADR-DD02) — See Archive

Phase 1 COMPLETE - Remaining Phase 2 tasks:

Doc metadata validation script (check_doc_metadata.py) - validate front matter
Docstring coverage (interrogate) - threshold on src/tnh_scholar
Archive index + legacy ADR migration to docs/archive/**
Backlog: populate docs/docs-ops/roadmap.md with missing topics
User guides for new features, architecture component diagrams

🚧 Type System Improvements

Status: PARTIAL
Current: 58 errors across 16 files
High Priority: Fix audio processing boundary types, core text processing types, function redefinitions
Medium Priority: Add missing type annotations, fix Pattern class type issues
Low Priority: Clean up Any return types, standardize type usage

🚧 Prompt Catalog Safety

🚧 tnh-gen Operator UX

Status: NOT STARTED
Priority: LOW–MEDIUM
Problem: tnh-gen provides no feedback while the model is working and does not save run output automatically, creating a poor experience for interactive and long-running calls
Tasks:
Add heartbeat / progress indicator to stderr during model calls so operators know the run is alive (especially for long documents — 10–30 s wait with no output)
Add basic run logging: log prompt key, model, input path, and elapsed time at completion, even when not running in --api mode
Persist tnh-gen run output by default to a temp or run-artifact directory when no --output-file is provided
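
The heartbeat task above could be implemented as a context manager that writes a marker on a background thread while the model call blocks. This is a sketch under assumptions: heartbeat, its parameters, and the dot marker are hypothetical, and the sleep stands in for a real model call.

```python
import contextlib
import io
import sys
import threading
import time
from typing import Iterator, Optional, TextIO


@contextlib.contextmanager
def heartbeat(
    interval_s: float = 2.0,
    stream: Optional[TextIO] = None,
    marker: str = ".",
) -> Iterator[None]:
    """Write marker to the stream every interval_s seconds until the block exits."""
    out = stream if stream is not None else sys.stderr
    stop = threading.Event()

    def beat() -> None:
        # Event.wait returns False on timeout, so each timeout emits one marker.
        while not stop.wait(interval_s):
            out.write(marker)
            out.flush()

    thread = threading.Thread(target=beat, daemon=True)
    thread.start()
    try:
        yield
    finally:
        stop.set()
        thread.join()
        out.write("\n")  # end the progress line cleanly
        out.flush()


# Usage: capture the heartbeat in a buffer while a slow call runs.
buffer = io.StringIO()
with heartbeat(interval_s=0.05, stream=buffer):
    time.sleep(0.3)  # stands in for a long-running model call
print(repr(buffer.getvalue()))
```

Writing to stderr by default keeps the progress markers out of stdout, so piped or --api output would stay machine-readable.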

🚧 tnh-gen Review Context Ingestion

Status: NOT STARTED
Priority: MEDIUM
Problem: tnh-gen can run ad hoc review prompts via --prompt-dir, but it cannot yet gather bounded local document context on its own for review workflows such as docs language audits
Tasks:
Add repeatable local context inputs for tnh-gen run (for example --context-file or --context-dir)
Support bounded repo-local file loading for review prompts with explicit source allowlists
Emit included context sources in provenance and API output
Document a standard review-workflow pattern for docs, ADR, and architecture audits
Add follow-on conversation support for tnh-gen review/generation runs so a prompt can continue from prior output or thread state
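
The "bounded repo-local file loading ... with explicit source allowlists" task above can be sketched with path resolution plus a byte budget. load_context_files and its parameters are hypothetical, as are the default budget and the demo file names; the --context-file and --context-dir flags named above are proposals, not existing options.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List, Tuple


def load_context_files(
    repo_root: Path,
    requested: Iterable[str],
    allowed_dirs: Iterable[str],
    max_bytes: int = 200_000,
) -> List[Tuple[str, str]]:
    """Load requested files only if they resolve inside an allowlisted directory.

    Returns (relative path, text) pairs so callers can record sources in provenance.
    """
    root = repo_root.resolve()
    allowed = [(root / directory).resolve() for directory in allowed_dirs]
    remaining = max_bytes
    loaded: List[Tuple[str, str]] = []
    for relative in requested:
        # resolve() collapses ".." segments and symlinks before the check.
        path = (root / relative).resolve()
        if not any(base == path or base in path.parents for base in allowed):
            raise PermissionError(f"{relative} is outside the context allowlist")
        data = path.read_bytes()
        if len(data) > remaining:
            raise ValueError(f"{relative} exceeds the remaining context budget")
        remaining -= len(data)
        loaded.append((relative, data.decode("utf-8")))
    return loaded


# Usage: build a throwaway repo layout and load one allowlisted file.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "docs").mkdir()
    (repo / "docs" / "guide.md").write_text("hello", encoding="utf-8")
    context = load_context_files(repo, ["docs/guide.md"], allowed_dirs=["docs"])
print(context)
```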

🚧 Knowledge Base Implementation

Status: DESIGN COMPLETE
ADR: ADR-K01
Tasks:
Implement Supabase integration
Vector search functionality
Query capabilities
Semantic similarity search

🚧 Configuration & Data Layout

Status: NOT STARTED
Priority: HIGH (blocks pip install)
Problem: packaging and installed-wheel validation still need cleanup around prompt assets and repo-layout assumptions
Tasks:
Package prompt assets as resources where needed
Verify installed wheels work without repo-local prompt directories
Keep repo-layout assumptions out of import-time package initialization
Audit CLI entry points for any remaining repo-root-only assumptions

🚧 Repo-Root Docs Generation and CI Consistency

Status: NOT STARTED
Priority: HIGH
Problem: Documentation standards and generated docs link to /project/repo-root/*, but those generated files currently live under ignored paths and may be absent in clean remote CI checkouts. This causes MkDocs/link-validation inconsistencies between local and GitHub builds.
Tasks:
Decide whether docs/project/repo-root/ outputs should be tracked in git or generated in all CI docs-validation paths before link checks run
Align .gitignore, docs build scripts, and CI expectations so local and remote docs validation see the same repo-root docs set
Verify make docs-build, PR docs validation, and GitHub Actions all succeed from a clean checkout with no pre-existing generated repo-root docs
Document the intended contract for repo-root doc mirrors in docs ops guidance
Review upcoming MkDocs 2.0 / Material compatibility risk and define an upgrade stance before docs-tooling version changes are taken

🚧 Logging System Scope

Location: src/tnh_scholar/logging_config.py
Problem: Modules call setup_logging individually
Tasks:
Define single application bootstrap
Document logger acquisition pattern (get_logger only)
Create shared CLI bootstrap helper
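
The single-bootstrap pattern above can be sketched as an idempotent setup function plus a get_logger accessor. The function names match those mentioned in this item, but the signatures, the format string, and the "tnh_scholar" logger namespace are assumptions, not the contents of logging_config.py.

```python
import logging

_configured = False  # module-level guard so repeated calls are harmless


def setup_logging(level: int = logging.INFO) -> None:
    """Configure the package logger once; intended for the application entry point."""
    global _configured
    if _configured:
        return
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    package_logger = logging.getLogger("tnh_scholar")
    package_logger.addHandler(handler)
    package_logger.setLevel(level)
    _configured = True


def get_logger(name: str) -> logging.Logger:
    """Modules acquire loggers here and never call setup_logging themselves."""
    return logging.getLogger(f"tnh_scholar.{name}")


# Usage: the CLI bootstrap calls setup once; a second call adds no handler.
setup_logging()
setup_logging()
log = get_logger("audio_processing")
log.info("bootstrap complete")
```

Because child loggers propagate to the package logger, modules only need get_logger and inherit whatever the bootstrap configured.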

🚧 Comprehensive CLI Reference Documentation

Status: IN PROGRESS (tnh-gen complete, tnh-conductor now documented, other CLIs still uneven)
Tasks:
Update user-guide examples to use tnh-gen
Document other CLI tools with maintained/operator-facing scope (audio-transcribe, ytt-fetch, nfmt, etc.)
Consider automation for CLI reference generation

🔮 Shared CLI UI Module (tnh_cli_ui)

Status: NOT STARTED (Research/Exploration)
Priority: MEDIUM (UX consistency across CLI tools)
ADR: ADR-ST01.1: tnh-setup UI Design
Context: The tnh-setup UI redesign (Rich library) could be extracted into a shared module for consistent styling across all tnh-scholar CLI tools.
Research Questions:
Survey CLI tools for shared UI patterns (headers, status indicators, progress, tables)
Evaluate Rich vs alternatives (click-extra, questionary, etc.)
Design minimal API surface for common operations
Consider Typer + Rich integration patterns
Potential Scope:
Styled section headers with step progress
Standardized status indicators (✓/⚠/✗/○/•) with color vocabulary
Spinner wrappers for async operations
Summary table generators
Banner/header utilities
Affected Tools: tnh-setup, tnh-gen, ytt-fetch, audio-transcribe, nfmt, token-count, tnh-tree

🚧 Document Success Cases

Status: NOT STARTED
Goal: Document TNH Scholar's successful real-world applications
Cases: Deer Park Cooking Course (SRTs), 1950s JVB Translation (OCR), Dharma Talk Transcriptions, Sr. Dang Nhiem's talks
Tasks:
Create docs/case-studies/ directory structure
Document each case with context, tools, challenges, outcomes

🚧 Notebook System Overhaul

Status: NOT STARTED
Priority: HIGH
Goal: Ship a cleaner release repo by reducing notebook clutter and keeping only intentional examples/research assets
Tasks:
Audit & categorize all notebooks
Remove or archive junk/testing notebooks that no longer justify repo overhead
Convert notebook-discovered tests into pytest where the behavior still matters
Keep only core example/research notebooks that are intentional release artifacts
Add context notes for archived notebooks that still have historical value

Priority 3: Future Work & Advanced Features

Goal: Long-term sustainability, advanced features, and nice-to-have improvements. Address after bootstrap loop is working.

🚧 Refactor Monolithic Modules

Status: NOT STARTED
Targets:
https://github.com/aaronksolomon/tnh-scholar/blob/main/src/tnh_scholar/ai_text_processing/prompts.py (34KB)
- Break into: prompt model, repository manager, git helpers, lock helpers
- Add docstrings and tests for each unit
- Document front-matter schema
https://github.com/aaronksolomon/tnh-scholar/blob/main/src/tnh_scholar/journal_processing/journal_process.py (28KB)
- Identify focused units
- Extract reusable components

🚧 Complete Provider Abstraction

Status: NOT STARTED
Tasks:
Implement Anthropic adapter
Add provider-specific error handling
Test fallback/retry across providers
Provider capability discovery
Multi-provider cost optimization

🚧 Developer Experience Improvements

Status: PARTIAL (hooks and Makefile exist, automation pending)
Tasks:
Add pre-commit hooks (Ruff, notebook prep)
Create Makefile for common tasks (lint, test, docs, format, setup)
Add MyPy to pre-commit hooks
Add contribution templates (issue/PR templates)
CONTRIBUTING.md exists and is documented
Release automation
Changelog automation

🚧 Historical ADR Status Audit¶

Status: NOT STARTED
Context: 25 ADRs marked with status: current from pre-markdown-standards migration
Tasks:
Review each ADR to determine actual status (implemented/superseded/rejected)
Update status field in YAML frontmatter
Cross-reference with newer ADRs for superseded decisions
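
A minimal sketch of how the first audit step could be scripted: list every ADR whose YAML frontmatter still says status: current. The helper names (read_status, find_adrs_with_status) are hypothetical, and the sketch assumes that ADRs are markdown files with a leading frontmatter block delimited by --- lines. It is not part of the existing tooling.

```python
from __future__ import annotations

import re
from pathlib import Path
from typing import Optional

# Matches a leading YAML frontmatter block delimited by '---' lines.
FRONTMATTER_RE = re.compile(r"\A---\s*\n(.*?)\n---\s*(?:\n|\Z)", re.DOTALL)
# Matches a top-level 'status: <value>' line inside the frontmatter.
STATUS_RE = re.compile(r"^status:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def read_status(text: str) -> Optional[str]:
    """Return the frontmatter status value, or None if there is none."""
    block = FRONTMATTER_RE.match(text)
    if block is None:
        return None
    found = STATUS_RE.search(block.group(1))
    return found.group(1).strip("'\"") if found else None


def find_adrs_with_status(root: Path, status: str = "current") -> list[Path]:
    """List markdown files under root whose frontmatter status equals `status`."""
    return sorted(
        path
        for path in root.rglob("*.md")
        if read_status(path.read_text(encoding="utf-8")) == status
    )
```

Calling find_adrs_with_status on the ADR directory (its location is not stated here) would give the review worklist. Deciding whether each ADR is implemented, superseded, or rejected remains a manual step.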

🚧 Package API Definition¶

Status: Deferred during prototyping
Tasks:
Review and document all intended public exports
Implement __all__ in key __init__.py files
Verify exports match documentation
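
The export check could be sketched roughly as follows. Both function names are hypothetical, and the set of documented names is assumed to be collected separately, for example from the API reference pages.

```python
from __future__ import annotations

from types import ModuleType


def missing_exports(module: ModuleType) -> list[str]:
    """Return names listed in module.__all__ that the module does not define."""
    declared = getattr(module, "__all__", [])
    return [name for name in declared if not hasattr(module, name)]


def undocumented_exports(module: ModuleType, documented: set[str]) -> list[str]:
    """Return names in module.__all__ that are absent from the documented set."""
    declared = getattr(module, "__all__", [])
    return sorted(set(declared) - documented)
```

In a pytest test, each package could be loaded with importlib.import_module and both lists asserted empty, so a stale __all__ entry or an undocumented export fails CI instead of being found by a user.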

🚧 Repo Hygiene¶

Problem: Generated artifacts in repo (build/, dist/, site/, *.txt)
Tasks:
Add generated artifact paths to .gitignore
Document regeneration process
Rely on release pipelines for builds

🚧 Notebook & Research Management¶

Location: notebooks/, docs/research/
Problem: Exploratory work is valuable but not curated
Tasks:
Adopt naming/linting convention
Publish vetted analyses to docs/research via nbconvert
Archive obsolete notebooks

Recently Completed Tasks (Archive)¶

tnh-gen CLI Implementation ✅¶

Completed: 2025-12-27
ADR: ADR-TG01, ADR-TG01.1
What: Protocol-driven CLI replacing tnh-fab, dual modes (human-friendly default, --api for machine consumption)
Documentation: tnh-gen CLI Reference (661 lines)

File-Based Registry System (ADR-A14) ✅¶

Completed: 2026-01-01 (PR #24)
ADR: ADR-A14, ADR-A14.1
What: JSONC-based registry with multi-tier pricing, TNHContext path resolution, staleness detection
Key Deliverables: openai.jsonc registry, RegistryLoader, Pydantic schemas, JSON Schema for VS Code, refactored model_router.py and safety_gate.py, 264 tests passing

VS Code Extension Walking Skeleton ✅¶

Completed: 2026-01-07
ADR: ADR-VSC01, ADR-VSC02
What: TypeScript extension enabling "Run Prompt on Active File" workflow
Capabilities: QuickPick prompt selector, dynamic variable input, tnh-gen run subprocess execution, split-pane output, unit/integration tests
Validation: Proves bootstrapping concept - extension ready to accelerate TNH Scholar development

Pattern→Prompt Migration ✅¶

Completed: 2026-01-19
ADR: ADR-PT04
What: Pattern→Prompt terminology migration and directory restructuring
Key Changes: patterns/ → prompts/ (standalone tnh-prompts repo), TNH_PATTERN_DIR → TNH_PROMPT_DIR, removed legacy tnh-fab CLI
Breaking: TNH_PATTERN_DIR env var removed, tnh-fab CLI removed

Provenance Format Refactor ✅¶

Completed: 2026-01-19
ADR: ADR-TG01 Addendum 2025-12-28
What: Switched tnh-gen from HTML comments to YAML frontmatter for provenance metadata
Files Modified: provenance.py, test_tnh_gen.py, tnh-gen.md

OpenAI Client Unification ✅¶

Completed: 2025-12-10
ADR: ADR-A13
What: Migrated from legacy openai_interface/ to modern gen_ai_service/providers/ architecture (6 phases)

Core Stubs Implementation ✅¶

Completed: 2025-12-10
What: Implemented params_policy, model_router, safety_gate, completion_mapper with strong typing
Grade: A- (92/100) - Production ready with minor polish

Documentation Reorganization Phase 1 ✅¶

Completed: 2025-12-05
ADR: ADR-DD01, ADR-DD02
What: Absolute links, MkDocs strict mode, filesystem-driven nav, lychee link checking

Packaging & CI Infrastructure ✅¶

Completed: 2025-11-20
What: pytest in CI, runtime dependencies declared, pre-commit hooks, Makefile targets

Remove Library sys.exit() Calls ✅¶

Completed: 2025-11-15
What: Library code raises ConfigurationError instead of exiting process
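
The pattern can be illustrated with a short sketch. Only the ConfigurationError name comes from this entry. The class definition below is a stand-in rather than the project's actual class, and load_api_key, main, and EXAMPLE_API_KEY are hypothetical.

```python
import os
import sys


class ConfigurationError(Exception):
    """Stand-in for the project's error type for missing or invalid config."""


def load_api_key(env=None, var_name="EXAMPLE_API_KEY"):
    """Library code: report a missing setting by raising, never by exiting."""
    env = os.environ if env is None else env
    value = env.get(var_name)
    if not value:
        raise ConfigurationError(f"{var_name} is not set")
    return value


def main(env=None):
    """CLI entry point: the only layer that turns errors into exit codes."""
    try:
        load_api_key(env)
    except ConfigurationError as exc:
        print(f"error: {exc}", file=sys.stderr)
        return 1
    return 0
```

Keeping the exit decision in the entry point lets tests, notebooks, and other callers catch the exception and recover, which a sys.exit() call inside library code would prevent.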

Convert Documentation Links to Absolute Paths ✅¶

Completed: 2025-12-05 (PR #14)
What: Converted 964 links to absolute paths, enabled MkDocs strict link validation, integrated link verification

NumberedText Section Boundary Validation ✅¶

Completed: 2025-12-12
ADR: ADR-AT03.2 (status: accepted; should be updated to implemented)
What: Implemented validate_section_boundaries() and get_coverage_report() methods for robust section management
Commits: cf99375 (docs), 798a552 (refactor unused methods)

TextObject Robustness Improvements ✅¶

Completed: 2025-12-14
ADR: ADR-AT03.3 (status: accepted; should be updated to implemented)
What: Implemented merge_metadata() with MergeStrategy enum, validate_sections() with fail-fast, converted to Pydantic v2, added structured exception hierarchy
Commits: 096e528 (implementation), 03654fe (docstrings)