TNH Scholar TODO List

Roadmap tracking the highest-priority TNH Scholar tasks and release blockers.

Last Updated: 2025-12-12
Version: 0.2.2 (Alpha)
Status: Active Development - ADR-AT03 Implementation Phase


Priority Roadmap

This section organizes work into three priority levels based on criticality for production readiness.

Priority 1: Critical Path to Beta

Goal: Remove blockers to production readiness. These items must be completed before beta release.

Status: 4/5 Complete ✅

1. ✅ Add pytest to CI

  • Status: COMPLETED
  • Location: .github/workflows/ci.yml
  • What: Tests now run in CI with coverage reporting
  • Command: pytest --maxfail=1 --cov=tnh_scholar --cov-report=term-missing

2. ✅ Fix Packaging Issues

  • Status: COMPLETED
  • Location: pyproject.toml
  • What:
  • ✅ Runtime dependencies declared (pydantic-settings, python-json-logger, tenacity)
  • ✅ Python version pinned to 3.12.4
  • ⚠️ Pattern directory import issue still pending (see Configuration & Data Layout below)

3. ✅ Remove Library sys.exit() Calls

4. 🚧 Implement Core Stubs

  • Status: PRELIMINARY IMPLEMENTATION COMPLETE ✅ - Needs Polish & Registry Integration
  • Priority: HIGH
  • Review: Code review completed 2025-12-10 - Grade: A- (92/100) ⭐⭐⭐⭐⭐
  • Core Implementation:
  • params_policy.py - Policy precedence implemented ✅
    • ✅ Policy precedence: call hint → prompt metadata → defaults
    • ✅ Settings cached via @lru_cache (excellent optimization)
    • ✅ Strong typing with ResolvedParams Pydantic model
    • ✅ Routing diagnostics in routing_reason field
    • Score: 95/100 - Excellent implementation
  • model_router.py - Capability-based routing implemented ✅
    • ✅ Declarative routing table with _MODEL_CAPABILITIES
    • ✅ Structured output fallback (JSON mode capability switching)
    • ✅ Intent-aware architecture foundation
    • ⚠️ Intent routing currently placeholder (line 98-101)
    • Score: 92/100 - Strong implementation
  • safety_gate.py - Three-layer safety checks implemented ✅
    • ✅ Character limit, context window, budget estimation
    • ✅ Typed exceptions (SafetyBlocked)
    • ✅ Structured SafetyReport with actionable diagnostics
    • ✅ Content type handling (string/list with warnings)
    • ✅ Prompt metadata integration (safety_level)
    • ⚠️ Price constant hardcoded (line 30: _PRICE_PER_1K_TOKENS = 0.005)
    • ⚠️ Post-check currently stubbed
    • Score: 94/100 - Excellent implementation
  • completion_mapper.py - Bi-directional mapping implemented ✅

    • ✅ Clean transport → domain transformation
    • ✅ Error details surfaced in policy_applied
    • ✅ Status handling (OK/FAILED/INCOMPLETE)
    • ✅ Pure mapper functions (no side effects)
    • ⚠️ policy_applied uses Dict[str, object] (should be more specific)
    • Score: 91/100 - Strong implementation
  • High Priority (Before Merging):

  • Add Google-style docstrings to public functions (see style-guide.md)
    • apply_policy(), select_provider_and_model(), pre_check(), post_check(), provider_to_completion()
  • Move _PRICE_PER_1K_TOKENS constant to Settings or registry (blocks ADR-A14)
    • Moved to Settings.price_per_1k_tokens; safety gate now consumes setting.
  • Type tightening in completion_mapper

    • Added PolicyApplied alias (dict[str, str | int | float]).
  • Medium Priority (V1 Completion):

  • Promote policy_applied typing to a shared domain type (CompletionEnvelope) to avoid loose dict usage across the service.

  • Capability registry extraction (→ ADR-A14)

    • Create runtime_assets/registries/providers/openai.jsonc
    • Implement RegistryLoader with JSONC support
    • Refactor model_router.py to use registry
    • Refactor safety_gate.py to use registry pricing
    • See: ADR-A14: File-Based Registry System
  • Intent routing implementation
  • Post-check safety implementation

  • Low Priority (Future Work):

  • Warning enum system
    • Create typed warning codes instead of strings
    • Affects: safety_gate, completion_mapper, model_router
  • Enhanced diagnostics
    • More granular routing reasons
    • Detailed safety check diagnostics
  • Message.content Type Architecture Investigation (design quality, non-blocking)

    • Location: gen_ai_service/models/domain.py:92-96
    • Issue: Sourcery identifies Union[str, List[ChatCompletionContentPartParam]] as source of complexity
    • Context: Current design intentionally supports OpenAI's flexible content API (plain text OR structured parts with images/etc)
    • Investigation Areas:
    • Document current usage patterns across codebase
    • Assess downstream complexity: where are type checks needed?
    • Evaluate normalization strategies (always list? separate fields? utility methods?)
    • Consider provider compatibility (Anthropic, etc)
    • Draft ADR or addendum to existing GenAI ADRs if design change warranted
    • Impact: Affects message representation throughout GenAIService
  • Review Summary:

  • Strengths: Excellent architectural alignment, strong typing, proper separation of concerns, clean integration
  • Minor Issues: Missing function docstrings, hardcoded price constant, one dict type needing refinement
  • Overall: Production-ready with minor polish (estimated 1 hour total)
  • Detailed Review: See code review session 2025-12-10
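
The policy precedence recorded for params_policy.py (call hint → prompt metadata → defaults, with cached settings and a routing_reason diagnostic) can be illustrated with a minimal sketch. This is not the project's implementation: the `Settings` fields, the `get_settings` helper, and the `apply_policy` signature shown here are assumptions, and a frozen dataclass stands in for the Pydantic `ResolvedParams` model so the example runs on the standard library alone.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical defaults; a stand-in for the project's real Settings."""
    default_model: str = "default-model"
    default_temperature: float = 0.2


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    # Cached so repeated policy resolution reuses one Settings instance.
    return Settings()


@dataclass(frozen=True)
class ResolvedParams:
    """Stand-in for the Pydantic model of the same name."""
    model: str
    temperature: float
    routing_reason: str


def apply_policy(call_hint: Optional[str] = None,
                 prompt_model: Optional[str] = None) -> ResolvedParams:
    """Resolve the model by precedence: call hint, prompt metadata, defaults."""
    settings = get_settings()
    if call_hint is not None:
        model, reason = call_hint, "call hint"
    elif prompt_model is not None:
        model, reason = prompt_model, "prompt metadata"
    else:
        model, reason = settings.default_model, "defaults"
    return ResolvedParams(model, settings.default_temperature, reason)
```

Recording the winning source in `routing_reason` keeps the precedence decision visible to callers without requiring them to re-derive it.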

5. ✅ Unify OpenAI Clients

  • Status: COMPLETED ✅
  • Priority: HIGH
  • ADR: ADR-A13: Legacy Client Migration
  • Plan: Migration Plan
  • What: Unified OpenAI client implementations by migrating from legacy to modern architecture
  • Modern: gen_ai_service/providers/openai_client.py - typed, retrying
  • Legacy: openai_interface/ – removed as of Phase 6
  • Phase 1: Utilities & Adapters ✅ COMPLETE
  • Create token_utils.py - token counting
  • Create response_utils.py - response extraction
  • Create simple_completion.py - migration adapter
  • Add comprehensive tests (33 new tests)
  • Fix hard-coded literals (use policy dataclass)
  • Phase 2-6: Migration ✅ COMPLETE
  • Phase 2: Migrate core modules (ai_text_processing, journal_processing)
  • Phase 3: Migrate CLI tools
  • Phase 4: Migrate tests
  • Phase 5: Update notebooks
  • Phase 6: Delete legacy code (openai_interface/)

Priority 2: Beta Quality

Goal: Improve maintainability, user experience, and code quality for beta release.

6. 🚧 Expand Test Coverage

  • Status: NOT STARTED
  • Current Coverage: ~5% (4 test modules)
  • Target: 50%+ for gen_ai_service
  • Tasks:
  • GenAI service flows: prompt rendering, policy resolution, provider adapters
  • CLI integration tests (option parsing, environment validation)
  • Configuration loading edge cases
  • Error handling scenarios
  • Pattern catalog validation

7. 🚧 Consolidate Environment Loading

8. 🚧 Clean Up CLI Tool Versions

  • Status: PARTIAL (old versions removed, utilities pending)
  • Location: cli_tools/audio_transcribe/
  • Tasks:
  • Remove audio_transcribe0.py
  • Remove audio_transcribe1.py
  • Remove audio_transcribe2.py
  • Keep only current version
  • Create shared utilities (argument parsing, environment validation, logging)

9. ✅ Documentation Reorganization (ADR-DD01 & ADR-DD02)

  • Status: PHASE 1 COMPLETE ✅ (Parts 1–4 ✅ COMPLETE, Part 8 ✅ COMPLETE, File Reorganization ✅ COMPLETE; Parts 5–7 deferred to Phase 2)
  • Reference:
  • ADR-DD01: Docs Reorganization Strategy
  • ADR-DD02: Documentation Main Content and Navigation Strategy ✅ APPROVED
  • Goal: Execute the phased documentation overhaul for docs/ tree, keep README ≈ docs/index with drift monitoring, automate verification. Note: patterns/ directory is managed separately (TODO #16).
  • Next Sequence: Part 5 (Archive) → Part 6 (Gap Filling) → Part 7 (Standalone Tasks)
  • Checkpoints / Tasks:
  • Inventory + Tagging
    • Catalog every Markdown file (owner, status: current/needs-update/historical)
    • Add front matter metadata + PromptTemplate terminology notes
    • Identify raw research assets to offload to external storage
  • Filesystem Reorg (✅ COMPLETE)
    • Create the target hierarchy (overview, getting-started, user-guide, cli-reference, prompt-templates, api-reference, architecture/adr, development, research, docs-ops, archive)
    • Move existing docs into the new layout with stub index.md files
    • Rename all architecture documents for clarity and consistency (ADR naming, design doc naming)
    • Create README.md files for major sections (architecture/, cli/, development/, getting-started/)
    • Remove obsolete CLI reference stubs (auto-generation removed, see TODO #17)
    • Reorganize reference materials into categorized subdirectories
    • Tag archival folders explicitly for mkdocs-literate-nav auto-generation (deferred to Phase 2)
  • Terminology + README Sweep (Part 3b: ✅ COMPLETED - ADR-DD02 + ADR-DD03)
    • 3b (COMPLETED): Designed content architecture for README.md and docs/index.md (ADR-DD02)
    • Implemented drift reporting script (check_readme_docs_drift.py) for non-blocking sync monitoring
    • Established persona-based navigation strategy (Practitioners, Developers, Researchers)
    • Updated markdown standards to enforce exact YAML title ↔ heading match
    • Pattern → Prompt terminology standardization (ADR-DD03 Phase 1 ✅ COMPLETE)
    • Updated all user-facing documentation (README, docs/index.md, getting-started/, user-guide/)
    • Renamed patterns.md → prompts.md; pattern-system/ → prompt-system/
    • Added historical terminology note to docs/index.md
    • Retained legacy compatibility: TNH_PATTERN_DIR, --pattern flags
    • Phase 2: CLI documentation updates (deferred post-merge, many tools deprecated)
    • Phase 3: Code refactoring (tracked separately, many modules scheduled for deletion)
    • Add prompt authoring schema guidance (deferred to Part 6)
  • MkDocs + Automation (✅ ALL PARTS COMPLETE)
    • Install mkdocs-literate-nav and mkdocs-gen-files to dev dependencies
    • Restructure mkdocs.yaml to remove hardcoded nav and use literate-nav plugin
    • Create docs/nav.md as the source-of-truth navigation hierarchy
    • Configure gen-files to auto-generate CLI docs and prompt template catalogs
    • Add doc-index automation (scripts/generate_doc_index.py) and flag generated outputs
    • 4b (COMPLETED): Add doc-generation scripts (generate_cli_docs.py, sync_readme.py) and Makefile docs targets
    • 4c (COMPLETED): Wire CI to run mkdocs build + doc verification + GitHub Pages deployment
    • Add markdownlint to CI/CD (MD025/MD013 ignored via .markdownlint.json)
    • 4d (COMPLETED): Normalize internal documentation links; refactor doc-index generation to single docs/documentation_index.md with relative links
    • 4e (COMPLETED): Enable filesystem-driven nav with mkdocs-literate-nav
    • 4f (COMPLETED - ADR-DD02): Add drift reporting (check_readme_docs_drift.py) with Makefile target and CI integration
    • 4g (PHASE 1 COMPLETE): Documentation testing and validation workflow
    • Phase 1: Quick Wins ✅ COMPLETE
      • Enable mkdocs build --strict in docs-verify (fail on warnings)
      • Add link checking with lychee + .lychee.toml (ignore flaky/external as needed)
      • Add codespell with .codespell-ignore.txt (dharma terms/proper nouns); wire into pre-commit/CI
      • Create docs-quickcheck make target: sync_root_docs → mkdocs --strict → lychee → codespell
      • Fixed all 136 MkDocs strict mode warnings (autorefs, griffe type annotations)
    • Phase 2: Metadata Validation (Beta gate)
      • Add scripts/check_doc_metadata.py to validate front matter (title/description/status) and warn on empty descriptions
      • Detect orphaned docs not reachable from nav (using generated nav) and report missing descriptions
      • Add metadata check to pre-commit and CI
    • Phase 3: Coverage & Structure (Prod polish)
      • Add interrogate for Python docstring coverage (threshold on src/tnh_scholar, skip tests/scripts)
      • Validate ADRs follow template sections (Context/Decision/Consequences) + required front matter
      • Run offline/internal link check on built site (lychee --offline on site/)
      • Optional: add vale with a minimal style guide for docs tone/consistency
  • Historical Archive + Discoverability (Phase 2)
    • Archived historical research artifacts and experiment files
    • Move additional legacy ADRs/prototypes into docs/archive/**
    • Create comprehensive archive index + add summary links from primary sections
    • Host raw transcripts externally (S3/KB) and link from summaries
  • Backlog + Gap Filling
    • Populate docs/docs-ops/roadmap.md with missing topics (PromptTemplate catalog, workflow playbooks, evaluation guides, KB, deployment, research summaries, doc ops)
    • Open GitHub issues per backlog item with owners/priorities
  • Documentation Structure Reorganization (✅ COMPLETE - Python Community Standards)
    • Split design-guide.md into Python standard docs:
    • style-guide.md: Code formatting, naming, PEP 8, type annotations
    • design-principles.md: Architectural patterns, modularity, composition
    • Move object-service architecture to canonical location:
    • Moved from development/architecture/ to architecture/object-service/
    • Converted V2 blueprint to ADR-OS01 (adopted V3, deleted V1)
    • Created design-overview.md with high-level summary
    • Updated implementation-status.md with resolved items
    • Create forward-looking prompt architecture:
    • Created prompt-architecture.md (current + planned V2 with PromptCatalog)
    • Moved pattern-core-design.md to archive/ with terminology note
    • Documented VS Code integration requirements
    • Fix all broken links from reorganization:
    • Fixed 35 mkdocs build --strict warnings → 0 warnings ✅
    • Updated docs/index.md, contributing.md, ADR cross-references
    • Regenerated documentation_index.md
    • Established Python community standard structure:
    • docs/architecture/ = ADRs, design decisions (the "why")
    • docs/development/ = Developer guides (the "how")
    • docs/project/ = Vision, philosophy (stakeholders)
  • Outstanding Standalone Tasks (Phase 2 - Future Work)
    • Created architecture/README.md overview
    • Deprecate outdated CLI examples (deferred post-CLI-refactor, see TODO #17)
    • Add practical user guides for new features post-reorg
    • Expand architecture overview with component diagrams
    • Establish research artifact archival workflow (external storage + summary linking)
  • Include Root Markdown Files in MkDocs Navigation
    • Status: ✅ COMPLETE
    • Priority: MEDIUM (Part of docs-reorg cleanup)
    • Goal: Make root-level config/meta files (README, TODO, CHANGELOG, CONTRIBUTING, DEV_SETUP, release_checklist) discoverable in mkdocs navigation and documentation index
    • Approach: Build-time copy with "DO NOT EDIT" warnings
    • Create docs/project/repo-root/ directory for project meta-documentation
    • Create scripts/sync_root_docs.py to copy root markdown files
    • Copy root .md files (README, TODO, CHANGELOG, CONTRIBUTING, DEV_SETUP, release_checklist) to docs/project/repo-root/
    • Prepend HTML comment warning to each copied file
    • Update Makefile docs target to run sync script before mkdocs build
    • Test documentation build: make docs
    • Verify copied files appear in navigation and documentation index
    • Create docs/project/index.md with section overview
    • Wire into gen-files plugin for automatic sync on build
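
The build-time copy approach for root markdown files (copy into docs/project/repo-root/ and prepend a "DO NOT EDIT" HTML comment) might look roughly like the following. This is a sketch under assumptions, not the contents of scripts/sync_root_docs.py: the function name, the exact warning text, and the choice to skip missing files are all illustrative.

```python
from pathlib import Path

# Root files named in the task list above.
ROOT_DOCS = (
    "README.md", "TODO.md", "CHANGELOG.md",
    "CONTRIBUTING.md", "DEV_SETUP.md", "release_checklist.md",
)

# Assumed wording; the real script's warning text may differ.
WARNING = ("<!-- DO NOT EDIT: generated copy of {name}; "
           "edit the file in the repo root instead. -->\n\n")


def sync_root_docs(repo_root: Path, dest: Path) -> list[Path]:
    """Copy root markdown files into dest, prepending a warning comment."""
    dest.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for name in ROOT_DOCS:
        source = repo_root / name
        if not source.is_file():
            continue  # Missing files are skipped rather than treated as errors.
        target = dest / name
        body = source.read_text(encoding="utf-8")
        target.write_text(WARNING.format(name=name) + body, encoding="utf-8")
        written.append(target)
    return written
```

Because the copies are regenerated on every build, the warning comment is the only safeguard against edits being made in the wrong place.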

10. 🚧 Type System Improvements

  • Status: PARTIAL (see detailed section below)
  • Current: 58 errors across 16 files
  • Tasks: See Type System Improvements section below

Priority 3: Production Readiness

Goal: Long-term sustainability, advanced features, and production hardening.

11. 🚧 Refactor Monolithic Modules

13. 🚧 Complete Provider Abstraction

  • Status: NOT STARTED
  • Tasks:
  • Implement Anthropic adapter
  • Add provider-specific error handling
  • Test fallback/retry across providers
  • Provider capability discovery
  • Multi-provider cost optimization
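
As a rough illustration of fallback across providers, the sketch below tries adapters in order and returns the first success. Everything here is hypothetical: `ProviderAdapter`, `ProviderError`, and `complete_with_fallback` are not names taken from the codebase, and the two stub providers exist only to make the example runnable.

```python
from typing import Protocol, Sequence


class ProviderError(Exception):
    """Hypothetical base error that adapters raise for provider failures."""


class ProviderAdapter(Protocol):
    name: str

    def complete(self, prompt: str) -> str: ...


def complete_with_fallback(providers: Sequence[ProviderAdapter],
                           prompt: str) -> tuple[str, str]:
    """Try providers in order; return (provider_name, text) on first success."""
    errors: list[str] = []
    for provider in providers:
        try:
            return provider.name, provider.complete(prompt)
        except ProviderError as exc:
            errors.append(f"{provider.name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


class FlakyProvider:
    """Stub that always fails, standing in for an unavailable provider."""
    name = "primary"

    def complete(self, prompt: str) -> str:
        raise ProviderError("rate limited")


class EchoProvider:
    """Stub that succeeds, standing in for a healthy provider."""
    name = "secondary"

    def complete(self, prompt: str) -> str:
        return prompt.upper()


result = complete_with_fallback([FlakyProvider(), EchoProvider()], "hi")
```

Collecting the per-provider errors means the final exception still explains why each earlier attempt was abandoned.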

14. 🚧 Knowledge Base Implementation

15. 🚧 Developer Experience Improvements

  • Status: PARTIAL (hooks and Makefile exist, automation pending)
  • Tasks:
  • Add pre-commit hooks (Ruff, notebook prep)
  • Create Makefile for common tasks (lint, test, docs, format, setup)
  • Add MyPy to pre-commit hooks
  • Add contribution templates (issue/PR templates)
  • CONTRIBUTING.md exists and documented
  • Release automation
  • Changelog automation

16. 🚧 Configuration & Data Layout

  • Status: NOT STARTED
  • Priority: HIGH (blocks pip install)
  • Problem: src/tnh_scholar/__init__.py raises FileNotFoundError when repo layout missing
  • Tasks:
  • Package pattern assets as resources
  • Make patterns directory optional
  • Move directory checks to CLI entry points only
  • Ensure installed wheels work without patterns/ directory
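
One way to keep the package importable without a patterns directory is to split lookup from enforcement, so that only CLI entry points raise. The sketch below assumes the legacy TNH_PATTERN_DIR variable is the override mechanism; the fallback location and both function names are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional

# Illustrative fallback only; the project's real default may differ.
DEFAULT_PATTERN_DIR = Path.home() / ".config" / "tnh_scholar" / "patterns"


def find_pattern_dir() -> Optional[Path]:
    """Return the pattern directory if it exists, else None. Never raises."""
    override = os.environ.get("TNH_PATTERN_DIR")
    candidate = Path(override) if override else DEFAULT_PATTERN_DIR
    return candidate if candidate.is_dir() else None


def require_pattern_dir() -> Path:
    """For CLI entry points only: fail with a clear message if missing."""
    found = find_pattern_dir()
    if found is None:
        raise FileNotFoundError(
            "No pattern directory found; set TNH_PATTERN_DIR to its location."
        )
    return found
```

Library code would call `find_pattern_dir()` and handle `None`, which lets an installed wheel import cleanly even when no patterns are present.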

17. 🚧 Prompt Catalog Safety

  • Status: NOT STARTED
  • Priority: MEDIUM
  • Problem: Adapter doesn't handle missing keys or invalid front-matter gracefully
  • Tasks:
  • Add manifest validation
  • Implement caching
  • Better error messages (unknown prompt, hash mismatch)
  • Front-matter validation
  • Document pattern schema
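
The "better error messages" task (unknown prompt, hash mismatch) could be approached with typed exceptions, as in this sketch. It assumes a manifest that maps prompt keys to SHA-256 digests; that format, the exception names, and `load_prompt` are illustrative choices, not the adapter's current design.

```python
import hashlib


class UnknownPromptError(KeyError):
    """Raised when a prompt key is not in the manifest."""


class HashMismatchError(ValueError):
    """Raised when prompt text does not match its manifest digest."""


def load_prompt(key: str, manifest: dict[str, str],
                texts: dict[str, str]) -> str:
    """Return prompt text after checking the key and its SHA-256 digest."""
    if key not in manifest or key not in texts:
        known = ", ".join(sorted(manifest)) or "(none)"
        raise UnknownPromptError(
            f"unknown prompt {key!r}; known prompts: {known}"
        )
    digest = hashlib.sha256(texts[key].encode("utf-8")).hexdigest()
    if digest != manifest[key]:
        raise HashMismatchError(
            f"prompt {key!r} changed: expected {manifest[key][:8]}, "
            f"got {digest[:8]}"
        )
    return texts[key]
```

Listing the known keys in the error gives the caller something actionable when a prompt name is mistyped.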

Type System Improvements

Current Status:

  • Total Type Errors: 58
  • Affected Files: 16
  • Files Checked: 62

High Priority (Pre-Beta)

Install Missing Type Stubs ✅ COMPLETED

  • Install required type stub packages:
  • types-PyYAML
  • types-requests

Critical Type Errors

  • Fix audio processing boundary type inconsistencies
  • Resolve return type mismatches in audio_processing/audio.py
  • Standardize Boundary type usage
  • Fix core text processing type errors
  • Fix str vs list[str] return type in bracket.py
  • Resolve object extension error in video_processing.py
  • Address function redefinitions in run_oa_batch_jobs.py:
  • Resolve calculate_enqueued_tokens redefinition
  • Fix process_batch_files redefinition
  • Fix main function redefinition

Medium Priority (Beta Stage)

Add Missing Type Annotations

  • Add variable type annotations:
  • attributes_with_values in clean_parse_tag.py
  • current_page in xml_processing.py
  • covered_lines in ai_text_processing.py
  • seen_names in patterns.py

Pattern System Type Improvements

  • Fix Pattern class type issues:
  • Resolve apply_template attribute errors
  • Fix name attribute access issues
  • Standardize Pattern type definition

Low Priority (Post-Beta)

General Type Improvements

  • Clean up Any return types:
  • Properly type getch handling in user_io_utils.py
  • Type language code returns in lang.py
  • Remove Any returns in ai_text_processing.py
  • Standardize type usage:
  • Implement consistent string formatting in patterns.py
  • Update callable type usage
  • Clean up type hints in openai_interface.py

Implementation Strategy

Phase 1: Core Type Safety

  • Focus on high-priority items affecting core functionality
  • Implement type checking in CI pipeline
  • Document type decisions

Phase 2: Beta Preparation

  • Address medium-priority items
  • Set up pre-commit type checking hooks
  • Update documentation with type information

Phase 3: Post-Beta Cleanup

  • Handle low-priority type improvements
  • Implement stricter type checking settings
  • Full type coverage audit

Typing Guidelines

Standards:

  • Use explicit types over Any
  • Create type aliases for complex types
  • Document typing decisions
  • Implement consistent Optional handling
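
A small example of these standards in practice is shown below: a named alias instead of a repeated tuple type, and `None` handled explicitly where it can occur. The `Boundary` alias here is a hypothetical (start_ms, end_ms) pair chosen for illustration and is not the project's actual Boundary type.

```python
from typing import Optional

# Hypothetical alias: a (start_ms, end_ms) pair, named once and reused.
Boundary = tuple[int, int]


def longest(boundaries: list[Boundary]) -> Optional[Boundary]:
    """Return the longest boundary, or None for an empty list."""
    if not boundaries:
        return None
    return max(boundaries, key=lambda b: b[1] - b[0])


def duration_ms(boundary: Optional[Boundary]) -> int:
    """Return the boundary length, treating None as zero duration."""
    # Handle None explicitly at the edge instead of letting it propagate.
    if boundary is None:
        return 0
    start, end = boundary
    return end - start
```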


Additional Tasks

Medium Priority

✅ Improve NumberedText Ergonomics

Logging System Scope

  • Location: logging_config.py
  • Problem: Modules call setup_logging individually
  • Tasks:
  • Define single application bootstrap
  • Document logger acquisition pattern (get_logger only)
  • Create shared CLI bootstrap helper
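
The single-bootstrap idea can be sketched as an idempotent `setup_logging` called once by the application, with library modules limited to `get_logger`. The signatures and the "tnh_scholar" root logger name below are assumptions and may not match logging_config.py.

```python
import logging

_ROOT_NAME = "tnh_scholar"  # Assumed root logger name.
_configured = False


def setup_logging(level: int = logging.INFO) -> None:
    """Configure logging once; later calls are no-ops."""
    global _configured
    if _configured:
        return
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    root = logging.getLogger(_ROOT_NAME)
    root.addHandler(handler)
    root.setLevel(level)
    _configured = True


def get_logger(name: str) -> logging.Logger:
    """Library modules call only this; it never configures handlers."""
    return logging.getLogger(f"{_ROOT_NAME}.{name}")
```

With this split, a module that still calls `setup_logging` by mistake cannot attach a duplicate handler.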

Low Priority

Package API Definition

  • Status: Deferred during prototyping
  • Tasks:
  • Review and document all intended public exports
  • Implement __all__ in key __init__.py files
  • Verify exports match documentation
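
Verifying that declared exports actually exist could be automated with a small check like the one below. `missing_exports` is a hypothetical helper, and the `demo` module is constructed in place purely so the example is self-contained.

```python
from types import ModuleType


def missing_exports(module: ModuleType) -> list[str]:
    """Return names listed in __all__ that the module does not define."""
    exported = getattr(module, "__all__", [])
    return [name for name in exported if not hasattr(module, name)]


# A throwaway module whose __all__ promises one name it does not define.
demo = ModuleType("demo")
demo.__all__ = ["present", "absent"]
demo.present = 1
```

Here `missing_exports(demo)` reports `"absent"`, the kind of drift between `__all__` and the real module contents that this task aims to catch.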

Repo Hygiene

  • Problem: Generated artifacts in repo
  • Files: build/, dist/, site/, current_pip_freeze.txt, mypy_errors.txt, project_directory_tree.txt
  • Tasks:
  • Add to .gitignore
  • Document regeneration process
  • Rely on release pipelines for builds

Notebook & Research Management

  • Location: notebooks/, docs/research/
  • Problem: Valuable but not curated exploratory work
  • Tasks:
  • Adopt naming/linting convention
  • Keep reproducible notebooks in notebooks/experiments
  • Publish vetted analyses to docs/research via nbconvert
  • Archive obsolete notebooks

17. 🚧 Comprehensive CLI Reference Documentation

  • Status: NOT STARTED (deferred post-CLI-refactor)
  • Priority: MEDIUM
  • Context: Removed auto-generated CLI reference stubs (2025-12-03). Renamed docs/cli/ to docs/cli-reference/ to reflect reference-style content. CLI structure scheduled for overhaul.
  • Blocked By: CLI tool consolidation (TODO #8)
  • Tasks:
  • Review final CLI structure after refactor
  • Create comprehensive CLI reference using actual --help output at all command levels
  • Generate structured documentation for each command:
    • Command purpose and use cases
    • Full option/argument reference
    • Usage examples
    • Common workflows
  • Automate CLI reference generation in scripts/generate_cli_docs.py
  • Integrate with MkDocs build process
  • Enhance existing docs/cli-reference/ structure with comprehensive reference material
  • Notes:
  • Previously had placeholder stubs with minimal content
  • Current docs/cli-reference/ contains hand-written per-command reference pages
  • Requires examining actual CLI code structure for comprehensive coverage
  • Should align with user guide examples

18. ✅ Convert Documentation Links to Absolute Paths

  • Status: COMPLETED (PR #14, commit 85ec6b0)
  • Priority: MEDIUM
  • Context: Enabled MkDocs 1.6+ absolute link validation (2025-12-04). Absolute links (/path/to/file.md) are clearer, easier to maintain, and automation-friendly compared to relative links (../../../path/to/file.md).
  • Reference: ADR-DD01 Addendum 2025-12-04: Absolute Link Strategy
  • Completed: 2025-12-05
  • Tasks:
  • Audit all Markdown files for relative internal links (120 links found across 25 files)
  • Convert relative links to absolute paths (e.g., ../../../cli-reference/overview.md → /cli-reference/overview.md)
  • Update documentation generation scripts to emit absolute links (created scripts/convert_relative_links.py)
  • Verify all links resolve correctly with mkdocs build --strict (passed)
  • Run link checker to validate changes (verified with scripts/verify_doc_links.py)
  • Update markdown standards documentation to mandate absolute links
  • Add Makefile targets for link verification (docs-links, docs-links-apply)
  • Results:
  • 964 absolute links now in use across 96 markdown files
  • All internal documentation links use absolute paths
  • MkDocs configured with validation.links.absolute_links: relative_to_docs
  • Link verification integrated into docs build process
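
The relative-to-absolute conversion described above can be sketched with the standard library. This is an illustration of the idea, not the contents of scripts/convert_relative_links.py: the `to_absolute` function, its regular expression, and the restriction to inline links ending in .md are assumptions.

```python
import posixpath
import re

# Matches inline Markdown links whose target ends in .md (optional #anchor).
_LINK = re.compile(r"\]\(([^)\s]+?\.md(?:#[^)\s]*)?)\)")


def to_absolute(doc_path: str, text: str) -> str:
    """Rewrite relative .md links in text as docs-root-absolute links.

    doc_path is the file's path relative to the docs root,
    for example "architecture/adr/x/page.md".
    """
    base = posixpath.dirname(doc_path)

    def fix(match: re.Match) -> str:
        target = match.group(1)
        if target.startswith("/") or "://" in target:
            return match.group(0)  # Already absolute or external.
        resolved = posixpath.normpath(posixpath.join(base, target))
        return f"](/{resolved})"

    return _LINK.sub(fix, text)
```

For a page at architecture/adr/x/page.md, the link target ../../../cli-reference/overview.md resolves to /cli-reference/overview.md, matching the example in the task list.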

19. 🚧 Document Success Cases

  • Status: NOT STARTED
  • Priority: MEDIUM
  • Goal: Create comprehensive documentation of TNH Scholar's successful real-world applications
  • Context: Cleanly document proven use cases to demonstrate project value and guide future development
  • Success Cases to Document:
  • Deer Park Monastery Cooking Course
    • Generating translated SRTs for video recordings
    • Diarization implementation
    • SRT generation workflow
  • 1950s JVB (Journal of Vietnamese Buddhism) Translation
    • OCR work on Thay's 1950s editorial work
    • Proof-of-concept translations
    • Historical document processing pipeline
  • Dharma Talk Transcriptions
    • Generating polished standalone XML documents from recordings
    • Transcription to structured format workflow
  • Sr. Dang Nhiem's Dharma Talks
    • Clean transcription work using audio-transcribe and related tools
    • Audio processing pipeline
  • Tasks:
  • Create docs/case-studies/ directory structure
  • Document each success case with:
    • Project context and goals
    • Tools and workflows used
    • Technical challenges and solutions
    • Results and outcomes
    • Lessons learned
  • Add references to relevant code, prompts, and configuration
  • Include sample outputs where appropriate
  • Link from main documentation and README

20. 🚧 Notebook System Overhaul

  • Status: NOT STARTED
  • Priority: HIGH
  • Goal: Transform notebook collection from exploratory/testing to production-quality examples and convert testing notebooks to proper test cases
  • Context: Current notebooks include valuable work but mix exploration, testing, and examples without clear organization
  • Tasks:
  • Audit & Categorize:
    • Inventory all notebooks with purpose classification
    • Identify core example notebooks (referencing success cases from TODO #19)
    • Identify testing notebooks to convert to pytest
    • Identify legacy/archival notebooks
  • Core Example Notebooks (keep and polish):
    • Fully annotate with current code
    • Ensure working with latest codebase
    • Add clear documentation headers
    • Reference relevant success cases
    • Add to docs as working examples
  • Testing Notebooks → Pytest Migration:
    • Convert notebook-based tests to standard pytest test cases
    • Ensure pytest coverage for all testing scenarios
    • Remove testing notebooks after conversion
    • Update test documentation
  • Legacy/Archival Notebooks:
    • Mark clearly as legacy/archival
    • Add context notes for understanding past work
    • Move to notebooks/archive/ or similar
    • Document their historical purpose
  • Documentation Updates:
    • Update notebook documentation structure
    • Create notebook usage guide
    • Link core examples from user guide
    • Document notebook development workflow
  • ADR Decision: May require architecture decision record for notebook management strategy

Progress Summary

Recently Completed (as of 2025-12-09):

  • ✅ Packaging & dependencies fixed
  • ✅ CI pytest integration
  • ✅ Library exception handling (removed sys.exit)
  • ✅ OpenAI client unification (all 6 phases complete)
  • ✅ Documentation reorganization (Phase 1 complete)
  • ✅ Pre-commit hooks and Makefile setup
  • ✅ Documentation links converted to absolute paths (TODO #18)

Current Sprint Focus:

  • 🎯 Implement core stubs (policy, routing, safety)
  • 🎯 Expand test coverage to 50%+
  • 🎯 Type system improvements (58 errors to resolve)

Beta Blockers:

  • Configuration & data layout (pattern directory)
  • Core stub implementations (params_policy, model_router, safety_gate)
  • Test coverage expansion

Notes for Maintainers

Test Running

# Run all tests with coverage
poetry run pytest --maxfail=1 --cov=tnh_scholar --cov-report=term-missing -v

# Run specific test file
poetry run pytest tests/gen_ai_service/test_service.py -v

# Run with coverage report
poetry run pytest --cov=tnh_scholar --cov-report=html

Type Checking

# Check types
poetry run mypy src/

# Generate error report
poetry run mypy src/ > mypy_errors.txt

Code Quality

# Format code
poetry run black src/ tests/

# Lint
poetry run ruff check src/

# Run all checks (as CI does)
poetry run black --check src/
poetry run mypy src/
poetry run ruff check src/
poetry run pytest --maxfail=1 --cov=tnh_scholar