ADR-DD02: Documentation Main Content and Navigation Strategy¶
Establishes how README.md and docs/index.md relate, defines content inclusion patterns, and specifies navigation automation with mkdocs-literate-nav.
- Status: Proposed
- Date: 2025-11-23
- Owner: Documentation Working Group
- Supersedes: None
- Related: ADR-DD01: Documentation System Reorganization Strategy
Context¶
Following ADR-DD01's filesystem reorganization and literate-nav adoption (TODO #9, Part 4e complete), we now have:
- README.md: Rich user-facing content with Vision & Goals, Features, Quick Start, Installation, and Documentation Structure overview
- docs/index.md: Auto-generated sparse documentation map—just a flat list of all documents by section
- Navigation: Filesystem-driven via generate_mkdocs_nav.py producing docs-nav.md for mkdocs-literate-nav
- Build automation: Scripts for CLI docs generation, doc-index generation, and README sync verification
Current Problems¶
- Content Divergence: README.md has substantial onboarding content (vision, features, quick start) that doesn't appear in docs/index.md. Newcomers arriving at the published documentation site see only a file listing, not the compelling project introduction.
- Document Purpose Confusion: Both documents serve as entry points but target different contexts:
  - README.md: GitHub repository landing page, must be immediately actionable
  - docs/index.md: MkDocs site home, should orient users to the full documentation landscape
- Maintenance Burden: No clear strategy for what content belongs where, when duplication is acceptable, or how to keep critical sections synchronized.
- Navigation Clarity: Auto-generated docs-nav.md works well for filesystem traversal but lacks curation: no persona-based entry points, no workflow-oriented groupings, no "start here" guidance.
- Content Reusability: Complex sections (installation steps, development setup, pattern system overview) appear in multiple places with manual duplication and drift risk.
Design Constraints¶
- Filesystem-driven navigation is non-negotiable (ADR-DD01 decision): Documentation structure must mirror the docs/ tree with literate-nav auto-generation
- README.md must remain editable: Cannot become a pure build artifact, because GitHub display requires direct file readability
- CI verification: Must detect when critical content drifts between README and documentation
- Incremental adoption: Solution must work with current tooling (mkdocs-literate-nav, mkdocs-gen-files) without major architectural changes
- Contributor accessibility: Documentation workflow should be obvious to new contributors without deep tooling knowledge
Decision¶
Adopt a Progressive Enhancement content strategy with phase-based implementation:
1. Content Architecture¶
README.md: Concise Project Gateway¶
README.md serves as the GitHub repository landing page and remains hand-maintained. It provides:
- Project description: 2-3 sentence elevator pitch
- Vision & Goals: Why the project exists, what problems it solves (4-6 bullet points)
- Features: High-level capabilities overview (Core Tools summary, Pattern System summary)
- Quick Start: Minimal install + first command (PyPI install, tnh-setup, example usage)
- Documentation Overview: Brief orientation + link to full docs site
- Development: Pointer to DEV_SETUP.md and CONTRIBUTING.md
- Project Status: Current version, alpha/beta/stable designation
- Support & Community: Links to issues, discussions, documentation
README.md stays concise (target: readable in 2-3 scrolls, ~200-250 lines). Detailed content lives in the documentation site.
docs/index.md: Comprehensive Documentation Hub¶
docs/index.md serves as the MkDocs site landing page and becomes the primary onboarding experience. It provides:
- Welcome Section: Same project description as README (synchronized manually or via drift reporting)
- Getting Started (Persona-Based): Curated entry points for different audiences:
- Practitioners: Using the CLI tools for dharma talk processing and translation
- Developers: Contributing code, running tests, understanding architecture
- Researchers: Exploring the knowledge base, evaluation workflows, and experiments
- Key Features: Expanded from README with links to deep-dive documentation
- Installation: Embedded detailed steps (can reference _includes/installation.md in Phase 2)
- Quick Reference: Common commands, pattern examples, troubleshooting links
- Architecture Overview: High-level system design with links to ADRs
- Contributing: How to participate, testing, documentation contributions
- Documentation Map: Auto-generated section listing (repositioned to bottom, renamed "Complete Documentation Index")
docs/index.md is comprehensive (target: complete orientation for all personas, ~400-500 lines). It's the definitive "start here" page.
Persona-based navigation is a key differentiator from README.md—it helps users self-identify their path and find relevant documentation quickly.
Shared Content Strategy¶
Phase 1 (Current - Simple Independence with Drift Reporting):
- Accept controlled duplication of introductory content (project description, vision, high-level features)
- README and docs/index.md are independently maintained
- Lightweight drift reporting script (check_readme_docs_drift.py) generates informational reports
- Non-blocking: reports written to local log file (docs_sync_report.txt) for review
- Manual sync decisions made during project check-ins based on drift reports
Phase 2 (Planned - Selective Inclusion):
- Extract complex, frequently reused content to docs/_includes/:
- installation.md: Detailed install steps (PyPI, prerequisites, dev setup)
- development.md: Development environment configuration
- prompt-overview.md: Prompt system introduction
- Use mkdocs snippets plugin (--8<-- syntax) for transclusion
- Keep README independent; include shared content in docs/index.md and other docs
Phase 3 (Future - Templated Assembly, if needed):
- If maintenance burden grows significantly, consider templating (Jinja2 via mkdocs-macros)
- Would allow generating both README and docs/index.md from structured content
- Defer this decision until Phase 2 proves insufficient
2. Navigation Strategy¶
Filesystem-Driven Navigation (Literate-Nav)¶
Continue current approach with refinements:
- Auto-generation: generate_mkdocs_nav.py produces docs-nav.md from the docs/ tree structure
- Front matter titles: Prefer YAML title: field over filename humanization (must match exactly per markdown standards)
- Sort order: Maintain curated top-level order in the TOP_LEVEL_ORDER list
- Index page handling: Directory index.md becomes section landing page with overview + persona-appropriate navigation aids
Navigation file docs-nav.md is a build artifact (regenerated on every docs build).
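The refinements above can be illustrated with a minimal sketch. It is not the actual generate_mkdocs_nav.py: the function names, the sample TOP_LEVEL_ORDER values, and the simplified front matter parsing are all assumptions made for illustration. It emits a nested Markdown list of links, the general shape that mkdocs-literate-nav reads.

```python
import re
from pathlib import Path

# Hypothetical ordering; the real list lives in generate_mkdocs_nav.py.
TOP_LEVEL_ORDER = ["getting-started", "architecture"]
TITLE_RE = re.compile(r"^title:[ \t]*[\"']?(.*?)[\"']?[ \t]*$", re.MULTILINE)


def humanize(name: str) -> str:
    """Turn 'getting-started' into 'Getting Started'."""
    return name.replace("-", " ").replace("_", " ").title()


def page_title(path: Path) -> str:
    """Prefer the front matter title; fall back to a humanized filename."""
    text = path.read_text(encoding="utf-8")
    parts = text.split("---", 2)
    if text.startswith("---") and len(parts) == 3:
        match = TITLE_RE.search(parts[1])
        if match and match.group(1):
            return match.group(1)
    return humanize(path.stem)


def top_level_key(path: Path):
    """Curated names first (in list order), everything else alphabetically."""
    name = path.name
    rank = TOP_LEVEL_ORDER.index(name) if name in TOP_LEVEL_ORDER else len(TOP_LEVEL_ORDER)
    return (rank, name)


def build_nav(docs_root: Path) -> str:
    """Return nav content as a nested Markdown list mirroring docs_root."""
    lines = []

    def walk(directory: Path, depth: int) -> None:
        key = top_level_key if depth == 0 else (lambda p: p.name)
        for entry in sorted(directory.iterdir(), key=key):
            indent = "    " * depth
            rel = entry.relative_to(docs_root).as_posix()
            if entry.is_dir() and not entry.name.startswith("_"):
                index = entry / "index.md"
                if index.exists():
                    # The section landing page doubles as the section link.
                    lines.append(f"{indent}* [{page_title(index)}]({rel}/index.md)")
                else:
                    lines.append(f"{indent}* {humanize(entry.name)}")
                walk(entry, depth + 1)
            elif entry.suffix == ".md" and entry.name != "docs-nav.md":
                if depth > 0 and entry.name == "index.md":
                    continue  # already emitted as the section link
                lines.append(f"{indent}* [{page_title(entry)}]({rel})")

    walk(docs_root, 0)
    return "\n".join(lines) + "\n"
```

Underscore-prefixed directories such as docs/_includes/ are skipped here so that non-user-facing content stays out of the navigation; whether the real script does the same is not stated in this ADR.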
Section Index Pages¶
Every top-level directory must have an index.md that provides:
- Section purpose: What this documentation covers
- Target audience: Who should read this section
- Navigation aids: Curated list of key documents (complement to auto-generated nav)
- Prerequisites: What to read first (if applicable)
Example structure for docs/architecture/index.md:
---
title: "Architecture"
description: "System design, ADRs, and component deep-dives for TNH Scholar."
---
# Architecture
This section documents the design decisions, system architecture, and
component implementations for TNH Scholar.
## Getting Started
- **New to the codebase?** Start with [System Overview](/architecture/overview.md)
- **Looking for decisions?** Browse [architecture docs](/architecture/overview.md)
- **Need component details?** See subsystem design documents below
## Key Resources
- [GenAI Service](/architecture/gen-ai-service/design/genai-service-design-strategy.md) - Core AI integration layer
- [Prompt System](/architecture/prompt-system/prompt-system-architecture.md) - Prompt management architecture
- [Transcription Pipeline](/architecture/transcription/design/diarization-system-design.md) - Audio processing design
## Subsystems
- [AI Text Processing](/architecture/ai-text-processing/design/textobject-system-design.md) - Text transformation pipeline
- [Knowledge Base](/architecture/knowledge-base/adr/adr-k01-kb-architecture-strategy.md) - Vector search and metadata
- [Transcription](/architecture/transcription/design/diarization-system-design.md) - Audio-to-text with diarization
- [Video Processing](/architecture/video-processing/adr/adr-vp01-video-processing.md) - YouTube integration
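The requirement that every top-level directory carries an index.md lends itself to an automated check. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and skipping underscore- and dot-prefixed directories is an assumption rather than a documented rule.

```python
from pathlib import Path


def missing_section_indexes(docs_root: Path) -> list[str]:
    """Return names of top-level directories under docs_root that lack an index.md."""
    return sorted(
        entry.name
        for entry in docs_root.iterdir()
        if entry.is_dir()
        and not entry.name.startswith(("_", "."))  # assumed: non-user-facing dirs are exempt
        and not (entry / "index.md").is_file()
    )
```

Calling missing_section_indexes(Path("docs")) from a docs build script would list the sections that still need a landing page; an empty list means the requirement is met.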
Documentation Map Integration¶
The auto-generated "Documentation Map" in docs/index.md (current behavior) is retained but repositioned:
- Move to bottom of docs/index.md (after Welcome, Getting Started (Persona-Based), Features, Installation, Quick Reference, Architecture Overview, Contributing)
- Rename to "Complete Documentation Index"
- Add introductory text: "Browse all documentation organized by topic. For persona-based entry points, see Getting Started above."
This ensures the landing page prioritizes human-oriented navigation (persona paths) over exhaustive file listings (auto-generated map).
3. Content Drift Monitoring (Phase 1 MVP)¶
Lightweight Drift Reporting¶
Instead of enforcing synchronization, Phase 1 uses informational drift reporting:
Script: scripts/check_readme_docs_drift.py
The script compares sections between README.md and docs/index.md by dynamic heading detection:
Approach:
- Extract all ## Section Name headings from both files
- Compare sections with matching titles (case-sensitive)
- Warn on title mismatches (case, punctuation, spacing differences)
- Report sections that exist in one file but not the other
Behavior:
- Extract all ## Level 2 sections from both files
- Match sections by heading text (exact match)
- For matched sections: generate unified diff if content differs
- For unmatched sections: report "only in README" or "only in docs/index.md"
- Warn on near-matches (e.g., "Quick Start" vs "Quick start" - case mismatch)
- Write report to docs_sync_report.txt (gitignored)
- Print report to console for CI visibility
- Always exit 0 (non-blocking, informational only)
Report format:
================================================================================
README.md ↔ docs/index.md Drift Report
Generated: 2025-11-23 10:15:32
================================================================================
Matched Sections (compared):
✓ Vision & Goals: IDENTICAL
✗ Features: DIFFERS
--- README.md::Features
+++ docs/index.md::Features
@@ -1,5 +1,7 @@
TNH Scholar provides several CLI tools:
- audio-transcribe: Process audio files
+- tnh-fab: Text processing with patterns
- ytt-fetch: Download YouTube transcripts
Sections only in README.md:
- Example Usage
- Development
Sections only in docs/index.md:
- Architecture Overview
- Contributing
- Complete Documentation Index
Title Mismatches (possible typos):
⚠ "Quick Start" (README) vs "Quick start" (docs/index.md) - case mismatch
⚠ DRIFT DETECTED - Review differences above
This is informational only - no action required unless intentional divergence.
================================================================================
Integration Points¶
Makefile target:
.PHONY: docs-drift
docs-drift:
	@poetry run python scripts/check_readme_docs_drift.py

.PHONY: docs-verify
docs-verify: docs-drift
	@poetry run mkdocs build --strict
CI workflow (non-blocking):
- name: Check documentation drift
  run: |
    python scripts/check_readme_docs_drift.py
    cat docs_sync_report.txt
  continue-on-error: true
Gitignore:
docs_sync_report.txt
Usage Workflow¶
- During development: Run make docs-drift to see current drift status
- Before commits: Review docs_sync_report.txt locally
- In CI: Drift report printed in logs (visible but doesn't fail build)
- At project check-ins: Review accumulated drift, decide if manual sync warranted
Acceptable Divergence¶
The following divergences are expected and acceptable:
- Depth: README gives high-level overview; docs/index.md provides detailed explanation
- Audience: README targets newcomers; docs/index.md serves all personas
- Navigation: README links to docs site; docs/index.md embeds navigation aids
- Examples: README shows minimal quick-start; docs/index.md includes comprehensive examples
No sync enforcement - teams decide when alignment matters based on drift reports.
4. Content Inclusion Patterns (Future)¶
Status: Detailed design deferred to ADR-DD03: Content Reuse and Inclusion Strategies
Phase 1 accepts controlled duplication with drift monitoring. When duplication becomes burdensome, ADR-DD03 will define:
- Inclusion hierarchy: Markdown snippets (docs/_includes/), macros (docs/_templates/), and generation scripts
- Tooling choices: mkdocs snippets plugin, mkdocs-macros, mkdocs-gen-files integration
- Naming conventions: Underscore-prefixed directories for non-user-facing content
- Migration strategy: Moving duplicated content to shared locations
Current approach (Phase 1): Use generation scripts (Level 3) only for already-automated content (CLI docs, API reference, documentation index). No manual content inclusion yet.
5. Maintenance Workflows¶
Updating README.md¶
- Edit README.md directly in repository root
- Run make docs-drift to see if changes affect monitored sections
- Review docs_sync_report.txt to assess drift
- Decide if docs/index.md should be updated (no enforcement)
Updating docs/index.md¶
- Edit docs/index.md directly in docs/
- Run make docs-drift to check for drift in monitored sections
- Review report and update README.md if appropriate
- CI runs drift check but doesn't fail on drift (informational only)
Adding New Documentation¶
- Create markdown file in appropriate docs/ subdirectory
- Add YAML front matter (title, description, owner, author, status)
- Navigation updates automatically via literate-nav
- If creating new top-level section, add to TOP_LEVEL_ORDER in generate_mkdocs_nav.py
- Create section index.md with overview and navigation aids
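The front matter step above can be checked mechanically. Below is a small sketch assuming the five fields named in the list are all required; the function names are hypothetical, and the parser only recognizes simple top-level key: value lines rather than full YAML.

```python
REQUIRED_FIELDS = ("title", "description", "owner", "author", "status")


def front_matter_keys(markdown: str) -> set[str]:
    """Collect top-level keys from a leading '---' delimited front matter block."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return set()
    keys = set()
    for line in lines[1:]:
        if line.strip() == "---":
            return keys
        # Only unindented 'key: value' lines count; nested YAML is ignored.
        if ":" in line and not line.startswith((" ", "\t", "#")):
            keys.add(line.split(":", 1)[0].strip())
    return set()  # an unterminated block is treated as having no front matter


def missing_fields(markdown: str) -> list[str]:
    """Return the required fields absent from the document's front matter."""
    present = front_matter_keys(markdown)
    return [field for field in REQUIRED_FIELDS if field not in present]
```

Running missing_fields over the text of a new page before committing gives a quick list of what still needs to be filled in.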
Reorganizing Content¶
- Move files using git mv to preserve history
- Update internal links (CI link checker will catch broken references)
- Navigation regenerates automatically
- Update section index pages if section purpose changes
- Run make docs to rebuild and verify
6. Implementation Phases¶
Phase 1: Simple Independence with Drift Reporting (Current Priority)¶
Scope: Complete TODO #9, Part 3b
- Enhance docs/index.md with persona-based Getting Started
- Implement check_readme_docs_drift.py script
- Add docs-drift target to Makefile
- Add docs_sync_report.txt to .gitignore
- Integrate drift reporting into CI (non-blocking)
- Create/update section index pages for all top-level directories
- Reposition auto-generated Documentation Map in docs/index.md
Success Criteria:
- docs/index.md provides comprehensive onboarding (400-500 lines)
- README.md stays concise (200-250 lines)
- Drift reporting runs in CI and generates local reports
- No CI failures from drift (informational only)
- All top-level sections have index.md with navigation aids
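The line-count targets in the success criteria could be reported alongside the drift report. This is only a sketch: the LINE_BUDGETS mapping and the budget_status helper are hypothetical names, and the ranges are copied from the targets stated in this ADR.

```python
# Target line ranges taken from this ADR; names are illustrative only.
LINE_BUDGETS = {"README.md": (200, 250), "docs/index.md": (400, 500)}


def budget_status(name: str, text: str) -> str:
    """Describe how a document's line count compares with its target range."""
    low, high = LINE_BUDGETS[name]
    count = len(text.splitlines())
    if count < low:
        verdict = "below"
    elif count > high:
        verdict = "above"
    else:
        verdict = "within"
    return f"{name}: {count} lines ({verdict} target {low}-{high})"
```

Like the drift report, this is meant to inform review rather than fail a build.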
Phase 2: Content Inclusion Patterns (Future)¶
Status: Deferred to ADR-DD03: Content Reuse and Inclusion Strategies
Trigger: When 10+ instances of duplicated complex content (installation steps, development setup, etc.) cause maintenance burden.
Approach Sketch:
- Use mkdocs snippets plugin (--8<-- syntax) for shared content in docs/_includes/
- Keep README.md independent; include shared sections in docs/index.md and other docs
- Document inclusion patterns in markdown standards
Decision Point: Revisit 6 months post-beta or when drift reporting shows repeated manual syncs of identical content.
Phase 3: Advanced Automation (Future, If Needed)¶
Status: Deferred to ADR-DD04: Documentation Generation and Templating (if Phase 2 proves insufficient)
Trigger: High-churn content causing frequent drift despite inclusion patterns.
Approach Sketch:
- Templated assembly using Jinja2 via mkdocs-macros
- Structured content storage (YAML/JSON) for frequently changing sections
- Generate both README.md and docs/index.md from templates
Decision Point: Only proceed if clear ROI demonstrated (e.g., weekly README/docs updates causing sync overhead).
Alternatives Considered¶
- Templated Assembly (Full Jinja2): Rejected for Phase 1 due to build complexity and README becoming a non-editable artifact. Deferred to Phase 3 if needed.
- README as Build Artifact: Rejected because GitHub requires a readable README in repository view. Generated files hurt discoverability.
- Single Unified Document: Rejected because README and docs/index serve different contexts (GitHub vs. MkDocs site) and need different levels of detail.
- Manual Duplication Without Any Monitoring: Rejected due to drift risk observed in current state (README and docs/index.md had diverged significantly). Lightweight drift reporting provides awareness without enforcement overhead.
- Shared Content Directory (No Templating): Considered for Phase 1 but deferred to Phase 2. Initial implementation keeps documents independent with drift reporting only.
- Enforced Synchronization with Markers: Rejected for Phase 1 due to added complexity (marker management, CI failures on drift). Phase 1 prioritizes simplicity: drift reporting provides awareness without enforcement burden. Can revisit in Phase 2 if drift proves problematic.
Consequences¶
Positive¶
- Clear ownership: README and docs/index.md have distinct purposes and audiences
- Flexibility: Each document optimized for its context (GitHub vs. MkDocs site)
- Safety net: Drift reporting surfaces critical content drift for review while allowing intentional divergence
- Incremental adoption: Start simple (independent docs), add inclusion patterns only when needed
- Contributor clarity: Obvious where content lives, when to update both files, how to verify
- Navigation automation: Literate-nav keeps structure in sync with filesystem
- Onboarding improvement: docs/index.md becomes comprehensive entry point (not just file list)
Negative / Risks¶
- Controlled duplication: Accept some redundancy between README intro and docs/index intro (mitigated by drift reporting for awareness)
- Manual sync decisions: Teams must review drift reports and decide when to sync (no enforcement)
- Drift accumulation: Without enforcement, documents could diverge significantly over time (mitigated by regular project check-in reviews)
- Tooling dependency: Relies on mkdocs-literate-nav, mkdocs-gen-files, and custom scripts (acceptable given ADR-DD01 commitment)
- Phase 2 transition: Moving to inclusion patterns requires coordination (defer until post-beta to minimize churn)
Mitigation Strategies¶
- Document the workflow: Clear instructions in docs/docs-ops/markdown-standards.md and CONTRIBUTING.md
- Regular reviews: Include drift report in project check-in process
- Minimal monitoring surface: Only track high-level sections (Vision, Features, Quick Start, Installation)
- Defer complexity: Don't adopt inclusion patterns or sync enforcement until duplication causes real pain
- Template enforcement: ADR template and standards prevent structural inconsistency
Open Questions & Future Decisions¶
- Sync enforcement: Should we enforce synchronization in CI or keep reporting-only? Decision: Start with reporting-only in Phase 1; add enforcement in Phase 2 only if drift becomes problematic.
- Documentation Map position: Bottom of docs/index.md or separate page? Decision: Keep at bottom initially; move to dedicated /documentation-index page if docs/index.md exceeds 600 lines.
- Persona-based navigation: Should literate-nav group docs by persona (User/Developer/Researcher) in addition to topic? Decision: Defer to Phase 2; current topic-based structure sufficient for ~100 docs.
- Section index automation: Can section index pages be partially auto-generated? Decision: Keep hand-maintained for curation; auto-generation risks losing narrative flow.
- Link checking: Should CI verify all internal markdown links? Decision: Yes, add to TODO #9, Part 4d (link normalization task).
Approval & Tracking¶
- TODO Reference: TODO #9 (Documentation Reorganization, ADR-DD01), Part 3b
- Implementation Tracking: GitHub issues tagged docs + adr-dd02
- Related ADRs:
  - ADR-DD01: Documentation System Reorganization Strategy (accepted)
  - ADR-DD03: Content Reuse and Inclusion Strategies (future - Phase 2)
  - ADR-DD04: Documentation Generation and Templating (future - Phase 3, if needed)
- Review Cycle: Reassess Phase 2 transition 6 months post-beta or when 10+ instances of complex content duplication observed
Approval of this ADR completes the content architecture design for TODO #9, Part 3b, and provides a roadmap for incremental documentation improvements through beta and beyond.