TNH Scholar
TNH Scholar is intended to support a community-aligned, open-source, multilingual digital ecosystem for studying, translating, and engaging with the teachings of Thích Nhất Hạnh and the Plum Village Community of Engaged Buddhism.
This document contains deeper onboarding and architectural context. For a more concise intro to the project, see the README.
Vision & Aspirations
TNH Scholar is a long-term effort to support the living Plum Village tradition with trustworthy, transparent digital tools. Both its development and its use are meant to deeply respect the tradition and practice of Thích Nhất Hạnh and the Plum Village community. Its aspirations are to:
- Support the building of multilingual corpora spanning Thích Nhất Hạnh's teachings, wider Plum Village teachings, and freely available source Buddhist texts, with high-fidelity text, rich metadata, and sentence-level alignment across languages.
- Provide AI-assisted research tools that expose their reasoning and keep human judgment central, serving monastics, practitioners, teachers, and researchers.
- Enable cross-lingual research across Vietnamese, English, French, Chinese, Pāli, Sanskrit, Tibetan, and other source languages.
- Enable rich interactive environments like bilingual readers combining scans, text, translations, and audio.
- Enable human-supervised AI workflows for corpus processing, translation, and evaluation.
- Engage with current Plum Village dharma teachers as the living continuation of the tradition, extending the corpus beyond Thầy's legacy to the broader teaching community.
- Reach toward historical canonical sources — Pāli suttas, Tibetan texts, Sanskrit originals — that form the deeper roots from which the modern teachings grow.
This work is envisioned on a multi-year to multi-decade timescale. The CLI tools and GenAI Service in this repository are the early infrastructure for that larger arc.
For the full vision, including scope, non-scope, relationship to spin-offs, and time horizon, see the Project Vision document.
Note on Terminology: Earlier versions of TNH Scholar referred to engineered AI prompts as "Patterns" to emphasize their engineering pattern nature. Current documentation uses "Prompt" to align with industry standards. References to "Pattern" in legacy documentation should be read as "Prompt".
What TNH Scholar Aspires to Make Possible
TNH Scholar aspires to support the community in:
- Exploring teachings with bilingual text and translation side-by-side
- Searching themes and teachings across languages and periods using semantic search and retrieval
- Discovering related teachings, concepts, and practices through advanced search capabilities
- Reviewing and refining translations collaboratively with transparent history
- Connecting practitioners, researchers, and teachers with reliable digital resources
- Preserving teaching materials for future generations with clarity and care
These are aspirational but active development goals aligned with the needs of the Plum Village community.
Current Features
- Audio and transcript processing: audio-transcribe with diarization and YouTube support
- Text formatting and translation: tnh-gen CLI for prompt-driven text processing with human-friendly defaults and API mode for programmatic use; see the Thầy Edited Journal Text Case Study for a fully worked OCR-to-translation example
- Headless workflow orchestration: tnh-conductor for maintained local/bootstrap workflow execution with status watching and canonical run artifacts
- Acquisition utilities: ytt-fetch for transcripts; token-count and nfmt for prep and planning
- Setup and configuration: tnh-setup plus guided config in Getting Started
- Prompt system: See Prompt System Architecture and ADR-PT03 for current status and roadmap
✅ tnh-gen v1.0 Available: The tnh-gen CLI is now fully implemented with dual output modes (human-friendly by default, --api flag for machine-readable output). See tnh-gen CLI Reference for complete documentation.
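The dual output modes mentioned above follow a common CLI pattern: readable text by default, structured output behind a flag. The sketch below illustrates that pattern only. It is not tnh-gen's implementation, and the tool name `demo-cli`, the `render` helper, and the placeholder payload are all hypothetical; only the `--api` flag name is taken from this page.

```python
import argparse
import json


def render(result: dict, api_mode: bool) -> str:
    """Format one result for people (default) or for programs (--api)."""
    if api_mode:
        # Machine-readable: a single JSON document with stable key order.
        return json.dumps(result, sort_keys=True)
    # Human-friendly: one short labelled line per field.
    return "\n".join(f"{key}: {value}" for key, value in result.items())


def main(argv=None) -> str:
    # "demo-cli" is a hypothetical name used only for this illustration.
    parser = argparse.ArgumentParser(prog="demo-cli")
    parser.add_argument(
        "--api",
        action="store_true",
        help="emit JSON instead of human-friendly text",
    )
    args = parser.parse_args(argv)

    # Placeholder payload; a real tool would report its actual work here.
    result = {"status": "ok", "items": 2}

    output = render(result, api_mode=args.api)
    print(output)
    return output
```

Calling `main([])` prints labelled lines, while `main(["--api"])` prints one JSON object that a script can parse. Keeping the formatting in a single `render` function means both modes always describe the same underlying result.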
Getting Started
Choose your path based on your primary interest:
Path 1: Use the Tools
For practitioners, translators, and researchers ready to work with TNH Scholar:
Get up and running with TNH Scholar's CLI tools for transcription, translation, and text processing:
- Install from PyPI:
- Configure credentials per Configuration
- Follow the Quick Start Guide for your first workflow
- Explore task-oriented workflows in the User Guide
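After installation, one quick sanity check is to confirm that the command-line tools can be found on your PATH. The following is a minimal sketch, assuming the tools are installed under the command names used on this page; `find_tools` and `report` are illustrative helpers, not part of TNH Scholar.

```python
import shutil

# Command names as they appear in this page's feature list and CLI reference.
CLI_TOOLS = [
    "audio-transcribe",
    "tnh-gen",
    "tnh-conductor",
    "tnh-setup",
    "ytt-fetch",
    "token-count",
    "nfmt",
]


def find_tools(names=CLI_TOOLS, which=shutil.which):
    """Map each command name to its resolved path, or None if not on PATH."""
    return {name: which(name) for name in names}


def report(found):
    """Build a one-line-per-tool summary of what was found."""
    lines = []
    for name, path in found.items():
        status = path if path else "not found on PATH"
        lines.append(f"{name}: {status}")
    return "\n".join(lines)
```

Running `print(report(find_tools()))` in the environment where you installed the package lists each tool with its location. A tool reported as not found usually means the environment is not activated or the install did not complete.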
Path 2: Understand the Vision & Principles
For community members, stakeholders, and those exploring how this project fits within Plum Village initiatives:
Explore the project's foundation, values, and long-term direction:
- Vision & Scope: Project Vision – multi-year aspirations, community alignment, and what's in/out of scope
- Philosophy: Philosophy – ethical foundations and mindful technology principles
- Principles: Design Principles – transparency, human judgment, and architectural values
- Community Context: Parallax Overview – relationship to broader Plum Village digital initiatives
Path 3: Contribute to Development
For developers, architects, and contributors:
Understand the technical foundation and start contributing:
- Setup: DEV_SETUP.md – development environment and workflows
- Architecture: System Design and Architecture Overview – core patterns and technical decisions
- Standards: Style Guide and Contributing – code quality and PR workflow
- Key ADRs: Start with GenAI Service Strategy and Prompt System Status
- Research: Research Index – experiments, evaluations, and exploratory work
- Future Directions: Long-term Vision – planned research directions and architectural horizons
- Common commands: , , , ,
Documentation Overview
- Getting Started: Installation, configuration, first-run guidance
- User Guide: Task-oriented workflows and practical how-tos
- CLI Reference: Auto-generated command documentation for every CLI entry point
- API: Python API reference (mkdocstrings)
- Architecture: ADRs, design docs, system diagrams by component
- Development: Contributor guides, design principles, engineering practices
- Docs Ops: Style guides, ADR template, documentation maintenance
- Research: Experiments, evaluations, exploratory notes
Project Status
TNH Scholar is currently in the alpha stage. Expect ongoing API and workflow changes during active development.
Support & Community
- Bug reports & feature requests: GitHub Issues
- Questions & discussions: GitHub Discussions
License
This project is licensed under the GPL-3.0 License.
Documentation Map
Auto-generated map of the documentation hierarchy. Regenerated during docs builds; edit source content instead of this file.
Getting Started
User Guide
- A Buddhist Cosmological View — Vũ-trụ-quan Phật học
- A Buddhist Cosmological View — Vũ-trụ-quan Phật học (Refined TNH-voice Draft)
- A Buddhist Cosmological View — Vũ-trụ-quan Phật học (TNH-voice)
- Best Practices
- Pipeline Case Study: Thầy Edited Journal Text
- Post-pipeline Facsimile Step
- TNH Scholar Prompt System
- User Guide Overview
- Vũ-trụ-quan Phật học — Văn bản đã làm sạch (tiếng Việt)
Project
- Conceptual Architecture of TNH-Scholar
- Future Directions of TNH-Scholar
- TNH Scholar CHANGELOG
- TNH Scholar CONTRIBUTING
- TNH Scholar README
- TNH Scholar Release Checklist
- TNH Scholar TODO List
- TNH Scholar Versioning Policy
- TNH-Scholar DEV_SETUP
- TNH-Scholar Project Philosophy
- TNH-Scholar Project Principles
- TNH-Scholar Project Vision
Community
CLI Reference
- audio-transcribe
- Command Line Tools Overview
- json-to-srt
- nfmt
- sent-split
- srt-translate
- tnh-conductor
- tnh-gen
- tnh-setup
- token-count
- ytt-fetch
Architecture
- ADR-A01: Adopt Object-Service for GenAI Interactions
- ADR-A02: PatternCatalog Integration (V1)
- ADR-A08: Configuration / Parameters / Policy Taxonomy
- ADR-A09: V1 Simplified Implementation Pathway
- ADR-A11: Model Parameters and Strong Typing Fix
- ADR-A12: Prompt System & Fingerprinting Architecture (V1)
- ADR-A13: Migrate All OpenAI Interactions to GenAIService
- ADR-A14.1: Registry Staleness Detection and User Warnings
- ADR-A14: File-Based Registry System for Provider Metadata
- ADR-A15: Thread Safety and Rate Limiting
- ADR-AT01: AI Text Processing Pipeline Redesign
- ADR-AT02: TextObject Architecture Decision Records
- ADR-AT03.1: AT03→AT04 Transition Plan
- ADR-AT03.2: NumberedText Section Boundary Validation
- ADR-AT03.3: TextObject Robustness and Metadata Management
- ADR-AT03: Minimal AI Text Processing Refactor for tnh-gen
- ADR-AT04: AI Text Processing Platform Strategy
- ADR-CF01: Runtime Context & Configuration Strategy
- ADR-CF02: Prompt Catalog Discovery Strategy
- ADR-DD01: Documentation System Reorganization Strategy
- ADR-DD02: Documentation Main Content and Navigation Strategy
- ADR-DD03: Pattern to Prompt Terminology Standardization
- ADR-DD03: Phase 1 Execution Punch List
- ADR-JV03: Canonical XML AST for English Parsing
- ADR-JVB01: JVB Parallel Viewer v1 As-Built
- ADR-K01: Preliminary Architectural Strategy for TNH Scholar Knowledge Base
- ADR-MD01: Adoption of JSON-LD for Metadata Management
- ADR-MD02: Metadata Infrastructure Object-Service Integration
- ADR-OA01.1: TNH-Conductor — Provenance-Driven AI Workflow Coordination (v2)
- ADR-OA01.2: Conceptual Spike for Orientation-Based Supervisory Orchestration
- ADR-OA01.3: Practical Approach for the Orientation-Based Conceptual Spike
- ADR-OA01.4: Headless Agent Communication Functional Spike
- ADR-OA01: TNH-Conductor — Provenance-Driven AI Workflow Coordination
- ADR-OA02: Phase 0 Protocol Layer Spike — Headless Capture + Safety Controls
- ADR-OA03.1: Claude Code Runner
- ADR-OA03.2: Codex Runner
- ADR-OA03.3: Codex CLI Runner
- ADR-OA03: Agent Runner Architecture
- ADR-OA04.1: MVP Runtime Build-Out Sequence
- ADR-OA04.2: Runner Contract
- ADR-OA04.3.1: Run Transparency and State Reporting
- ADR-OA04.3: Provenance and Run-Artifact Contract
- ADR-OA04.4: Policy Enforcement Contract
- ADR-OA04.5: Harness Backend Contract
- ADR-OA04: Workflow Execution Contracts
- ADR-OA05: Prompt Library Specification
- ADR-OA06.1: Evaluator-Directed Revision Loop
- ADR-OA06: Planner Evaluator Contract
- ADR-OA07.1: Worktree Lifecycle and Rollback
- ADR-OA07: Diff-Policy + Safety Rails
- ADR-OS01: Object-Service Design Architecture V3
- ADR-PP01: Rapid Prototype Versioning Policy
- ADR-PT03: Prompt System Current Status & Roadmap
- ADR-PT04: Prompt System Refactor Plan (Revised)
- ADR-PT05.1: Prototype Prompt Workspace Simplification
- ADR-PT05: Prompt Platform Strategy
- ADR-PV01: Provenance & Tracing Infrastructure Strategy
- ADR-ST01.1: tnh-setup UI Design
- ADR-ST01: tnh-setup Runtime Hardening
- ADR-TG01.1: Human-Friendly CLI Defaults with --api Flag
- ADR-TG01: tnh-gen CLI Architecture
- ADR-TG02: TNH-Gen CLI Prompt System Integration
- ADR-TG03: Typed Completion Outcome and Adapter Diagnostics
- ADR-TG04.1: JSON Contract Runtime Validation
- ADR-TG04.2: Structured JSON Provenance Sidecars
- ADR-TG04.3: Structured Output Trust Boundaries and Control Surfaces
- ADR-TG04: Structured JSON Contract and Scope Boundaries
- ADR-TR01: AssemblyAI Integration for Transcription Service
- ADR-TR02: Optimized SRT Generation Design
- ADR-TR03: Standardizing Timestamps to Milliseconds
- ADR-TR04: AssemblyAI Service Implementation Improvements
- ADR-TR05.1: Speaker-Block Language Lock Default Strategy
- ADR-TR05.2: MVP Service Scaffold for Multilingual Transcription
- ADR-TR05: Language-Aware Multilingual Transcription Engine
- ADR-VP01: Video Processing Return Types and Configuration
- ADR-VP02: yt-dlp Operational Strategy
- ADR-VSC01: VS Code Integration Strategy (TNH-Scholar Extension v0.1.0)
- ADR-VSC02: VS Code Extension Architecture
- ADR-VSC03.2: Real-World Survey Addendum (VS Code as a UI/UX Platform)
- ADR-VSC03.3: Investigation Synthesis - Validation of Design Choices
- ADR-VSC03: Preliminary Investigation Findings
- ADR-VSC03: Python-JavaScript Impedance Mismatch Investigation
- ADR-YF00: Early yt-fetch Transcript Decisions (Historical)
- ADR-YF01: YouTube Transcript Source Handling
- ADR-YF02: YouTube Transcript Format Selection
- Agent Orchestration Collaboration Paradigms
- Agent Orchestration Recursive Bootstrap Proposal
- Agent Orchestration Spike Testing Sequence
- Architecture Blueprint
- Architecture Overview
- Audio Chunking Algorithm Design Document
- Bootstrap Proof Run Result
- Codex Harness End-to-End Test Report
- Codex Harness Spike Findings
- Codex Headless Communication Experiment Plan
- Codex Headless Communication Report
- Codex Headless Communication Research Directions
- Codex Headless Research Memo for Engineer Agents
- Codex Official Docs Reference Summary
- Current Bootstrap Proof Task Brief
- Current Supervisory Task Brief
- Design Memo: Re-centering Agent Orchestration on Bootstrap and Long-Horizon Useful Work
- Design Strategy: VS Code as UI/UX Platform for TNH Scholar
- Diarization Algorithms
- Diarization Chunker Module Design Strategy
- Diarization System Design
- Documentation Design
- Example Agent Workflow
- GenAI Service — Design Strategy
- Generate Markdown Translation JSON Pairs
- Generate Markdown Vietnamese
- Interval-to-Segment Mapping Algorithm
- JVB Viewer — Version 2 Strategy & High‑Level Design
- Language-Aware Chunking Orchestrator Notes
- LUÂN-HỒI
- minimal but extensible setup tool for the prototyping phase
- Modular Pipeline Design: Best Practices for Audio Transcription and Diarization
- OA01.x Cloud Direction Generation Comparison Result
- OA01.x Experimental Directions and Experiment Catalog
- OA01.x Experimental Directions Atlas
- OA01.x Spike Experiment Register
- OA07 Bootstrap Proof Workflow Plan
- OA07.1 PR-7 Worktree Runtime Boundary Plan
- OA07.1 PR-8 Bootstrap Headless Entry Plan
- OA1.x Experimental Directions Matrix
- Object-Service Design Gaps
- Object-Service Design Overview
- Object-Service Implementation Status
- OpenAI Interface Migration Plan
- Package Version Checker Design Document
- Practical Language-Aware Chunking Design
- Prompt Dir Task Brief
- Prompt Platform Cleanup Follow-On
- Prompt System Architecture
- Run Bootstrap Proof
- Run Prompt Dir Comparison
- Run SPIKE-10 Agent Coordination Comparison
- Run Supervisory Shell Trial
- Simplified Language-Aware Chunking Design
- Speaker Diarization Algorithm Design
- Speaker Diarization and Time-Mapped Transcription System Design
- SPIKE-02 Execution Context Comparison
- SPIKE-03 Native Subagent Smoke Test
- SPIKE-04 Narrow Supervisory Comparison
- SPIKE-05 Minimum Review Artifact Set
- SPIKE-06 Native Codex CLI Baseline
- SPIKE-07 Codex Home State Dependency
- SPIKE-08 Launch Context Environment Contamination
- SPIKE-09 Prompt Dir Three-Arm Comparison
- SPIKE-10 Agent Coordination Comparison Plan
- SPIKE-10 Agent Coordination Comparison Result
- SPIKE-10 Conductor Watch Task Brief
- Supervisory Team Workflow Contract
- Supervisory Team Workflow Contract V2
- TextObject Original Design
- TextObject System Design Document
- TimelineMapper Design Document
- TNH Configuration Management
- tnh-gen Docs Consistency + OCR Pipeline Walkthrough Plan
- tnh-gen Golden Artifact Preservation Note — May 2026
- tnh-gen GPT-5 Structured Output Eval — May 2026
- tnh-gen Implementation Plan — April 2026
- tnh-gen Robustness Review — April 2026
- tnh-gen UX Directions and Issues — May 2026
- TNH-Scholar Agent Orchestration System
- TNH‑Scholar Utilities Catalog
- Versioning Policy Documentation Additions
- VS Code 1.110 Agent Features Research
- YouTube API vs yt-dlp Evaluation
Development
- CI Workflow Cleanup Proposal
- Codex Repo Ops
- Contributing to TNH Scholar (Prototype Phase)
- Development Documentation
- Fine Tuning Strategy
- Forensic Analysis: December 7, 2025 Git Data Loss Incident
- Git Workflow & Safety Guide
- Human-AI Software Engineering Principles
- Implementation Summary: Git Safety Improvements
- Improvements / Initial structure
- Incident Report: Git Recovery - December 7, 2025
- Proposed Updates to Incident Report
- Release Workflow
- TNH Scholar Design Principles
- TNH Scholar Style Guide
- TNH Scholar System Design
- tnh-conductor Operator Guide
- v0.2.0 Tag Correction Plan
- yt-dlp Ops Check
Docs Ops
- ADR Template
- Markdown Standards
- MkDocs Strict Warning Backlog
- Preview TNH Scholar Theme
- TNH Scholar Theme Design
Research
- 1-3 Word Queries
- GPT Development Convos
- Passage Test
- Preliminary Feasibility Study
- RAG Research Directions for TNH Scholar
- Structural-Informed Adaptive Processing (SIAP) Methodology
- Summary Report on Metadata Extraction, Source Parsing, and Model Training for TNH-Scholar
- TNH Scholar Knowledge Base: Design Document