Architecture Overview¶
This document provides a high-level view of the TNH Scholar architecture. It is intentionally brief and conceptual, with links to more detailed design documents and ADRs.
TNH Scholar is built around a layered, object-service-oriented architecture. At the highest level, it consists of:
- A set of CLI tools for end users,
- A GenAI Service for orchestrating AI model calls and prompt handling,
- A family of processing pipelines (audio, text, metadata),
- A knowledge and metadata layer for long-term corpus management.
Subsystem directory map¶
- AI Text Processing — Text object system and AI text processing
- Configuration — Configuration management system
- Docs System — Documentation system architecture
- GenAI Service — GenAI service architecture and interfaces
- JVB Viewer — JVB viewer
- Knowledge Base — Knowledge base architecture
- Metadata — Metadata systems and JSON-LD
- Object Service — Core object service architecture
- Prompt System — Prompt system and pattern catalog
- Setup TNH — Setup tool architecture
- Transcription — Audio transcription and diarization
- UI/UX — User interface and experience design
- Utilities — Utility functions and tools
- Video Processing — Video processing pipeline
- ytt-fetch — YouTube transcript fetching
Architectural diagrams¶
```mermaid
flowchart TD
    U[User] --> CLI[CLI Tools]
    CLI -->|Commands & Config| SVC[GenAI Service]
    CLI --> PIPE[Processing Pipelines]
    SVC --> MODELS[AI Providers & Models]
    PIPE --> DATA[Corpus & Metadata Store]
    SVC --> DATA
    DATA --> APPS[Viewers & Downstream Apps]
```
```text
+--------+           +-----------------+
|  User  | <-------> |    CLI Tools    |
+--------+           +-----------------+
                              |
                              v
                     +-----------------+
                     |  GenAI Service  |
                     +-----------------+
                        /           \
                       v             v
         +-----------------+   +-----------------+
         |  AI Providers   |   |   Processing    |
         |    & Models     |   |    Pipelines    |
         +-----------------+   +-----------------+
                       \             /
                        v           v
                     +-----------------+
                     |    Corpus &     |
                     |  Metadata Store |
                     +-----------------+
                              |
                              v
                     +-----------------+
                     |    Viewers &    |
                     | Downstream Apps |
                     +-----------------+
```
Key Components¶
CLI Tools¶
The CLI layer provides user-facing commands such as:
- `audio-transcribe`
- `nfmt`
- `tnh-fab` (deprecated; migrating to `tnh-gen`)
- `tnh-setup`
- `token-count`
- `ytt-fetch`

Note: `tnh-fab` is being replaced by `tnh-gen`. See TNH-Gen Architecture for details.
These tools are small, composable, and focused on a single responsibility. They generally:
- Accept file or directory inputs,
- Read configuration from a shared workspace or config file,
- Produce deterministic, reviewable outputs (text, JSON, or both).
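The sketch below illustrates that general shape. It is a minimal, hypothetical example: the tool name, flags, and config path are assumptions for illustration, and the real tools may use a different CLI framework.

```python
"""Illustrative sketch only: the rough shape of a single-purpose TNH Scholar CLI tool."""
import argparse
import json
import sys
from pathlib import Path


def main() -> None:
    parser = argparse.ArgumentParser(description="Hypothetical single-purpose tool.")
    parser.add_argument("input", type=Path, help="Input file or directory")
    parser.add_argument(
        "--config",
        type=Path,
        default=Path("~/.config/tnh-scholar/config.json").expanduser(),  # illustrative path
        help="Shared workspace/config file",
    )
    parser.add_argument("--json", action="store_true", help="Emit JSON instead of plain text")
    args = parser.parse_args()

    # Read shared configuration if present.
    config = json.loads(args.config.read_text()) if args.config.exists() else {}
    result = {"input": str(args.input), "config_keys": sorted(config)}

    # Deterministic, reviewable output: plain text or JSON on stdout.
    if args.json:
        json.dump(result, sys.stdout, indent=2, sort_keys=True)
    else:
        print(f"Processed {result['input']} with {len(result['config_keys'])} config keys")


if __name__ == "__main__":
    main()
```

Because output goes to stdout as plain text or JSON, the tools compose naturally with one another and with review workflows.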
More details:
- CLI Overview
- CLI guides: Command Line Tools Overview
GenAI Service¶
The GenAI Service is an internal orchestration layer that:
- Manages prompt patterns and prompt catalogs,
- Routes model requests to providers (for example, OpenAI),
- Enforces configuration policies (models, parameters, safety),
- Tracks fingerprints and provenance of AI-generated outputs.
It is implemented as an object-service, following the architecture described in:
- ADR-A01: Object-Service Blueprint for GenAI Service
- ADR-A02: Pattern Catalog Integration & Legacy Adoption (V1)
- ADR-A11: Model Parameters Fix
- ADR-A12: Prompt Fingerprints (V1)
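A minimal sketch of this orchestration flow is shown below. The class and method names (`GenAIService`, `PromptPattern`, `Provider.complete`) are illustrative assumptions, not the service's actual interface; the ADRs above define the real design.

```python
"""Illustrative sketch of the GenAI Service's orchestration role (hypothetical names)."""
from dataclasses import dataclass
from hashlib import sha256
from typing import Protocol


class Provider(Protocol):
    """Port for an AI provider adapter (for example, an OpenAI-backed implementation)."""
    def complete(self, prompt: str, model: str, **params) -> str: ...


@dataclass
class PromptPattern:
    name: str
    template: str


class GenAIService:
    def __init__(self, catalog: dict[str, PromptPattern], provider: Provider, model: str):
        self.catalog = catalog    # prompt patterns / pattern catalog
        self.provider = provider  # routed provider adapter
        self.model = model        # enforced by configuration policy

    def run(self, pattern_name: str, **fields) -> dict:
        pattern = self.catalog[pattern_name]
        prompt = pattern.template.format(**fields)
        output = self.provider.complete(prompt, model=self.model, temperature=0)
        # Fingerprint ties the output back to the exact pattern, model, and prompt used.
        fingerprint = sha256(f"{pattern.name}|{self.model}|{prompt}".encode()).hexdigest()
        return {"output": output, "fingerprint": fingerprint, "model": self.model}
```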
Processing Pipelines¶
Processing pipelines connect the CLI layer, GenAI Service, and data layer. Major pipelines include:
- Audio and Speech Pipelines
  - Chunking, diarization, and transcription for Dharma talks and related recordings.
  - Key documents live under `architecture/transcription/`.
- Text Processing and Metadata Pipelines
  - Normalization, tagging, and metadata enrichment for books, journals, and other texts.
  - See the AI text processing and prompt system designs under `architecture/ai-text-processing/` and `architecture/prompt-system/`.
These pipelines are designed to be modular and testable, with clear seams for:
- Replacing providers or tools,
- Adding new steps (for example, a new tagging phase),
- Running partial flows in isolation during development.
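As a rough illustration of those seams, the sketch below models a pipeline as an ordered list of small callable steps. The step names and document shape are hypothetical, not taken from the codebase.

```python
"""Sketch of composable pipeline steps that can be swapped, reordered, or run alone."""
from typing import Callable

Step = Callable[[dict], dict]  # each step takes and returns a document-like dict


def normalize(doc: dict) -> dict:
    # Example step: collapse whitespace in the raw text.
    return {**doc, "text": " ".join(doc["text"].split())}


def tag_sections(doc: dict) -> dict:
    # Example step: a naive sectioning pass (a real tagging phase would replace this).
    return {**doc, "sections": doc["text"].split(". ")}


def run_pipeline(doc: dict, steps: list[Step]) -> dict:
    for step in steps:  # steps can be replaced or run in isolation during development
        doc = step(doc)
    return doc


if __name__ == "__main__":
    result = run_pipeline({"text": "First idea.  Second idea."}, [normalize, tag_sections])
    print(result["sections"])
```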
Corpus and Metadata Layer¶
The corpus and metadata layer is responsible for:
- Storing canonical text (and possibly aligned translations),
- Maintaining structured metadata (chapters, sections, paragraphs, exercises, footnotes),
- Capturing provenance and versioning information for each artifact.
The exact backend may vary (for example, file-based JSON, SQL, or document stores), but the schema and contracts are intended to remain stable at the domain level.
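The following sketch shows one way such domain-level contracts could look as plain Python dataclasses. Every field name here is an assumption for illustration, not the project's actual schema; the same record shape could be persisted as file-based JSON, SQL rows, or documents.

```python
"""Illustrative domain-level record for the corpus and metadata layer (assumed fields)."""
from dataclasses import dataclass, field


@dataclass
class Provenance:
    source: str              # e.g. an original audio recording or scanned book
    prompt_fingerprint: str  # links back to the GenAI Service call, if any
    model: str
    version: int = 1


@dataclass
class TextArtifact:
    artifact_id: str
    canonical_text: str
    metadata: dict = field(default_factory=dict)            # chapters, sections, footnotes, ...
    provenance: list[Provenance] = field(default_factory=list)
```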
Related documents:
Architectural Principles¶
Some guiding principles that shape the architecture:
- Walking Skeleton First
  Start with minimal end-to-end flows, then deepen each layer incrementally.
- Object-Service and Ports/Adapters
  Keep domain logic separate from I/O concerns and provider specifics.
- Provenance and Traceability
  Every AI-assisted transformation should be traceable back to:
    - The source materials,
    - The prompts or patterns used,
    - The models and parameters applied.
- Human-Centered Design
  The system is meant to assist practitioners and researchers, not replace them. Human review is a core part of the design.
For a more detailed discussion of design principles and patterns, see: