User Guide Overview

This User Guide describes how to use TNH Scholar as a tool user or workflow designer. It focuses on practical flows, concrete decisions, and how the pieces fit together, without requiring you to understand every internal design document.

If you are new to the project, you may want to read the TNH Scholar index first, then return here when you are ready to dive into concrete workflows.


Roles and Typical Usage

Most people who interact with TNH Scholar do so in one of these roles:

  • Tool user
    Runs the CLI commands to process specific audio, text, or video inputs, and reviews the outputs.

  • Workflow designer
    Chains together multiple tools (and sometimes GenAIService calls) into repeatable flows for a community or project.

  • Developer or maintainer
    Extends the codebase, adds new tools, or modifies existing ones.

This guide is aimed primarily at tool users and workflow designers. Developers should also see the development docs.


The Main Workflows

The current CLI and service layer support three broad types of workflows.

1. Audio and Video to Clean Text

Goal: Start from a recorded Dharma talk or teaching session and end with a clean, reviewable transcript.

Typical steps:

  1. Transcribe audio with audio-transcribe
    • Input: audio or video file (for example, .wav, .mp3, .mp4).
    • Output: timestamped transcript (often JSON and/or text).

  2. Normalize formatting with nfmt
    • Input: transcript text.
    • Output: normalized plain text, with consistent line wrapping, spacing, and punctuation.

  3. Optional: apply structure or tagging with tnh-fab (deprecated; migrating to tnh-gen) or GenAIService-based flows
    • Add markers for paragraphs, headings, quotes, or exercises.
    • Prepare the text for metadata or translation workflows.
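
The normalization step can be pictured with a small sketch. This illustrates the kind of cleanup the step performs (collapsing whitespace, rejoining hard-wrapped lines, re-wrapping to a consistent width); it is not nfmt's actual implementation, and the function name and default width are assumptions.

```python
import re
import textwrap

def normalize(text: str, width: int = 72) -> str:
    """Illustrative transcript cleanup: collapse whitespace, rejoin
    hard-wrapped lines within paragraphs, re-wrap to one width."""
    # Split into paragraphs on blank lines (one or more).
    paragraphs = re.split(r"\n\s*\n", text.strip())
    out = []
    for para in paragraphs:
        # Collapse internal line breaks and runs of spaces.
        flat = re.sub(r"\s+", " ", para).strip()
        out.append(textwrap.fill(flat, width=width))
    return "\n\n".join(out)

raw = "The  practice of\nmindful breathing\n\n\nreturns us   to the present."
print(normalize(raw, width=40))
```

Real transcripts also need punctuation and speaker-label handling, which is where the actual tool earns its keep.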


2. Existing Text to Structured, Metadata-Rich Text

Goal: Take texts that already exist (OCR, EPUB, PDF-derived, or plain text) and make them structured, tagged, and ready for search, translation, or archival use.

Typical steps:

  1. Normalize and clean
    • Use nfmt or equivalent preprocessing to remove obvious noise and enforce consistent formatting.

  2. Apply patterns and prompts with tnh-fab (deprecated; migrating to tnh-gen)
    • Use domain-specific patterns or prompts to:
      • Identify headings and sections,
      • Tag poems, plays, quotes, exercises, or notes,
      • Insert metadata or footnote markers.

  3. Review and refine
    • Humans review the output, correct tagging, and adjust patterns as needed.
    • The corrected text becomes a better training or reference dataset for future workflows.
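
The rule-based side of this tagging can be sketched as follows. The marker format and heading pattern here are hypothetical illustrations, not tnh-fab's actual pattern syntax; real workflows combine patterns with prompt-driven (GenAIService) passes and human review.

```python
import re

# Hypothetical heading rule and marker format, for illustration only.
HEADING = re.compile(r"^(chapter|part)\b.*", re.IGNORECASE)

def tag_headings(lines: list[str]) -> list[str]:
    """Wrap lines that look like headings in simple structural markers."""
    tagged = []
    for line in lines:
        if HEADING.match(line.strip()):
            tagged.append(f"<heading>{line.strip()}</heading>")
        else:
            tagged.append(line)
    return tagged

text = ["Chapter One", "Breathing in, I calm my body."]
print(tag_headings(text))
```

The value of keeping rules this explicit is that a reviewer can see exactly why a line was tagged and adjust the pattern rather than re-editing every output.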


3. Prepared Text to Model-Ready Chunks

Goal: Convert cleaned and structured text into units suitable for:

  • Vector embedding and semantic search,
  • Translation via GenAIService or other models,
  • Evaluation and QA workflows.

Typical steps:

  1. Segment text into chunks
    • Apply rules based on token length, semantic boundaries, or structural markers (for example, sections, paragraphs, stanzas).

  2. Estimate token usage with token-count
    • Check that individual chunks fit model limits.
    • Plan batch sizes and costs for large-scale processing.

  3. Run AI workflows via GenAIService or other orchestration tools
    • For example, translation, query-text pair generation, or similarity search indexing.
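
The segmentation and budgeting steps above can be sketched as follows. The 4-characters-per-token heuristic is a rough assumption for English text; in practice use token-count (or the target model's own tokenizer) for accurate figures.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English prose.
    # Use a real tokenizer for budgeting that has to be exact.
    return max(1, len(text) // 4)

def chunk_paragraphs(paragraphs: list[str], max_tokens: int = 512) -> list[str]:
    """Greedily pack paragraphs into chunks under a token budget,
    splitting only at paragraph boundaries."""
    chunks, current, used = [], [], 0
    for para in paragraphs:
        cost = estimate_tokens(para)
        # Flush the current chunk if adding this paragraph would overflow.
        if current and used + cost > max_tokens:
            chunks.append("\n\n".join(current))
            current, used = [], 0
        current.append(para)
        used += cost
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Splitting only at paragraph (or stanza/section) boundaries keeps each chunk semantically coherent, which matters for both embedding quality and translation context.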


Choosing the Right Tool

When deciding which tool or workflow to use, consider:

  • Type of input
    • Audio or video → start with audio-transcribe.
    • Text or OCR output → start with nfmt and/or tnh-fab.

  • Target output
    • Human-readable transcript → focus on audio-transcribe + nfmt.
    • Machine-usable chunks for search or translation → include chunking logic and token-count.
    • Rich, tagged editions → lean on tnh-fab and relevant prompt patterns.

  • Review requirements
    • For archival or publication-ready materials, assume human review is mandatory.
    • For internal experimentation, you may tolerate more automation, but provenance still matters.

The CLI Overview includes a quick decision table for common scenarios.


Provenance and Human Oversight

A central principle of TNH Scholar is that all AI-assisted outputs must be traceable and reviewable. In practice this means:

  • Keeping original sources (audio, scans, raw text) accessible and referenced.
  • Recording which tools, prompts, and models were used to generate any derived artifact.
  • Encouraging review workflows where humans accept, modify, or reject AI-suggested changes.

The internal GenAIService and prompt system are designed to support these requirements.


Where to Go Next

Suggested next readings: