Skip to content

tnh-fab

Overview

tnh-fab is a specialized command-line tool, part of the Thich Nhat Hanh Scholar Project. tnh-fab can process multilingual texts using AI 'patterns'. It is originally developed to work with processing Dharma-based materials such as talks by Thich Nhat Hanh.

It provides functionality for:

  • Adding/correcting punctuation
  • Identifying logical sections
  • Performing line-based translation
  • Applying custom text processing patterns

Installation

pip install tnh-scholar

Basic Usage

tnh-fab [COMMAND] [OPTIONS] [INPUT_FILE]

Input can be provided either as a file or through standard input (stdin).

Global options:

-v, --verbose        Enable detailed logging
--debug             Enable debug output
--quiet             Suppress all non-error output

Commands

1. Punctuate

Adds or corrects punctuation based on language-specific rules.

Basic usage:

tnh-fab punctuate [OPTIONS] [INPUT_FILE]

Options:

-l, --language      Source language code (auto-detected if not specified)
-y, --style         Punctuation style (default: 'APA')
-c, --review-count  Number of review passes (default: 3)
-p, --pattern       Pattern name (default: 'default_punctuate')

Examples:

# Basic punctuation of a file
tnh-fab punctuate input.txt

# Punctuate Vietnamese text with specific style
tnh-fab punctuate -l vi -y "Modern" vietnamese_text.txt

# Process from stdin with extra review passes
cat unpunctuated.txt | tnh-fab punctuate -c 5

# Use custom pattern with specific language
tnh-fab punctuate -p dharma_punctuation -l en text.txt

2. Section

Analyzes text and divides it into logical sections.

Basic usage:

tnh-fab section [OPTIONS] [INPUT_FILE]

Options:

-l, --language      Source language code (auto-detected if not specified)
-n, --num-sections  Target number of sections (auto-calculated if not specified)
-c, --review-count  Number of review passes (default: 3)
-p, --pattern       Pattern name (default: 'default_section')

Examples:

# Auto-detect sections
tnh-fab section dharma_talk.txt

# Create specific number of sections
tnh-fab section -n 5 long_text.txt

# Section Vietnamese text with custom pattern
tnh-fab section -l vi -p vn_section_pattern text.txt

# Process from stdin and save to file
cat text.txt | tnh-fab section > sections.json

3. Translate

Performs line-by-line translation while maintaining structure.

Basic usage:

tnh-fab translate [OPTIONS] [INPUT_FILE]

Options:

-l, --language       Source language code (auto-detected if not specified)
-r, --target         Target language code (default: 'en')
-y, --style          Translation style
--context-lines      Number of context lines (default: 3)
--segment-size       Lines per translation segment (auto-calculated if not specified)
-p, --pattern        Pattern name (default: 'default_line_translation')

Examples:

# Basic Vietnamese to English translation
tnh-fab translate -l vi vietnamese_text.txt

# Translation with specific style
tnh-fab translate -l vi -y "American Dharma Teaching" text.txt

# French translation with increased context
tnh-fab translate -l vi -r fr --context-lines 5 text.txt

# Custom segment size and pattern
tnh-fab translate --segment-size 20 -p custom_translation text.txt

4. Process

Applies custom pattern-based processing with flexible structuring.

Basic usage:

tnh-fab process -p PATTERN [OPTIONS] [INPUT_FILE]

Options:

-p, --pattern     Pattern name (required)
-s, --section     Process using sections from JSON file
-g, --paragraph   Process text by paragraphs
-t, --template    YAML file with template values

Examples:

# Basic processing with pattern
tnh-fab process -p format_xml input.txt

# Process using sections from file
tnh-fab process -p format_xml -s sections.json input.txt

# Process by paragraphs
tnh-fab process -p format_xml -g long_text.txt

# Process with template values
tnh-fab process -p format_xml -t template.yaml input.txt

Advanced Usage Examples

Pipeline Processing

1. Punctuate, section, and process

cat raw_text.txt | \
tnh-fab punctuate -l vi | \
tnh-fab section -n 5 | \
tnh-fab process -p format_xml > final_output.xml

2. Section and translate

tnh-fab section input.txt > sections.json
tnh-fab translate -l vi -s sections.json input.txt > translated.txt

3. Complete processing pipeline

cat vietnamese_text.txt | \
tnh-fab punctuate -l vi -y "Modern" | \
tnh-fab section --num-sections 10 | \
tnh-fab translate -r en --context-lines 5 | \
tnh-fab process -p format_xml > final_output.xml

Pattern System

Patterns are stored in the pattern directory (default or specified by TNH_PATTERN_DIR environment variable). Each command uses specific default patterns that can be overridden using the -p option.

For information about creating custom patterns, please refer to the pattern documentation.