TNH-FAB Command Line Tool Specification

Overview

tnh-fab is a command-line tool that provides standalone, pipeable operations for processing Buddhist texts. It is part of the tnh-scholar suite of tools and emphasizes simplicity, flexibility, and consistent behavior.

Core Functionality

  • Text punctuation and formatting
  • Section creation and management
  • Translation (line-by-line and block)
  • Pattern-based text processing
  • XML/structured output generation

Command Structure

tnh-fab <command> [options] [input_file]

Global Options

Input/Output:
  [input_file]                Input file (optional, uses STDIN if not provided)
  -o, --output FILE          Output file (default: STDOUT)
  -d, --output-dir DIR       Output directory (default: current)
  -f, --format FORMAT        Output format: txt/json/yaml/xml (default varies by command)

Configuration:
  -l, --language LANG        Source language code (auto-detect if not specified)
  -t, --template FILE        Template values file (YAML format)
  -k, --key-values PAIRS     Space-separated key:value pairs (e.g., speaker:"Name")
  -p, --pattern NAME         Pattern name (command-specific default if not specified)
  -c, --review-count NUM     Number of review passes (default: 3)

Logging:
  -v, --verbose             Enable detailed logging
  --debug                   Enable debug output
  --quiet                   Suppress all non-error output

Other:
  -h, --help               Show command-specific help
  --version                Show version information

Commands

punctuate

Add punctuation and structure to text.

Additional Options:
  -y, --style STYLE            Punctuation style (default: configuration file)

section

Create and manage text sections.

Additional Options:
  -n, --num-sections NUM   Target number of sections (default: auto)
  --target-section-size NUM   Target section size in tokens (default: configuration file)

translate

Perform line-by-line or block translation.

Additional Options:
  -r, --target LANG        Target language code (default: en)
  -y, --style STYLE        Translation style (default: configuration file)
  --context-lines NUM      Number of context lines (default: 3)
  --segment-size NUM       Lines per translation segment (default: auto)

process

Execute pattern-based text processing. Can work on sections of data or on the whole input stream.

Additional Options:
  -s, --sections FILE      JSON file containing section data
  -g, --paragraph         Use line-separated paragraphs as sections
  --xml                   Wrap output in XML style document tags

Input/Output Handling

Input Sources

  • File specified as argument
  • STDIN (piped input)
  • Section data (JSON format)
  • Template files (YAML format)

Input Processing

  1. Text content priority:
    • Named input file
    • STDIN if no file specified
    • Error if neither available

  2. Section data priority:
    • JSON file specified with -s
    • STDIN when paired with input file
    • Auto-generated sections if no pattern specified

  3. Template values priority:
    • Command line key-values (-k)
    • Template file values (-t)
    • Default values from pattern
    • Environment variables (TNH_FAB_*)
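
The template value priority above can be sketched as a layered lookup. This is a minimal illustration, not the tool's actual implementation: the function name resolve_template_values is hypothetical, and the mapping of TNH_FAB_SPEAKER to the lowercase key "speaker" is an assumption the specification does not confirm.

```python
import os
from collections import ChainMap

ENV_PREFIX = "TNH_FAB_"


def resolve_template_values(cli_values, file_values, pattern_defaults, environ=None):
    """Merge template values; sources listed first win over later ones."""
    environ = os.environ if environ is None else environ
    # Assumption: TNH_FAB_SPEAKER supplies the template key "speaker".
    env_values = {
        name[len(ENV_PREFIX):].lower(): value
        for name, value in environ.items()
        if name.startswith(ENV_PREFIX)
    }
    # ChainMap looks keys up left to right, so -k beats -t, which beats
    # pattern defaults, which beat environment variables.
    return dict(ChainMap(cli_values, file_values, pattern_defaults, env_values))
```

Passing environ explicitly keeps the function easy to test; in normal use it falls back to os.environ.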

Output Handling

  1. Output destination priority:
    • File specified by -o
    • STDOUT if no file specified

  2. Format determination:
    • Format specified by -f
    • Default format by command:
      • punctuate: txt
      • section: json
      • translate: txt
      • process: txt
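
Format determination as described above amounts to a lookup with an override. The following is a small sketch under that reading; resolve_output_format is a hypothetical name, and rejecting unknown formats with ValueError is an assumption.

```python
DEFAULT_FORMATS = {
    "punctuate": "txt",
    "section": "json",
    "translate": "txt",
    "process": "txt",
}
SUPPORTED_FORMATS = {"txt", "json", "yaml", "xml"}


def resolve_output_format(command, requested=None):
    """Return the format given with -f, else the command's default."""
    if requested is not None:
        if requested not in SUPPORTED_FORMATS:
            raise ValueError(f"Unsupported output format: {requested!r}")
        return requested
    return DEFAULT_FORMATS[command]
```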

Pattern Management

Pattern Resolution

  1. Pattern name sources (in order):
    • Command line (-p)
    • Command defaults:
      • punctuate: default_punctuate
      • section: default_section
      • translate: default_translate
      • process: NO DEFAULT (must be specified)

  2. Pattern search paths:
    • Path specified in configuration
    • ~/.config/tnh_scholar/patterns/
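
A minimal sketch of the resolution rules above, assuming the configured path is searched before the default directory (the order in which the list gives them). The function names are hypothetical and not taken from the tnh-scholar code base.

```python
from pathlib import Path

COMMAND_DEFAULT_PATTERNS = {
    "punctuate": "default_punctuate",
    "section": "default_section",
    "translate": "default_translate",
    "process": None,  # no default: -p must be given
}
DEFAULT_PATTERN_DIR = Path("~/.config/tnh_scholar/patterns")


def resolve_pattern_name(command, cli_pattern=None):
    """Prefer the -p value, then the command default; fail if neither exists."""
    if cli_pattern:
        return cli_pattern
    default = COMMAND_DEFAULT_PATTERNS.get(command)
    if default is None:
        raise ValueError(f"Command {command!r} requires a pattern (-p)")
    return default


def pattern_search_paths(configured_path=None):
    """List directories to search, configured path first."""
    paths = []
    if configured_path:
        paths.append(Path(configured_path).expanduser())
    paths.append(DEFAULT_PATTERN_DIR.expanduser())
    return paths
```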

Template Value Processing

  1. Key-Value Format: key:value key2:"value with spaces"
    • Keys must be valid identifiers
    • Values with spaces must be quoted
    • Invalid formats raise an error

  2. Template File Format (YAML):

       key1: value1
       key2: value2

  3. Environment Variables:
    • Format: TNH_FAB_{KEY}
    • Lowest priority in template resolution
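
The key-value rules above could be implemented roughly as follows. This sketch parses the raw string exactly as written in the format line and uses shlex for quote handling; parse_key_values is a hypothetical name, and treating "valid identifier" as Python's str.isidentifier is an assumption.

```python
import shlex


def parse_key_values(raw):
    """Parse 'key:value key2:"value with spaces"' into a dict."""
    values = {}
    # shlex.split honours quotes, so a quoted value stays in one token.
    # It raises ValueError itself on an unclosed quote.
    for token in shlex.split(raw):
        key, sep, value = token.partition(":")
        if not sep or not key.isidentifier():
            raise ValueError(f"Invalid key:value pair: {token!r}")
        values[key] = value
    return values
```

When the pairs arrive through a shell, the shell has usually removed the quotes already, so each argument can be split on its first colon directly.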

Pipeline Behavior

Data Flow

  • All commands accept STDIN
  • All commands can output to STDOUT
  • Section data can flow through pipeline
  • Binary data not supported

Pipeline Examples

# Punctuate and section
cat input.txt | tnh-fab punctuate | tnh-fab section > sections.json

# Section and process
tnh-fab section input.txt | tnh-fab process -p format_xml > output.xml

# Complete pipeline
cat input.txt | \
  tnh-fab punctuate -l vi | \
  tnh-fab section -n 5 | \
  tnh-fab process -p format_xml -k speaker:"Thay" > output.xml

The process command accepts input in the following combinations:

  1. Single File Input

       tnh-fab process -p format_xml input.txt

    • Processes input.txt directly. No sectioning is performed.

  2. STDIN Only

       cat input.txt | tnh-fab process -p format_xml
       cat input.txt | tnh-fab process -g -p format_xml  # process by paragraphs

    • Processes text from STDIN

  3. File + Sections File

       tnh-fab process -p format_xml -s sections.json input.txt

    • Processes input.txt using sections from sections.json

  4. STDIN + Sections File

       cat input.txt | tnh-fab process -p format_xml -s sections.json

    • Processes STDIN text using sections from sections.json

  5. Section Stream + Input File

       tnh-fab section input.txt | tnh-fab process -p format_xml input.txt

    • Processes input.txt using sections from STDIN
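
The five combinations above can be summarized as a small decision function. This is an illustrative sketch only: InputPlan and plan_process_input are hypothetical names, and giving -s precedence over -g when both are present is an assumption the specification does not state.

```python
from typing import NamedTuple, Optional


class InputPlan(NamedTuple):
    text_source: str      # "file" or "stdin"
    sections_source: str  # "file", "stdin", "paragraphs", or "none"


def plan_process_input(input_file: Optional[str], sections_file: Optional[str],
                       stdin_piped: bool, by_paragraph: bool = False) -> InputPlan:
    """Decide where text and section data come from for `process`."""
    if input_file is None and not stdin_piped:
        raise ValueError("No input: provide a file or pipe text on STDIN")
    text_source = "file" if input_file else "stdin"
    if sections_file:
        sections_source = "file"        # -s sections.json
    elif input_file and stdin_piped:
        sections_source = "stdin"       # section stream piped alongside a file
    elif by_paragraph:
        sections_source = "paragraphs"  # -g
    else:
        sections_source = "none"        # whole input, no sectioning
    return InputPlan(text_source, sections_source)
```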

Input Validation

  • When sections are provided (via -s or STDIN):
    • Validates JSON format matches TextObject schema
    • Checks source_file field in TextObject if present
    • Warns if source_file doesn't match input file name
    • Validates section line ranges against input text
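
As a rough illustration of these checks, the sketch below validates section data against the input text. The TextObject schema is not given in this document, so the field names sections, start_line, and end_line are hypothetical, as is the assumption that line numbers are 1-based and inclusive; only source_file is named in the specification.

```python
import json
from pathlib import Path


def validate_sections(sections_json, text, input_name=None):
    """Return a list of warnings; raise ValueError when the data is invalid."""
    data = json.loads(sections_json)  # malformed JSON raises a ValueError subclass
    if not isinstance(data, dict) or not isinstance(data.get("sections"), list):
        raise ValueError("Section data must be an object with a 'sections' list")

    warnings = []
    source_file = data.get("source_file")  # optional field
    if source_file and input_name and Path(source_file).name != Path(input_name).name:
        warnings.append(f"source_file {source_file!r} does not match input {input_name!r}")

    line_count = len(text.splitlines())
    for index, section in enumerate(data["sections"], start=1):
        if not isinstance(section, dict):
            raise ValueError(f"Section {index} is not an object")
        # Hypothetical field names; bounds assumed 1-based and inclusive.
        start, end = section.get("start_line"), section.get("end_line")
        if not (isinstance(start, int) and isinstance(end, int)):
            raise ValueError(f"Section {index} is missing integer line bounds")
        if not 1 <= start <= end <= line_count:
            raise ValueError(f"Section {index} range {start}-{end} is outside 1-{line_count}")
    return warnings
```

A name mismatch produces a warning rather than an error, matching the "Warns if" wording above.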

Configuration

Configuration Files

  1. User: ~/.config/tnh_scholar/tnh-fab/config.yaml
  2. Project: ./.tnh-fab.yaml
  3. Priority: Project > User
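
One way to apply the "Project > User" priority is a recursive merge in which project values override user values key by key. Whether tnh-fab merges per key or replaces whole sections is not stated here, so the per-key behavior and the name merge_config are assumptions.

```python
def merge_config(user, project):
    """Return a new dict where project settings override user settings."""
    merged = dict(user)
    for key, value in project.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections such as "punctuate" key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged
```

With this approach a project file that sets only punctuate.review_count leaves the user's punctuate.style in place.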

Configuration Format

defaults:
  language: auto
  output_format: txt

punctuate:
  pattern: default_punctuate
  style: APA
  review_count: 3

section:
  pattern: default_section
  review_count: 3

translate:
  pattern: default_translate
  target_language: en
  style: "American Dharma Teaching"
  context_lines: 3
  review_count: 3

process:
  wrap_document: true

patterns:
  path: ~/.config/tnh_scholar/patterns

logging:
  level: INFO
  file: ~/.tnh-fab.log

Error Handling

Error Categories

  1. Input Errors
    • Missing required input
    • Invalid file formats
    • Encoding issues
    • Section/input mismatch

  2. Pattern Errors
    • Missing required pattern
    • Pattern not found
    • Invalid pattern format

  3. Template Errors
    • Invalid template format
    • Missing required values
    • Invalid key-value syntax

  4. Processing Errors
    • AI service errors
    • Timeout errors
    • Validation failures

Error Reporting

  • Standard error format
  • Error codes for scripting
  • Detailed logging with -v
  • Stack traces with --debug

Exit Codes

0  Success
1  General error
2  Input error
3  Pattern error
4  Template error
5  Processing error
64-73  Command-specific errors
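
For scripting against these codes, the table could be mirrored in an enumeration. This is a sketch only: the exception class names are hypothetical stand-ins for the four error categories, and the command-specific range 64-73 is left out because its individual codes are not defined in this document.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    SUCCESS = 0
    GENERAL_ERROR = 1
    INPUT_ERROR = 2
    PATTERN_ERROR = 3
    TEMPLATE_ERROR = 4
    PROCESSING_ERROR = 5


# Hypothetical exception types, one per error category.
class InputError(Exception): pass
class PatternError(Exception): pass
class TemplateError(Exception): pass
class ProcessingError(Exception): pass


_EXIT_CODES = {
    InputError: ExitCode.INPUT_ERROR,
    PatternError: ExitCode.PATTERN_ERROR,
    TemplateError: ExitCode.TEMPLATE_ERROR,
    ProcessingError: ExitCode.PROCESSING_ERROR,
}


def exit_code_for(error):
    """Map an exception to its exit code, defaulting to the general error."""
    for exc_type, code in _EXIT_CODES.items():
        if isinstance(error, exc_type):
            return code
    return ExitCode.GENERAL_ERROR
```

A command entry point could then call sys.exit(exit_code_for(err)) in its top-level exception handler.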

Development Notes

Key Decision Points

  1. Pattern Management:
    • Consider pattern versioning
    • Pattern validation requirements
    • Pattern update mechanism

  2. Pipeline Handling:
    • Memory management for large files
    • Progress indication in pipelines
    • Error propagation in pipelines

  3. Configuration:
    • Environment variable handling
    • Configuration validation
    • Configuration migration

  4. Testing Requirements:
    • Unit test coverage requirements
    • Integration test scenarios
    • Performance benchmarks

Future Considerations

  1. Additional Commands:
    • Format validation
    • Pattern management
    • Batch processing

  2. Extensions:
    • Plugin system
    • Custom pattern repositories
    • API integration

  3. Integration:
    • CI/CD requirements
    • Packaging requirements
    • Documentation generation
  12. Documentation generation