Skip to content

Best Practices

This guide outlines recommended practices for using TNH Scholar effectively.

General Guidelines

Text Processing

1. Input Preparation

  • Ensure text files use consistent line endings
  • Remove any special formatting or control characters
  • Use UTF-8 encoding for all text files
  • Consider running nfmt on input files to standardize formatting

2. Language Handling

  • Specify language codes explicitly when known
  • Use ISO 639-1 two-letter codes (e.g., 'en', 'vi')
  • Allow auto-detection only for simple cases

3. File Management

  • Keep original files backed up
  • Use descriptive file names
  • Maintain consistent directory structure
  • Store intermediate results when running long pipelines

Command-Line Tools

TNH-FAB

1. Pattern Selection

  • Use default patterns for initial testing
  • Create custom patterns for specific needs
  • Test patterns with small samples first
  • Document pattern modifications

2. Pipeline Design

  • Break complex processing into steps
  • Use intermediate files for long pipelines
  • Validate output at each stage
  • Consider using tee for debugging

Example of good pipeline practice:

# Good: Save intermediate results
cat input.txt | \
  tnh-fab punctuate > punctuated.txt && \
  tnh-fab section punctuated.txt > sections.json && \
  tnh-fab process -p format_xml -s sections.json punctuated.txt > final.xml

# Not recommended: Direct pipeline without saves
cat input.txt | tnh-fab punctuate | tnh-fab section | tnh-fab process -p format_xml

3. Error Handling

  • Check command exit codes
  • Save error output for debugging
  • Use verbose mode for troubleshooting
  • Keep log files organized

Audio-Transcribe

1. Audio Processing

  • Use appropriate chunk sizes for content
  • Consider silence detection for natural breaks
  • Monitor transcription quality
  • Save intermediate audio chunks

2. YouTube Downloads

  • Verify video availability before batch processing
  • Use CSV files for bulk operations
  • Include timestamps when needed
  • Save downloaded audio files

Pattern Development

1. Pattern Design

  • Keep patterns focused and single-purpose
  • Include clear documentation
  • Test with various input types
  • Patterns are automatically version controlled using git.
  • Use git tools to inspect and manage versions.

2. Template Variables

  • Use descriptive variable names
  • Provide default values when appropriate
  • Document required variables
  • Test variable combinations

Example pattern structure:

---
description: Example pattern for formatting
version: 1.0
author: TNH Scholar
---
Process this text according to these parameters:

Language: {{ language }}
Style: {{ style_convention }}
Review Count: {{ review_count }}

Additional instructions...

Performance Optimization

1. Resource Management

  • Monitor token usage
  • Use appropriate batch sizes
  • Consider chunking for large files
  • Cache intermediate results

2. API Usage

  • Implement rate limiting
  • Handle API errors gracefully
  • Monitor usage and costs
  • Use appropriate models for tasks

Testing and Validation

1. Input Validation

  • Test with various file sizes
  • Include different languages
  • Check character encodings
  • Verify line endings

2. Output Validation

  • Verify file formats
  • Check content integrity
  • Validate XML structure
  • Compare with expected results

3. Process Validation

  • Monitor system resources
  • Track processing time
  • Log error conditions
  • Document edge cases

Security Considerations

1. API Keys

  • Use environment variables
  • Never commit keys to version control
  • Rotate keys regularly
  • Monitor API usage

2. File Handling

  • Validate input files
  • Use secure file permissions
  • Clean up temporary files
  • Handle sensitive content appropriately

Documentation

1. Code Comments

  • Document complex logic
  • Explain pattern usage
  • Note assumptions
  • Include examples

2. Process Documentation

  • Document workflows
  • Create usage examples
  • Update for changes
  • Include troubleshooting guides