tnh-gen
TNH Scholar's unified command-line interface for GenAI-powered text processing operations. The tnh-gen CLI provides prompt discovery, text processing, configuration management, and VS Code integration support.
Overview
tnh-gen is a modern, object-service-compliant CLI that replaces the legacy tnh-fab tool. It provides:
- Prompt Discovery: Browse and search available prompts with rich metadata
- Text Processing: Execute AI-powered transformations (translation, sectioning, summarization, etc.)
- Configuration Management: Hierarchical configuration with clear precedence rules
- Human-Friendly Defaults: Optimized for direct CLI usage, with an --api flag for programmatic consumption
- Provenance Tracking: API responses and generated output files include generation metadata and fingerprints
Installation
Verify installation:
Quick Start
# List available prompts
tnh-gen list
# Execute a prompt (human-friendly output)
tnh-gen run --prompt translate \
--input-file teaching.md \
--var source_language=vi \
--var target_language=en
# Get machine-readable output for scripts
tnh-gen --api run --prompt translate \
--input-file teaching.md \
--var source_language=vi \
--var target_language=en
Global Flags
These flags work with all commands:
--api # Enable machine-readable API contract output (JSON)
--format FORMAT # Output format: json, yaml, text (the `list` command also supports table)
--prompt-dir PATH # Override the prompt catalog directory
--config PATH # Override config file location
--quiet, -q # Suppress non-error output
--no-color # Disable colored terminal output
Important: Global flags must come BEFORE the subcommand:
# Correct
tnh-gen --api list
tnh-gen --prompt-dir ./my-prompts run --prompt translate ...
# Incorrect (will not work)
tnh-gen list --api
Output Modes
tnh-gen has two output modes designed for different use cases:
Human Mode (Default):
- Optimized for direct CLI usage
- Simplified, readable output
- Command-specific formatting
- Omits verbose metadata
API Mode (--api flag):
- Optimized for programmatic consumption (VS Code, scripts)
- Complete structured JSON output
- Full metadata, provenance, and diagnostics
- Stable machine-readable contract
Flag Precedence:
1. --api controls WHAT data is included (full contract vs. human-friendly)
2. --format controls HOW it's serialized (json, yaml, text, table)
3. --api implies JSON by default; can be combined with --format yaml
4. --api cannot be combined with --format text or --format table
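The four rules can be captured in a small validation function, which may help script authors reject invalid combinations before invoking tnh-gen. This is a sketch of the documented rules only: `resolve_output_format` is a hypothetical helper, not part of tnh-gen, and it assumes human mode renders `text` when no --format is given. It also does not enforce that `table` is only meaningful for `list`.

```sh
# Hypothetical helper (not part of tnh-gen): mirrors the documented
# --api / --format precedence rules and prints the effective format.
#   $1: 1 if --api is given, 0 otherwise
#   $2: value passed to --format, or "" if the flag is absent
resolve_output_format() {
    api="$1"
    fmt="$2"
    if [ "$api" = "1" ]; then
        case "$fmt" in
            "" | json) echo json ;;      # rule 3: --api implies JSON
            yaml) echo yaml ;;           # rule 3: YAML serialization allowed
            text | table)                # rule 4: rejected with --api
                echo "error: --api cannot be combined with --format $fmt" >&2
                return 1 ;;
            *)
                echo "error: unknown format '$fmt'" >&2
                return 1 ;;
        esac
    else
        case "$fmt" in
            "") echo text ;;             # assumption: human mode renders text
            json | yaml | text | table) echo "$fmt" ;;
            *)
                echo "error: unknown format '$fmt'" >&2
                return 1 ;;
        esac
    fi
}
```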
Examples:
# Human-friendly (default)
tnh-gen list
# → Simplified text format with descriptions
# API mode (JSON contract)
tnh-gen --api list
# → Full metadata as JSON
# API mode with YAML serialization
tnh-gen --api --format yaml list
# → Full metadata as YAML
# Human-friendly table
tnh-gen list --format table
# → Simplified table format
Commands
tnh-gen list
List all available prompts with metadata for discovery and selection.
Synopsis
tnh-gen [GLOBAL FLAGS] list [OPTIONS]
Options
--tag TAG # Filter by tag (repeatable)
--search QUERY # Search in names/descriptions (case-insensitive)
--keys-only # Output only prompt keys (one per line)
Note: --api is a global flag and must come before list. For table output, use the list subcommand's own --format table option.
Human-Friendly Output (Default)
$ tnh-gen list
Available Prompts (3)
daily - Daily Guidance
Daily guidance prompt for testing.
Variables: audience, [location]
Model: gpt-4o | Tags: guidance, study
translate - Vietnamese-English Translation
Translate Vietnamese dharma texts to English with context awareness.
Variables: source_language, target_language, input_text, [context]
Model: gpt-4o | Tags: translation, dharma
summarize - Summarize Teaching
Generate concise summary of dharma teaching.
Variables: input_text, [max_length]
Model: gpt-4o-mini | Tags: summarization, dharma
Format Notes:
- Optional variables shown in brackets: [var_name]
- Blank line between prompts for easy scanning
- Metadata on single line: Model and Tags
API Output
$ tnh-gen --api list
{
"prompts": [
{
"key": "translate",
"name": "Vietnamese-English Translation",
"description": "Translate Vietnamese dharma texts to English",
"tags": ["translation", "dharma"],
"required_variables": ["source_language", "target_language", "input_text"],
"optional_variables": ["context"],
"default_variables": {},
"default_model": "gpt-4o",
"output_mode": "text",
"version": "1.0.0",
"warnings": []
}
],
"count": 1,
"sources": ["defaults+env", "workspace"]
}
Table Format
$ tnh-gen list --format table
KEY NAME TAGS MODEL
translate Vietnamese-English Translation translation, dharma gpt-4o
summarize Summarize Teaching summarization gpt-4o-mini
Filtering Examples
# Filter by tag
tnh-gen list --tag translation
# Search by keyword
tnh-gen list --search summarize
# Combine filters
tnh-gen list --tag dharma --search translation
# Get just the keys for scripting
tnh-gen list --keys-only
translate
summarize
daily
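Because --keys-only prints bare keys, it combines well with standard line tools. For example, a script can confirm a prompt exists before running it, avoiding the PROMPT_NOT_FOUND error described under Error Handling. `prompt_exists` is a hypothetical helper name:

```sh
# Hypothetical helper (not part of tnh-gen): succeeds if the prompt key
# given as $1 appears as a whole line in the key list read from stdin.
prompt_exists() {
    grep -qxF -- "$1"
}

# Usage (requires tnh-gen on PATH):
#   if tnh-gen list --keys-only | prompt_exists translate; then
#       tnh-gen run --prompt translate --input-file teaching.md \
#           --var source_language=vi --var target_language=en
#   else
#       echo "prompt 'translate' not found" >&2
#   fi
```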
tnh-gen run
Execute a prompt with variable substitution and AI processing.
Synopsis
tnh-gen [GLOBAL FLAGS] run --prompt KEY --input-file PATH [OPTIONS]
Required Options
--prompt KEY # Prompt key to execute (from tnh-gen list)
--input-file PATH # Input file (content auto-injected as input_text)
Variable Passing Options
Style 1: JSON file (preferred for complex variables)
--vars PATH # JSON file of variable values
Style 2: Inline parameters (convenient for simple cases)
--var KEY=VALUE # Single variable (repeatable)
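A Style 1 variables file is a flat JSON object mapping variable names to values, like the base_vars.json example under Variable Precedence. A minimal sketch that writes one with a heredoc; the file name translate_vars.json and the `context` value are illustrative, and the commented lines show the equivalent Style 1 and Style 2 invocations:

```sh
# Write a Style 1 variables file (illustrative name and values).
cat > translate_vars.json <<'EOF'
{
  "source_language": "vi",
  "target_language": "en",
  "context": "Dharma talk transcript"
}
EOF

# Style 1: pass the file
#   tnh-gen run --prompt translate --input-file teaching.md --vars translate_vars.json
# Style 2: the same variables inline
#   tnh-gen run --prompt translate --input-file teaching.md \
#       --var source_language=vi --var target_language=en \
#       --var context="Dharma talk transcript"
```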
Model and Parameter Overrides
--model MODEL_NAME # Override prompt's default model
--intent INTENT # Routing hint (translation, summarization, etc.)
--max-tokens INT # Max output tokens
--temperature FLOAT # Model temperature (0.0-2.0)
--top-p FLOAT # Nucleus sampling parameter
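Output Options
--output-file PATH # Write result text to a file (see File Output Handling)
--no-provenance # Omit provenance markers from the output file
Note: --format and --api are global flags that must come before run.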
Variable Precedence
Variables are merged in this precedence order (highest to lowest):
- Inline --var parameters (highest precedence)
- JSON file via --vars
- Input file content (auto-injected as input_text) (lowest precedence)
Example:
tnh-gen run --prompt translate \
--input-file teaching.md \
--vars base_vars.json \
--var source_language=vi \
--var target_language=en
If base_vars.json contains {"source_language": "en"}, the inline --var source_language=vi wins.
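The merge can be modeled as "later sources overwrite earlier ones". The sketch below is a hypothetical helper, not tnh-gen's implementation: it takes `key=value` pairs ordered from lowest to highest precedence (input_text, then --vars entries, then --var entries) and keeps the last value seen for each key. Each pair must fit on one line, so it does not handle multi-line values such as real input text.

```sh
# Hypothetical helper: merge key=value pairs given lowest-precedence first.
# The last value for each key wins; keys keep their first-seen order.
merge_vars() {
    printf '%s\n' "$@" | awk '
        {
            eq = index($0, "=")
            key = substr($0, 1, eq - 1)
            val = substr($0, eq + 1)
            if (!(key in seen)) { order[++n] = key; seen[key] = 1 }
            value[key] = val
        }
        END { for (i = 1; i <= n; i++) print order[i] "=" value[order[i]] }'
}

# Mirrors the example above: base_vars.json says "en", --var says "vi".
#   merge_vars "input_text=..." "source_language=en" \
#              "source_language=vi" "target_language=en"
```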
Human-Friendly Output (Default)
$ tnh-gen run --prompt translate --input-file teaching.md --var source_language=vi --var target_language=en
[Generated translation text appears here without JSON wrapper...]
Behavior:
- Only the generated text is printed to stdout
- No JSON structure wrapper
- Suitable for piping to other commands or files
API Output
$ tnh-gen --api run --prompt translate --input-file teaching.md --var source_language=vi --var target_language=en
{
"status": "succeeded",
"result": {
"text": "[Generated translation...]",
"model": "gpt-4o",
"usage": {
"prompt_tokens": 1234,
"completion_tokens": 567,
"total_tokens": 1801,
"estimated_cost_usd": 0.08
},
"latency_ms": 3456
},
"provenance": {
"backend": "openai",
"model": "gpt-4o",
"prompt_key": "translate",
"prompt_fingerprint": "sha256:abc123...",
"prompt_version": "1.0.0",
"started_at": "2025-12-28T10:30:00Z",
"completed_at": "2025-12-28T10:30:03Z",
"schema_version": "1.0"
},
"trace_id": "01HQXYZ123ABC"
}
File Output Handling
When --output-file is specified:
- Write result text to file
- Prepend provenance markers (unless --no-provenance is given)
- Use appropriate format (markdown, JSON, etc.)
- Print success message to stderr
- Print JSON response to stdout (for client parsing)
Provenance Marker Format (YAML frontmatter):
---
tnh_scholar_generated: true
prompt_key: translate
prompt_version: "1.0.0"
model: gpt-4o
fingerprint: sha256:abc123...
trace_id: 01HQXYZ123ABC
generated_at: "2025-12-28T10:30:03Z"
schema_version: "1.0"
---
[Generated content follows...]
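Because the provenance block is plain YAML frontmatter delimited by `---` lines, standard text tools can separate it from the content. A sketch with two hypothetical helpers: `strip_provenance` prints only the generated content, and `provenance_field` reads one top-level field. It assumes the file starts with the frontmatter exactly as shown above and does not parse YAML, so quoted values such as prompt_version keep their quotes.

```sh
# Hypothetical helper: print the file without its leading frontmatter.
strip_provenance() {
    awk '
        NR == 1 && $0 == "---" { in_fm = 1; next }   # opening delimiter
        in_fm && $0 == "---"   { in_fm = 0; next }   # closing delimiter
        !in_fm                 { print }             # generated content
    ' "$1"
}

# Hypothetical helper: print one "key: value" field from the frontmatter.
#   $1: file, $2: field name (e.g. trace_id)
provenance_field() {
    awk -v key="$2" '
        NR == 1 && $0 == "---" { in_fm = 1; next }
        in_fm && $0 == "---"   { exit }
        in_fm && index($0, key ": ") == 1 { print substr($0, length(key) + 3); exit }
    ' "$1"
}

# Usage:
#   strip_provenance teaching.en.md > teaching.en.body.md
#   provenance_field teaching.en.md trace_id
```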
Examples
# Simple translation with inline variables
tnh-gen run --prompt translate \
--input-file teaching.md \
--var source_language=vi \
--var target_language=en \
--output-file teaching.en.md
# Complex variables via JSON file
tnh-gen run --prompt summarize \
--input-file lecture.md \
--vars config.json \
--output-file lecture.summary.md
# Override model
tnh-gen run --prompt translate \
--input-file teaching.md \
--vars vars.json \
--model gpt-4o \
--output-file teaching.translate.md
# JSON output for scripting
tnh-gen --api run --prompt extract_quotes \
--input-file teaching.md > output.json
tnh-gen config
Manage configuration settings with hierarchical precedence.
Synopsis
tnh-gen [GLOBAL FLAGS] config SUBCOMMAND [ARGS]
Subcommands
tnh-gen config show # Display current configuration
tnh-gen config get KEY # Get specific config value
tnh-gen config set KEY VALUE # Set config value
tnh-gen config list # List all config keys
Configuration Sources and Precedence
Configuration is resolved in this order (highest to lowest precedence):
- CLI flags (e.g., --model gpt-4o)
- Workspace config (.vscode/tnh-scholar.json or local project config)
- User config (~/.config/tnh-scholar/config.json)
- Environment variables (TNH_GENAI_MODEL, OPENAI_API_KEY, TNH_PROMPT_DIR)
- Defaults (defined in GenAI Service and prompt system)
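Resolving one setting under this order amounts to taking the first source that provides a value. The sketch below models that with a hypothetical `first_set` helper and placeholder variables; it treats an empty string as "not set", which may differ from how tnh-gen handles empty values.

```sh
# Hypothetical helper: print the first non-empty argument.
# Pass sources from highest to lowest precedence.
first_set() {
    for candidate in "$@"; do
        if [ -n "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Which model would be used? (all values are placeholders)
cli_model=""                      # no --model flag given
workspace_model=""                # workspace config does not set it
user_model="gpt-4o-mini"          # user config sets default_model
env_model="${TNH_GENAI_MODEL:-}"  # environment variable, if exported
first_set "$cli_model" "$workspace_model" "$user_model" "$env_model" "built-in-default"
```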
Configuration Schema
{
"prompt_catalog_dir": "/path/to/prompts",
"default_model": "gpt-4o-mini",
"max_dollars": 0.30,
"max_input_chars": 50000,
"default_temperature": 0.2,
"provider_api_keys": {
"openai": "$OPENAI_API_KEY",
"anthropic": "$ANTHROPIC_API_KEY"
}
}
Note: API keys can reference environment variables using $VAR_NAME syntax.
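tnh-gen resolves these references itself. When debugging which key a config will pick up, a helper like the hypothetical `resolve_config_value` below can approximate the lookup in your current shell. It accepts both the `$VAR_NAME` form from the schema and the `${VAR_NAME}` form seen in the API output, and it only sees exported variables.

```sh
# Hypothetical helper: show what a "$VAR_NAME" or "${VAR_NAME}" config value
# resolves to in the current shell. Other values are printed unchanged.
resolve_config_value() {
    raw="$1"
    case "$raw" in
        '${'*'}')
            name="${raw#??}"      # drop leading "${"
            name="${name%?}" ;;   # drop trailing "}"
        '$'*)
            name="${raw#?}" ;;    # drop leading "$"
        *)
            printf '%s\n' "$raw"
            return 0 ;;
    esac
    if ! printenv "$name"; then
        echo "warning: $name is not set" >&2
        return 1
    fi
}

# Usage:
#   resolve_config_value '$OPENAI_API_KEY' >/dev/null && echo "OpenAI key is set"
```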
Human-Friendly Output (Default)
Behavior:
- YAML format for editability
- Shows only user/workspace overrides
- Omits defaults and source annotations
API Output
$ tnh-gen --api config show
{
"config": {
"prompt_catalog_dir": "/custom/path",
"default_model": "gpt-4o-mini",
"max_dollars": 0.30,
"provider_api_keys": {
"openai": "${OPENAI_API_KEY}",
"anthropic": "${ANTHROPIC_API_KEY}"
}
},
"sources": ["defaults+env", "user", "workspace"],
"config_files": [
"/path/to/workspace/.vscode/tnh-scholar.json",
"~/.config/tnh-scholar/tnh-gen.json"
]
}
Examples
# Show all configuration (human-friendly YAML)
tnh-gen config show
# Show all configuration with sources (API mode)
tnh-gen --api config show
# Get specific value
tnh-gen config get default_model
gpt-4o-mini
# Set value (writes to user config)
tnh-gen config set max_dollars 0.30
# Set workspace-level config
tnh-gen config set --workspace prompt_catalog_dir ./tnh-prompts
tnh-gen version
Display version information for debugging and compatibility verification.
Synopsis
tnh-gen [GLOBAL FLAGS] version
Human-Friendly Output (Default)
API Output
$ tnh-gen --api version
{
"tnh_scholar": "0.2.2",
"tnh_gen": "0.2.2",
"python": "3.12.4",
"platform": "darwin",
"prompt_system_version": "1.0.0",
"genai_service_version": "1.0.0"
}
Pipeline Examples
These examples show a simplified OCR journal flow. The current canonical worked example is the rebuilt five-page journal pipeline in tests/golden/journal-pipeline/5page/, with the full reviewed artifact chain under tests/golden/journal-pipeline/walkthrough/clean_translate_5page/. See the Thầy Edited Journal Text Case Study for the fully annotated current workflow.
Recommended Pipeline (section → clean → translate)
section_by_break splits the document at structural breaks in the numbered OCR text (page headers, blank lines, article titles). Each section, roughly one journal page, is then cleaned and translated in a focused, self-contained call.
# Step 1: Number raw lines (preprocessing)
tnh-lines number source.txt source_numbered.txt
# Step 2: Section — identifies page breaks, produces metadata JSON
tnh-gen run --prompt section_by_break \
--input-file source_numbered.txt \
--var source_language=Vietnamese \
--var target_section_count=4 \
--var document_metadata='title: ...\nauthor: ...\njournal: ...' \
--output-file sections.json
# Step 3: Extract section line ranges (see Pipeline Walkthrough for helper script)
# → produces section_1_raw.txt ... section_4_raw.txt
# Step 4: Clean each section (focused per-page context)
for i in 1 2 3 4; do
tnh-gen run --prompt default_clean_numbered \
--input-file section_${i}_raw.txt \
--var source_language=Vietnamese \
--var publication_name="Phật Giáo Việt Nam" \
--var publisher_mark="Tư Viện Huệ Quang" \
--output-file section_${i}_clean.txt
done
# Step 5: Translate each section with full document context
for i in 1 2 3 4; do
tnh-gen run --prompt default_line_translate \
--input-file section_${i}_clean.txt \
--vars sections.json \
--var source_language=Vietnamese \
--var target_language=English \
--var style=scholarly \
--output-file section_${i}_translated.txt
done
# Step 6: Combine
cat section_{1,2,3,4}_translated.txt > final_translated.txt
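Step 3 refers to a helper script in the Pipeline Walkthrough that is not reproduced here. As a stand-in, the sketch below assumes you have already written the section line ranges into a plain text file (hypothetical name: section_ranges.txt, one `start end` pair per line) and that each numbered line in source_numbered.txt occupies exactly one physical line, so a line number maps directly to a sed address. The layout of sections.json is not documented on this page, so the sketch does not read it.

```sh
# Hypothetical helper: write section_1_raw.txt, section_2_raw.txt, ...
# in the current directory from an inclusive "start end" range list.
#   $1: numbered source file, $2: ranges file
extract_sections() {
    src="$1"
    ranges="$2"
    i=1
    while read -r start end; do
        [ -n "$start" ] || continue                  # skip blank lines
        sed -n "${start},${end}p" "$src" > "section_${i}_raw.txt"
        i=$((i + 1))
    done < "$ranges"
}

# Usage:
#   extract_sections source_numbered.txt section_ranges.txt
```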
What raw OCR looks like before cleaning:
1.―
1- ← duplicate section marker
Khuynh hướng Túc mệnh-luận...
...
PHẬT GIÁO VIỆT NAM ← running footer mid-paragraph
After default_clean_numbered:
Footer removed; duplicate marker resolved; line numbers contiguous.
Simpler Alternative (no line tracking)
For quick passes where section metadata and line references are not needed:
for page in 7 8 9 10; do
tnh-gen run --prompt default_clean \
--input-file source_page_${page}.txt \
--var source_language=Vietnamese \
--output-file page_${page}_clean.txt
tnh-gen run --prompt default_punctuate \
--input-file page_${page}_clean.txt \
--var source_language=Vietnamese \
--output-file page_${page}_punctuated.txt
done
Per-page source files (source_page_7.txt etc.) are pre-extracted and available in tests/golden/journal-pipeline/.
Error Handling
Exit Codes
| Exit Code | Error Type | Description |
|---|---|---|
| 0 | Success | Operation completed successfully |
| 1 | Policy Error | Budget exceeded, size limits, validation failed |
| 2 | Transport Error | API failure, timeout, network issues |
| 3 | Provider Error | Model unavailable, rate limit, auth failure |
| 4 | Format Error | JSON parse failure, schema validation failed |
| 5 | Input Error | Invalid arguments, missing required variables |
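In scripts, the exit code is the cheapest way to decide what to do next. The sketch below uses two hypothetical helpers: `tnh_gen_error_type` maps a code to the error types in the table, and `run_with_retry` re-runs a command only on exit code 2 (Transport Error), on the assumption that network failures and timeouts are usually transient. RETRY_MAX and RETRY_DELAY are illustrative settings, not tnh-gen variables.

```sh
# Hypothetical helper: map a tnh-gen exit code to its documented error type.
tnh_gen_error_type() {
    case "$1" in
        0) echo "Success" ;;
        1) echo "Policy Error" ;;
        2) echo "Transport Error" ;;
        3) echo "Provider Error" ;;
        4) echo "Format Error" ;;
        5) echo "Input Error" ;;
        *) echo "Unknown ($1)" ;;
    esac
}

# Hypothetical helper: run a command, retrying only on exit code 2.
run_with_retry() {
    attempt=1
    while :; do
        "$@"
        rc=$?
        if [ "$rc" -ne 2 ] || [ "$attempt" -ge "${RETRY_MAX:-3}" ]; then
            return "$rc"
        fi
        echo "$(tnh_gen_error_type "$rc") on attempt $attempt; retrying" >&2
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-5}"
    done
}

# Usage (requires tnh-gen on PATH):
#   run_with_retry tnh-gen run --prompt translate --input-file teaching.md \
#       --var source_language=vi --var target_language=en \
#       --output-file teaching.en.md
#   rc=$?
#   [ "$rc" -eq 0 ] || echo "failed: $(tnh_gen_error_type "$rc")" >&2
```

Exit code 3 also covers rate limits, but it includes authentication failures that a retry will not fix, so the sketch leaves it alone.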
Error Output
Human-Friendly Error
$ tnh-gen run --prompt missing_prompt --input-file test.md
# stdout:
Error: Prompt 'missing_prompt' not found
Suggestion: Run 'tnh-gen list' to see available prompts, or check your prompt key spelling.
# stderr:
[2025-12-28 10:15:23] trace_id=01JGKZ... error_code=PROMPT_NOT_FOUND
API Error
$ tnh-gen --api run --prompt missing_prompt --input-file test.md
# stdout:
{
"status": "failed",
"error": "Prompt 'missing_prompt' not found",
"diagnostics": {
"error_type": "PromptNotFoundError",
"error_code": "PROMPT_NOT_FOUND",
"suggestion": "Run 'tnh-gen list' to see available prompts"
},
"trace_id": "01JGKZ..."
}
# stderr:
[2025-12-28 10:15:23] trace_id=01JGKZ... error_code=PROMPT_NOT_FOUND
Trace ID: Use the trace ID from stderr to correlate with logs or support requests. Set the TNH_TRACE_ID environment variable to override auto-generation.
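When stderr is captured to a file, the trace ID can be pulled out of the log line with sed. A sketch, with `extract_trace_id` as a hypothetical helper; it assumes the `trace_id=VALUE` layout shown above and that the value contains no spaces. The commented alternative pins the ID up front via TNH_TRACE_ID, assuming tnh-gen accepts an arbitrary string there.

```sh
# Hypothetical helper: print the first trace_id=... value found in
# tnh-gen's stderr log lines, read from stdin.
extract_trace_id() {
    sed -n 's/.*trace_id=\([^ ]*\).*/\1/p' | head -n 1
}

# Usage (requires tnh-gen on PATH):
#   tnh-gen run --prompt translate --input-file teaching.md \
#       --var source_language=vi --var target_language=en 2> tnh-gen.err
#   extract_trace_id < tnh-gen.err
#
# Alternatively, pin the trace ID before the run (illustrative format):
#   TNH_TRACE_ID="batch-$(date -u +%Y%m%dT%H%M%SZ)"
#   export TNH_TRACE_ID
```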
Environment Variables
TNH_PROMPT_DIR # Path to prompt catalog directory
OPENAI_API_KEY # OpenAI API key (required for OpenAI models)
ANTHROPIC_API_KEY # Anthropic API key (required for Claude models)
TNH_GENAI_MODEL # Default model to use
TNH_TRACE_ID # Override auto-generated trace ID
TNH_CONFIG_PATH # Override config file location
Architecture
tnh-gen follows object-service architecture patterns and integrates with:
- Prompt System - Discovery and rendering via PromptsAdapter
- GenAI Service - Model execution and provenance tracking
- AI Text Processing - Refactored text processing pipeline
- Configuration System - Hierarchical settings management
For architectural details, see:
- ADR-TG01: CLI Architecture
- ADR-TG01.1: Human-Friendly Defaults
- ADR-TG02: Prompt System Integration
- TNH-Gen Architecture Overview
Migration from tnh-fab
The tnh-gen CLI supersedes the legacy tnh-fab tool. Key differences:
| Aspect | tnh-fab (Legacy) | tnh-gen (Current) |
|---|---|---|
| Architecture | Monolithic, mixed concerns | Object-service compliant |
| Output Mode | JSON-first | Human-friendly by default |
| Prompt System | Legacy patterns | New prompt catalog |
| Configuration | Ad-hoc, TNH_PROMPT_DIR | Hierarchical, TNH_PROMPT_DIR |
| VS Code Support | None | First-class with --api flag |
Migration Steps:
- Replace tnh-fab run <pattern> with tnh-gen run --prompt <pattern>
- Update environment variable to TNH_PROMPT_DIR
- Update scripts to use the --api flag for JSON output
- Review configuration files for the new schema
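The first step can be partly automated with sed. This hypothetical `migrate_tnh_fab` helper only rewrites the `tnh-fab run <pattern>` prefix; it assumes the pattern name directly follows `run`, and any other tnh-fab flags still need manual review against the tnh-gen options above.

```sh
# Hypothetical helper: rewrite "tnh-fab run <pattern>" as
# "tnh-gen run --prompt <pattern>" in text read from stdin.
migrate_tnh_fab() {
    sed 's/tnh-fab run \([^ ]*\)/tnh-gen run --prompt \1/g'
}

# Usage: preview the rewrite for a script before editing it
#   migrate_tnh_fab < process.sh | diff process.sh -
```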
For complete migration guide, see TNH-Gen Architecture.