tnh-gen¶

TNH Scholar's unified command-line interface for GenAI-powered text processing operations. The tnh-gen CLI provides prompt discovery, text processing, configuration management, and VS Code integration support.

Overview¶

tnh-gen is a modern, object-service compliant CLI tool that replaces the legacy tnh-fab tool. It provides:

Prompt Discovery: Browse and search available prompts with rich metadata
Text Processing: Execute AI-powered transformations (translation, sectioning, summarization, etc.)
Configuration Management: Hierarchical configuration with clear precedence rules
Human-Friendly Defaults: Optimized for direct CLI use, with an --api flag for programmatic consumption
Provenance Tracking: All outputs include generation metadata and fingerprints

Installation¶

pip install tnh-scholar

Verify installation:

tnh-gen version

Quick Start¶

# List available prompts
tnh-gen list

# Execute a prompt (human-friendly output)
tnh-gen run --prompt translate \
  --input-file teaching.md \
  --var source_language=vi \
  --var target_language=en

# Get machine-readable output for scripts
tnh-gen --api run --prompt translate \
  --input-file teaching.md \
  --var source_language=vi \
  --var target_language=en

Global Flags¶

These flags work with all commands:

--api              # Enable machine-readable API contract output (JSON)
--format FORMAT    # Output format: json, yaml, text (the `list` command also supports table)
--prompt-dir PATH  # Override the prompt catalog directory
--config PATH      # Override config file location
--quiet, -q        # Suppress non-error output
--no-color         # Disable colored terminal output

Important: Global flags must come BEFORE the subcommand:

# Correct
tnh-gen --api list
tnh-gen --prompt-dir ./my-prompts run --prompt translate ...

# Incorrect (will not work)
tnh-gen list --api
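
When tnh-gen is invoked from a script, the ordering rule can be enforced by assembling the argument list in one place. The following is a minimal Python sketch; `build_command` is a hypothetical helper written for illustration and is not part of tnh-scholar.

```python
from typing import Sequence


def build_command(
    subcommand: str,
    global_flags: Sequence[str] = (),
    sub_args: Sequence[str] = (),
) -> list[str]:
    """Return an argv list with global flags placed before the subcommand."""
    return ["tnh-gen", *global_flags, subcommand, *sub_args]


argv = build_command("list", global_flags=["--api"], sub_args=["--tag", "dharma"])
print(" ".join(argv))
# → tnh-gen --api list --tag dharma
```

The resulting list can be passed directly to `subprocess.run`, which avoids shell quoting issues.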

Output Modes¶

tnh-gen has two output modes designed for different use cases:

Human Mode (Default):

- Optimized for direct CLI usage
- Simplified, readable output
- Command-specific formatting
- Omits verbose metadata

API Mode (--api flag):

- Optimized for programmatic consumption (VS Code, scripts)
- Complete structured JSON output
- Full metadata, provenance, and diagnostics
- Stable machine-readable contract

Flag Precedence:

1. --api controls WHAT data is included (full contract vs. human-friendly)
2. --format controls HOW it's serialized (json, yaml, text, table)
3. --api implies JSON by default; can be combined with --format yaml
4. --api cannot be combined with --format text or --format table

Examples:

# Human-friendly (default)
tnh-gen list
# → Simplified text format with descriptions

# API mode (JSON contract)
tnh-gen --api list
# → Full metadata as JSON

# API mode with YAML serialization
tnh-gen --api --format yaml list
# → Full metadata as YAML

# Human-friendly table
tnh-gen list --format table
# → Simplified table format
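
The flag precedence rules can be expressed as a small validation function. This is only a sketch of the documented rules, not the CLI's actual implementation; `resolve_format` is a hypothetical name, and the function does not model the fact that table is supported only by the list command.

```python
from typing import Optional

ALL_FORMATS = {"json", "yaml", "text", "table"}
API_FORMATS = {"json", "yaml"}  # serializations allowed together with --api


def resolve_format(api: bool, fmt: Optional[str]) -> Optional[str]:
    """Return the effective serialization format, or raise ValueError.

    None means "use the command's human-friendly default output".
    """
    if fmt is not None and fmt not in ALL_FORMATS:
        raise ValueError(f"unknown format: {fmt}")
    if not api:
        return fmt
    if fmt is None:
        return "json"  # --api implies JSON by default
    if fmt not in API_FORMATS:
        raise ValueError(f"--api cannot be combined with --format {fmt}")
    return fmt


print(resolve_format(api=True, fmt=None))    # → json
print(resolve_format(api=True, fmt="yaml"))  # → yaml
```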

Commands¶

`tnh-gen list`¶

List all available prompts with metadata for discovery and selection.

Synopsis¶

tnh-gen list [OPTIONS]

Options¶

--tag TAG          # Filter by tag (repeatable)
--search QUERY     # Search in names/descriptions (case-insensitive)
--keys-only        # Output only prompt keys (one per line)

Note: --api is a global flag and must come before list. For table output, use the list subcommand's own --format table option.

Human-Friendly Output (Default)¶

$ tnh-gen list
Available Prompts (3)

daily - Daily Guidance
  Daily guidance prompt for testing.
  Variables: audience, [location]
  Model: gpt-4o | Tags: guidance, study

translate - Vietnamese-English Translation
  Translate Vietnamese dharma texts to English with context awareness.
  Variables: source_language, target_language, input_text, [context]
  Model: gpt-4o | Tags: translation, dharma

summarize - Summarize Teaching
  Generate concise summary of dharma teaching.
  Variables: input_text, [max_length]
  Model: gpt-4o-mini | Tags: summarization, dharma

Format Notes:

- Optional variables shown in brackets: [var_name]
- Blank line between prompts for easy scanning
- Metadata on single line: Model and Tags

API Output¶

$ tnh-gen --api list
{
  "prompts": [
    {
      "key": "translate",
      "name": "Vietnamese-English Translation",
      "description": "Translate Vietnamese dharma texts to English",
      "tags": ["translation", "dharma"],
      "required_variables": ["source_language", "target_language", "input_text"],
      "optional_variables": ["context"],
      "default_variables": {},
      "default_model": "gpt-4o",
      "output_mode": "text",
      "version": "1.0.0",
      "warnings": []
    }
  ],
  "count": 1,
  "sources": ["defaults+env", "workspace"]
}
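
A client can use this payload to check, before calling run, which required variables are still missing. The sketch below assumes only the fields shown in the sample above; `missing_variables` is a hypothetical helper, and the embedded JSON is a trimmed copy of that sample.

```python
import json

sample = """
{
  "prompts": [
    {
      "key": "translate",
      "tags": ["translation", "dharma"],
      "required_variables": ["source_language", "target_language", "input_text"],
      "optional_variables": ["context"]
    }
  ],
  "count": 1
}
"""


def missing_variables(catalog: dict, key: str, provided: set[str]) -> list[str]:
    """List required variables of prompt `key` that are absent from `provided`."""
    for prompt in catalog["prompts"]:
        if prompt["key"] == key:
            return [v for v in prompt["required_variables"] if v not in provided]
    raise KeyError(f"prompt not found: {key}")


catalog = json.loads(sample)
print(missing_variables(catalog, "translate", {"source_language", "input_text"}))
# → ['target_language']
```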

Table Format¶

$ tnh-gen list --format table
KEY         NAME                               TAGS                  MODEL
translate   Vietnamese-English Translation     translation, dharma   gpt-4o
summarize   Summarize Teaching                 summarization         gpt-4o-mini

Filtering Examples¶

# Filter by tag
tnh-gen list --tag translation

# Search by keyword
tnh-gen list --search summarize

# Combine filters
tnh-gen list --tag dharma --search translation

# Get just the keys for scripting
tnh-gen list --keys-only
translate
summarize
daily

`tnh-gen run`¶

Execute a prompt with variable substitution and AI processing.

Synopsis¶

tnh-gen run --prompt KEY [OPTIONS]

Required Options¶

--prompt KEY         # Prompt key to execute (from tnh-gen list)
--input-file PATH    # Input file (content auto-injected as input_text)

Variable Passing Options¶

Style 1: JSON file (preferred for complex variables)

--vars PATH          # JSON file with variable definitions

Style 2: Inline parameters (convenient for simple cases)

--var KEY=VALUE      # Variable assignment (repeatable)

Model and Parameter Overrides¶

--model MODEL_NAME       # Override prompt's default model
--intent INTENT          # Routing hint (translation, summarization, etc.)
--max-tokens INT         # Max output tokens
--temperature FLOAT      # Model temperature (0.0-2.0)
--top-p FLOAT            # Nucleus sampling parameter

Output Options¶

--output-file PATH       # Write result to file
--no-provenance          # Omit provenance markers from output

Note: --format and --api are global flags that must come before run.

Variable Precedence¶

Variables are merged in this precedence order (highest to lowest):

Inline --var parameters (highest precedence)
JSON file via --vars
Input file content (auto-injected as input_text) (lowest precedence)

Example:

tnh-gen run --prompt translate \
  --input-file teaching.md \
  --vars base_vars.json \
  --var source_language=vi \
  --var target_language=en

If base_vars.json contains {"source_language": "en"}, the inline --var source_language=vi wins.
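
The same merge order can be sketched in Python by applying the sources from lowest to highest precedence. This illustrates the documented behavior and is not the CLI's own code; `parse_inline` and `merge_variables` are hypothetical names.

```python
def parse_inline(assignments: list[str]) -> dict[str, str]:
    """Parse repeated KEY=VALUE strings; the value may itself contain '='."""
    parsed = {}
    for item in assignments:
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected KEY=VALUE, got {item!r}")
        parsed[key] = value
    return parsed


def merge_variables(input_text: str, file_vars: dict, inline: list[str]) -> dict:
    """Merge sources from lowest to highest precedence; later updates win."""
    merged = {"input_text": input_text}  # lowest: input file content
    merged.update(file_vars)             # middle: JSON file passed via --vars
    merged.update(parse_inline(inline))  # highest: inline --var assignments
    return merged


merged = merge_variables(
    "teaching text",
    {"source_language": "en"},
    ["source_language=vi", "target_language=en"],
)
print(merged["source_language"])
# → vi
```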

Human-Friendly Output (Default)¶

$ tnh-gen run --prompt translate --input-file teaching.md --var source_language=vi --var target_language=en

[Generated translation text appears here without JSON wrapper...]

Behavior:

- Only the generated text is printed to stdout
- No JSON structure wrapper
- Suitable for piping to other commands or files

API Output¶

$ tnh-gen --api run --prompt translate --input-file teaching.md --var source_language=vi --var target_language=en
{
  "status": "succeeded",
  "result": {
    "text": "[Generated translation...]",
    "model": "gpt-4o",
    "usage": {
      "prompt_tokens": 1234,
      "completion_tokens": 567,
      "total_tokens": 1801,
      "estimated_cost_usd": 0.08
    },
    "latency_ms": 3456
  },
  "provenance": {
    "backend": "openai",
    "model": "gpt-4o",
    "prompt_key": "translate",
    "prompt_fingerprint": "sha256:abc123...",
    "prompt_version": "1.0.0",
    "started_at": "2025-12-28T10:30:00Z",
    "completed_at": "2025-12-28T10:30:03Z",
    "schema_version": "1.0"
  },
  "trace_id": "01HQXYZ123ABC"
}
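
A script consuming this response might reduce it to a one-line usage summary. The sketch below relies only on fields that appear in the sample above; `summarize_run` is a hypothetical helper, and the embedded payload is a trimmed copy of that sample.

```python
import json


def summarize_run(raw: str) -> str:
    """Return a one-line summary of a successful API-mode run response."""
    response = json.loads(raw)
    if response.get("status") != "succeeded":
        raise RuntimeError(f"run did not succeed: {response.get('status')}")
    usage = response["result"]["usage"]
    prompt_key = response["provenance"]["prompt_key"]
    model = response["result"]["model"]
    return (
        f"{prompt_key} on {model}: {usage['total_tokens']} tokens, "
        f"~${usage['estimated_cost_usd']:.2f}, trace {response['trace_id']}"
    )


sample = json.dumps({
    "status": "succeeded",
    "result": {
        "text": "[Generated translation...]",
        "model": "gpt-4o",
        "usage": {"total_tokens": 1801, "estimated_cost_usd": 0.08},
    },
    "provenance": {"prompt_key": "translate"},
    "trace_id": "01HQXYZ123ABC",
})
print(summarize_run(sample))
# → translate on gpt-4o: 1801 tokens, ~$0.08, trace 01HQXYZ123ABC
```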

File Output Handling¶

When --output-file is specified:

Write result text to file
Prepend provenance markers (unless --no-provenance)
Use appropriate format (markdown, JSON, etc.)
Print success message to stderr
Print JSON response to stdout (for client parsing)

Provenance Marker Format (YAML frontmatter):

---
tnh_scholar_generated: true
prompt_key: translate
prompt_version: "1.0.0"
model: gpt-4o
fingerprint: sha256:abc123...
trace_id: 01HQXYZ123ABC
generated_at: "2025-12-28T10:30:03Z"
schema_version: "1.0"
---

[Generated content follows...]
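
Downstream tools sometimes need to separate the provenance block from the generated text. The following sketch handles only the flat `key: value` layout shown above and deliberately avoids a YAML dependency; a full YAML parser would be the safer choice for anything more complex. `split_provenance` is a hypothetical helper.

```python
def split_provenance(document: str) -> tuple[dict[str, str], str]:
    """Split a leading '---' delimited block from the body of a document."""
    lines = document.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, document  # no frontmatter present
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        return {}, document  # unterminated block: leave the text untouched
    meta = {}
    for line in lines[1:end]:
        # Split on the first colon only, so values such as "sha256:..." survive.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return meta, body


doc = '---\nprompt_key: translate\nfingerprint: sha256:abc123\n---\n\nTranslated text\n'
meta, body = split_provenance(doc)
print(meta["fingerprint"])
# → sha256:abc123
```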

Examples¶

# Simple translation with inline variables
tnh-gen run --prompt translate \
  --input-file teaching.md \
  --var source_language=vi \
  --var target_language=en \
  --output-file teaching.en.md

# Complex variables via JSON file
tnh-gen run --prompt summarize \
  --input-file lecture.md \
  --vars config.json \
  --output-file lecture.summary.md

# Override model
tnh-gen run --prompt translate \
  --input-file teaching.md \
  --vars vars.json \
  --model gpt-4o \
  --output-file teaching.translate.md

# JSON output for scripting
tnh-gen --api run --prompt extract_quotes \
  --input-file teaching.md > output.json

`tnh-gen config`¶

Manage configuration settings with hierarchical precedence.

Synopsis¶

tnh-gen config SUBCOMMAND [OPTIONS]

Subcommands¶

tnh-gen config show              # Display current configuration
tnh-gen config get KEY           # Get specific config value
tnh-gen config set KEY VALUE     # Set config value
tnh-gen config list              # List all config keys

Configuration Sources and Precedence¶

Configuration is loaded in this order (highest to lowest precedence):

CLI flags (e.g., --model gpt-4o)
Workspace config (.vscode/tnh-scholar.json or local project config)
User config (~/.config/tnh-scholar/config.json)
Environment variables (TNH_GENAI_MODEL, OPENAI_API_KEY, TNH_PROMPT_DIR)
Defaults (defined in GenAI Service and prompt system)
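
The lookup order can be illustrated with a short Python sketch that also reports which layer supplied a value. The layer names and the `resolve` function are assumptions made for illustration; how tnh-gen implements the lookup internally is not shown here.

```python
from typing import Any

# Highest precedence first, mirroring the list above.
LAYER_ORDER = ("cli", "workspace", "user", "env", "defaults")


def resolve(key: str, layers: dict[str, dict[str, Any]]) -> tuple[Any, str]:
    """Return (value, source) from the highest-precedence layer defining key."""
    for name in LAYER_ORDER:
        layer = layers.get(name, {})
        if key in layer:
            return layer[key], name
    raise KeyError(key)


layers = {
    "defaults": {"default_model": "gpt-4o-mini", "default_temperature": 0.2},
    "env": {"default_model": "gpt-4o"},
    "user": {"max_dollars": 0.30},
}
print(resolve("default_model", layers))        # → ('gpt-4o', 'env')
print(resolve("default_temperature", layers))  # → (0.2, 'defaults')
```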

Configuration Schema¶

{
  "prompt_catalog_dir": "/path/to/prompts",
  "default_model": "gpt-4o-mini",
  "max_dollars": 0.30,
  "max_input_chars": 50000,
  "default_temperature": 0.2,
  "provider_api_keys": {
    "openai": "$OPENAI_API_KEY",
    "anthropic": "$ANTHROPIC_API_KEY"
  }
}

Note: API keys can reference environment variables using $VAR_NAME syntax.
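
For scripts that read the config file themselves, the standard library's `os.path.expandvars` handles both `$VAR_NAME` and `${VAR_NAME}` references and leaves undefined ones unchanged. The sketch below shows that approach; whether tnh-gen resolves references the same way is an assumption, and `resolve_api_keys` is a hypothetical helper.

```python
import os
from typing import Optional


def resolve_api_keys(config: dict) -> dict[str, Optional[str]]:
    """Expand environment references; map unresolved references to None."""
    resolved: dict[str, Optional[str]] = {}
    for provider, ref in config.get("provider_api_keys", {}).items():
        value = os.path.expandvars(ref)
        # expandvars leaves a reference untouched when the variable is unset.
        unresolved = value == ref and ref.startswith("$")
        resolved[provider] = None if unresolved else value
    return resolved


os.environ["OPENAI_API_KEY"] = "sk-example"  # placeholder value for the demo
os.environ.pop("ANTHROPIC_API_KEY", None)
config = {
    "provider_api_keys": {
        "openai": "$OPENAI_API_KEY",
        "anthropic": "${ANTHROPIC_API_KEY}",
    }
}
print(resolve_api_keys(config))
# → {'openai': 'sk-example', 'anthropic': None}
```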

Human-Friendly Output (Default)¶

$ tnh-gen config show
prompt_catalog_dir: /custom/path
default_model: gpt-4o-mini
max_dollars: 0.30

Behavior:

- YAML format for editability
- Shows only user/workspace overrides
- Omits defaults and source annotations

API Output¶

$ tnh-gen --api config show
{
  "config": {
    "prompt_catalog_dir": "/custom/path",
    "default_model": "gpt-4o-mini",
    "max_dollars": 0.30,
    "provider_api_keys": {
      "openai": "${OPENAI_API_KEY}",
      "anthropic": "${ANTHROPIC_API_KEY}"
    }
  },
  "sources": ["defaults+env", "user", "workspace"],
  "config_files": [
    "/path/to/workspace/.vscode/tnh-scholar.json",
    "~/.config/tnh-scholar/tnh-gen.json"
  ]
}

Examples¶

# Show configuration overrides (human-friendly YAML)
tnh-gen config show

# Show all configuration with sources (API mode)
tnh-gen --api config show

# Get specific value
tnh-gen config get default_model
gpt-4o-mini

# Set value (writes to user config)
tnh-gen config set max_dollars 0.30

# Set workspace-level config
tnh-gen config set --workspace prompt_catalog_dir ./tnh-prompts

`tnh-gen version`¶

Display version information for debugging and compatibility verification.

Synopsis¶

tnh-gen version [OPTIONS]

Human-Friendly Output (Default)¶

$ tnh-gen version
tnh-gen 0.2.2 (tnh-scholar 0.2.2)
Python 3.12.4 on darwin

API Output¶

$ tnh-gen --api version
{
  "tnh_scholar": "0.2.2",
  "tnh_gen": "0.2.2",
  "python": "3.12.4",
  "platform": "darwin",
  "prompt_system_version": "1.0.0",
  "genai_service_version": "1.0.0"
}

Pipeline Examples¶

These examples show a simplified OCR journal flow. The current canonical worked example is the rebuilt five-page journal pipeline in tests/golden/journal-pipeline/5page/ with the full reviewed artifact chain under tests/golden/journal-pipeline/walkthrough/clean_translate_5page/. See the Thầy Edited Journal Text Case Study for the fully annotated current workflow.

Recommended Pipeline (section → clean → translate)¶

section_by_break performs the document split using structural breaks in the numbered OCR text — page headers, blank lines, article titles. Each section (roughly one journal page) is then cleaned and translated in a focused, contained call.

# Step 1: Number raw lines (preprocessing)
tnh-lines number source.txt source_numbered.txt

# Step 2: Section — identifies page breaks, produces metadata JSON
tnh-gen run --prompt section_by_break \
  --input-file source_numbered.txt \
  --var source_language=Vietnamese \
  --var target_section_count=4 \
  --var document_metadata='title: ...\nauthor: ...\njournal: ...' \
  --output-file sections.json

# Step 3: Extract section line ranges (see Pipeline Walkthrough for helper script)
# → produces section_1_raw.txt ... section_4_raw.txt

# Step 4: Clean each section (focused per-page context)
for i in 1 2 3 4; do
  tnh-gen run --prompt default_clean_numbered \
    --input-file section_${i}_raw.txt \
    --var source_language=Vietnamese \
    --var publication_name="Phật Giáo Việt Nam" \
    --var publisher_mark="Tư Viện Huệ Quang" \
    --output-file section_${i}_clean.txt
done

# Step 5: Translate each section with full document context
for i in 1 2 3 4; do
  tnh-gen run --prompt default_line_translate \
    --input-file section_${i}_clean.txt \
    --vars sections.json \
    --var source_language=Vietnamese \
    --var target_language=English \
    --var style=scholarly \
    --output-file section_${i}_translated.txt
done

# Step 6: Combine
cat section_{1,2,3,4}_translated.txt > final_translated.txt

What raw OCR looks like before cleaning:

1.―
1-                                ← duplicate section marker
Khuynh hướng Túc mệnh-luận...
...
PHẬT GIÁO VIỆT NAM                ← running footer mid-paragraph

After default_clean_numbered:

10:1. Khuynh hướng Túc mệnh-luận (Pubba kata hetu)
11:Các triết phái thuộc khuynh hướng này...

Footer removed; duplicate marker resolved; line numbers contiguous.
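
Step 3 of the pipeline slices the numbered text into per-section files. The helper script referenced there is not reproduced here; the following is a rough Python sketch of the slicing step only. It assumes the `N:text` line format seen in the cleaned sample above, and the start and end line numbers are supplied by the caller because the layout of sections.json is not shown in this document. `parse_numbered` and `extract_range` are hypothetical names.

```python
def parse_numbered(text: str) -> dict[int, str]:
    """Map line number to content for lines of the form 'N:content'."""
    numbered = {}
    for line in text.splitlines():
        number, sep, content = line.partition(":")
        if sep and number.strip().isdigit():
            numbered[int(number)] = content
    return numbered


def extract_range(numbered: dict[int, str], start: int, end: int) -> str:
    """Return lines start..end (inclusive), keeping their original numbers."""
    return "\n".join(
        f"{n}:{numbered[n]}" for n in range(start, end + 1) if n in numbered
    )


source = (
    "10:1. Khuynh hướng Túc mệnh-luận (Pubba kata hetu)\n"
    "11:Các triết phái thuộc khuynh hướng này..."
)
numbered = parse_numbered(source)
print(extract_range(numbered, 10, 10))
# → 10:1. Khuynh hướng Túc mệnh-luận (Pubba kata hetu)
```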

Simpler Alternative (no line tracking)¶

For quick passes where section metadata and line references are not needed:

for page in 7 8 9 10; do
  tnh-gen run --prompt default_clean \
    --input-file source_page_${page}.txt \
    --var source_language=Vietnamese \
    --output-file page_${page}_clean.txt

  tnh-gen run --prompt default_punctuate \
    --input-file page_${page}_clean.txt \
    --var source_language=Vietnamese \
    --output-file page_${page}_punctuated.txt
done

Per-page source files (source_page_7.txt etc.) are pre-extracted and available in tests/golden/journal-pipeline/.

Error Handling¶

Exit Codes¶

| Exit Code | Error Type | Description |
|---|---|---|
| `0` | Success | Operation completed successfully |
| `1` | Policy Error | Budget exceeded, size limits, validation failed |
| `2` | Transport Error | API failure, timeout, network issues |
| `3` | Provider Error | Model unavailable, rate limit, auth failure |
| `4` | Format Error | JSON parse failure, schema validation failed |
| `5` | Input Error | Invalid arguments, missing required variables |
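
Wrapper scripts can map these exit codes to named constants instead of comparing bare integers. The sketch below mirrors the table; the retry policy is an assumption made for illustration (a provider error may also be a non-retryable auth failure), and `ExitCode` and `should_retry` are hypothetical names.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    SUCCESS = 0
    POLICY_ERROR = 1
    TRANSPORT_ERROR = 2
    PROVIDER_ERROR = 3
    FORMAT_ERROR = 4
    INPUT_ERROR = 5


# Assumption: only transient failures are worth retrying automatically.
RETRYABLE = {ExitCode.TRANSPORT_ERROR, ExitCode.PROVIDER_ERROR}


def should_retry(returncode: int) -> bool:
    """Return True when the exit code suggests a transient failure."""
    try:
        return ExitCode(returncode) in RETRYABLE
    except ValueError:
        return False  # unknown exit code: do not retry blindly


print(should_retry(2), should_retry(5))
# → True False
```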

Error Output¶

Human-Friendly Error¶

$ tnh-gen run --prompt missing_prompt --input-file test.md
# stdout:
Error: Prompt 'missing_prompt' not found

Suggestion: Run 'tnh-gen list' to see available prompts, or check your prompt key spelling.

# stderr:
[2025-12-28 10:15:23] trace_id=01JGKZ... error_code=PROMPT_NOT_FOUND

API Error¶

$ tnh-gen --api run --prompt missing_prompt --input-file test.md
# stdout:
{
  "status": "failed",
  "error": "Prompt 'missing_prompt' not found",
  "diagnostics": {
    "error_type": "PromptNotFoundError",
    "error_code": "PROMPT_NOT_FOUND",
    "suggestion": "Run 'tnh-gen list' to see available prompts"
  },
  "trace_id": "01JGKZ..."
}

# stderr:
[2025-12-28 10:15:23] trace_id=01JGKZ... error_code=PROMPT_NOT_FOUND

Trace ID: Use the trace ID from stderr to correlate with logs or support requests. Set the TNH_TRACE_ID environment variable to override auto-generation.
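
A client can turn a failed API response into an exception that carries the error code and trace ID. This sketch assumes only the fields shown in the API error sample above; `TnhGenError` and `raise_for_status` are hypothetical names, not part of tnh-scholar.

```python
import json
from typing import Optional


class TnhGenError(Exception):
    """Carries the diagnostics of a failed API-mode response."""

    def __init__(self, message: str, code: Optional[str], trace_id: Optional[str]):
        super().__init__(f"{message} (code={code}, trace_id={trace_id})")
        self.code = code
        self.trace_id = trace_id


def raise_for_status(raw: str) -> dict:
    """Parse stdout from an API-mode call; raise TnhGenError if it failed."""
    response = json.loads(raw)
    if response.get("status") == "failed":
        diagnostics = response.get("diagnostics", {})
        raise TnhGenError(
            response.get("error", "unknown error"),
            diagnostics.get("error_code"),
            response.get("trace_id"),
        )
    return response


failed = json.dumps({
    "status": "failed",
    "error": "Prompt 'missing_prompt' not found",
    "diagnostics": {"error_code": "PROMPT_NOT_FOUND"},
    "trace_id": "01JGKZ...",
})
try:
    raise_for_status(failed)
except TnhGenError as exc:
    print(exc.code)
# → PROMPT_NOT_FOUND
```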

Environment Variables¶

TNH_PROMPT_DIR         # Path to prompt catalog directory
OPENAI_API_KEY         # OpenAI API key (required for OpenAI models)
ANTHROPIC_API_KEY      # Anthropic API key (required for Claude models)
TNH_GENAI_MODEL        # Default model to use
TNH_TRACE_ID           # Override auto-generated trace ID
TNH_CONFIG_PATH        # Override config file location

Architecture¶

tnh-gen follows object-service architecture patterns and integrates with:

Prompt System - Discovery and rendering via PromptsAdapter
GenAI Service - Model execution and provenance tracking
AI Text Processing - Refactored text processing pipeline
Configuration System - Hierarchical settings management

For architectural details, see:

- ADR-TG01: CLI Architecture
- ADR-TG01.1: Human-Friendly Defaults
- ADR-TG02: Prompt System Integration
- TNH-Gen Architecture Overview

Migration from tnh-fab¶

The tnh-gen CLI supersedes the legacy tnh-fab tool. Key differences:

| Aspect | tnh-fab (Legacy) | tnh-gen (Current) |
|---|---|---|
| Architecture | Monolithic, mixed concerns | Object-service compliant |
| Output Mode | JSON-first | Human-friendly by default |
| Prompt System | Legacy patterns | New prompt catalog |
| Configuration | Ad-hoc, `TNH_PROMPT_DIR` | Hierarchical, `TNH_PROMPT_DIR` |
| VS Code Support | None | First-class with `--api` flag |

Migration Steps:

Replace tnh-fab run <pattern> with tnh-gen run --prompt <pattern>
Update environment variable to TNH_PROMPT_DIR
Update scripts to use --api flag for JSON output
Review configuration files for new schema

For the complete migration guide, see TNH-Gen Architecture.
