
ADR-VSC03: Preliminary Investigation Findings

Python-JavaScript Impedance Mismatch - Phase 1 Research

Investigation Period: 2025-12-12
Status: Phase 1 - Research & Analysis (Draft)
Next Phase: Prototype & Validate


Executive Summary

Initial research reveals three viable architectural patterns for TNH Scholar's Python ↔ JavaScript boundary:

  1. Code Generation (Recommended): Auto-generate TypeScript from Pydantic with pydantic-to-typescript
  2. JSON Schema Intermediate: Shared schema with dual validation
  3. Transport-Native Types: Minimal shared types, protocol-oriented design

Key Finding: Code generation offers the best balance of type safety, maintainability, and VS Code integration depth for TNH Scholar's use case.

Critical Success Factor: Maintaining domain model purity in Python while generating clean TypeScript interfaces for VS Code extensions.


1. Type Generation Survey

1.1 Tool Evaluation: pydantic-to-typescript

Repository: pydantic-to-typescript
Maturity: Production-ready, 600+ GitHub stars, active maintenance
License: MIT

Example Conversion: TNH Scholar Models

Python (Pydantic):

# text_object.py
from pydantic import BaseModel, Field
from typing import Optional, List

class SectionRange(BaseModel):
    """Line range for a text section (1-based, inclusive)."""
    start: int = Field(..., ge=1, description="Start line (1-based, inclusive)")
    end: int = Field(..., ge=1, description="End line (1-based, inclusive)")

class SectionObject(BaseModel):
    """Represents a section of text with metadata."""
    title: str
    section_range: SectionRange
    metadata: Optional[dict] = None

Generated TypeScript:

// text_object.ts (auto-generated)
/**
 * Line range for a text section (1-based, inclusive).
 */
export interface SectionRange {
  /** Start line (1-based, inclusive) */
  start: number;
  /** End line (1-based, inclusive) */
  end: number;
}

/**
 * Represents a section of text with metadata.
 */
export interface SectionObject {
  title: string;
  section_range: SectionRange;
  metadata?: Record<string, any> | null;
}

Roundtrip Testing

Test Case: TextObject serialization → JSON → TypeScript deserialization

# Python: Serialize
text_obj = TextObject(
    num_text=NumberedText("line1\nline2"),
    language="en",
    sections=[SectionObject(
        title="Introduction",
        section_range=SectionRange(start=1, end=2),
        metadata=None
    )]
)
json_str = text_obj.model_dump_json()

// TypeScript: Deserialize (with Zod validation)
import { z } from 'zod';

const TextObjectSchema = z.object({
  language: z.string(),
  sections: z.array(SectionObjectSchema),
  // ... other fields
});

const parsed = TextObjectSchema.parse(JSON.parse(jsonStr));
// ✅ Type-safe, validated TextObject in TypeScript

Findings:

  • ✅ Docstrings preserved as JSDoc comments
  • ✅ Field descriptions mapped to TypeScript comments
  • ✅ Optional fields handled correctly (metadata?: ... | null)
  • ⚠️ Pydantic field constraints (e.g., ge=1) and custom validators are not translated into the generated interfaces (equivalent Zod validators must be added manually)
  • ⚠️ Complex types (e.g., NumberedText) require custom serializers
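One way to keep those manual Zod validators from being forgotten is to list every constraint the generated interfaces drop. The sketch below walks a JSON Schema dictionary of the general shape that Pydantic's `model_json_schema()` emits, where `ge=1` appears as `"minimum": 1`. The schema is hand-written here so the example runs without Pydantic, and the helper name `list_constraints` is an illustrative assumption, not part of any tool discussed above.

```python
from typing import Any

# JSON Schema keywords that a plain TypeScript interface cannot express.
CONSTRAINT_KEYWORDS = (
    "minimum", "maximum", "exclusiveMinimum", "exclusiveMaximum",
    "minLength", "maxLength", "pattern",
)


def list_constraints(schema: dict[str, Any]) -> list[tuple[str, str, str, Any]]:
    """Return (model, field, keyword, value) for every constrained field."""
    # The root model plus any nested models stored under "$defs".
    models = {schema.get("title", "<root>"): schema, **schema.get("$defs", {})}
    found = []
    for model_name, model_schema in models.items():
        for field_name, field_schema in model_schema.get("properties", {}).items():
            for keyword in CONSTRAINT_KEYWORDS:
                if keyword in field_schema:
                    found.append((model_name, field_name, keyword, field_schema[keyword]))
    return found


# Hand-written stand-in for the schema of SectionObject (assumed layout).
SECTION_OBJECT_SCHEMA = {
    "title": "SectionObject",
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "section_range": {"$ref": "#/$defs/SectionRange"},
    },
    "$defs": {
        "SectionRange": {
            "type": "object",
            "properties": {
                "start": {"type": "integer", "minimum": 1},
                "end": {"type": "integer", "minimum": 1},
            },
        }
    },
}

for row in list_constraints(SECTION_OBJECT_SCHEMA):
    print(row)
```

Each reported row corresponds to one check that would need a hand-written Zod counterpart on the TypeScript side.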

1.2 Schema Evolution & Versioning

Challenge: How to handle model changes over time?

Recommended Strategy: Semantic Versioning + Migration Paths

# Python: Version models explicitly
from typing import List, Literal
from pydantic import BaseModel

class TextObjectV1(BaseModel):
    # A real field, so the version travels in the serialized JSON
    # (json_schema_extra would only annotate the schema, not the payload).
    version: Literal["1.0.0"] = "1.0.0"
    language: str
    sections: List[SectionObject]

class TextObjectV2(BaseModel):
    version: Literal["2.0.0"] = "2.0.0"
    language: str
    sections: List[SectionObject]
    metadata: Metadata  # ← New field in v2 (Metadata is assumed to be defined elsewhere)

    @classmethod
    def from_v1(cls, v1: TextObjectV1) -> "TextObjectV2":
        """Migrate v1 → v2."""
        return cls(
            language=v1.language,
            sections=v1.sections,
            metadata=Metadata()  # Default for migration
        )

// TypeScript: Version detection + migration
type TextObjectVersioned = TextObjectV1 | TextObjectV2;

function parseTextObject(json: string): TextObjectV2 {
  const data = JSON.parse(json);
  if (data.version === "1.0.0") {
    return migrateV1toV2(data);
  }
  return TextObjectV2Schema.parse(data);
}

Key Insight: Versioning must be explicit in Python models and detected in TypeScript to support graceful upgrades.
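As more versions accumulate, pairwise `from_v1`-style methods can become hard to track. A minimal sketch of one alternative is a registry of single-step migrations applied in sequence until the payload reaches the current version. It assumes the serialized JSON carries a top-level `version` key; the names `MIGRATIONS` and `upgrade` are hypothetical, and plain dictionaries stand in for the Pydantic models.

```python
import json
from typing import Callable

CURRENT_VERSION = "2.0.0"


def _v1_to_v2(data: dict) -> dict:
    """Add the v2-only 'metadata' field with an empty default."""
    return {**data, "version": "2.0.0", "metadata": {}}


# Maps a payload version to the function that upgrades it by one step.
MIGRATIONS: dict[str, Callable[[dict], dict]] = {"1.0.0": _v1_to_v2}


def upgrade(json_str: str) -> dict:
    """Parse a payload and apply migrations until it is at CURRENT_VERSION."""
    data = json.loads(json_str)
    version = data.get("version")
    while version != CURRENT_VERSION:
        if version not in MIGRATIONS:
            raise ValueError(f"unsupported TextObject version: {version!r}")
        data = MIGRATIONS[version](data)
        version = data["version"]
    return data


if __name__ == "__main__":
    print(upgrade('{"version": "1.0.0", "language": "en", "sections": []}'))
```

Adding a future v3 would then mean registering one more step rather than writing a migration from every older version.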


2. Transport Pattern Analysis

2.1 CLI Transport (v0.1.0 - Current)

Implementation: Subprocess invocation, JSON stdin/stdout

Example:

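// VS Code Extension (TypeScript)
import { spawn } from 'child_process';

function sectionText(text: string): Promise<TextObject> {
  return new Promise((resolve, reject) => {
    // Spawn the CLI and pipe the text through stdin; stdout carries the JSON result.
    const child = spawn('tnh-fab', ['section']);
    let stdout = '';
    child.stdout.setEncoding('utf-8');
    child.stdout.on('data', (chunk: string) => { stdout += chunk; });
    child.on('error', reject);
    child.on('close', (code) => {
      if (code !== 0) return reject(new Error(`tnh-fab exited with code ${code}`));
      try { resolve(JSON.parse(stdout) as TextObject); } catch (err) { reject(err); }
    });
    child.stdin.end(text);
  });
}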

Benchmarks (simulated with 100KB text file):

  • Latency: ~200-500ms (process spawn + JSON serialization)
  • Throughput: Acceptable for single-file operations
  • Streaming: Not supported (batch only)
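Since these figures are simulated, a small harness can help replace them with measurements. The sketch below times repeated subprocess invocations with text fed on stdin and reports the median. A trivial Python child process stands in for `tnh-fab section` (an assumption so the example runs anywhere), so the number it prints reflects spawn and pipe overhead only, not sectioning work.

```python
import subprocess
import sys
import time


def time_command(argv: list[str], stdin_text: str, runs: int = 5) -> float:
    """Return the median wall-clock seconds for running argv with stdin_text."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, input=stdin_text, capture_output=True, text=True, check=True)
        samples.append(time.perf_counter() - start)
    samples.sort()
    return samples[len(samples) // 2]


# Stand-in for the real CLI: reads stdin and prints a small JSON document.
STAND_IN = [
    sys.executable,
    "-c",
    "import sys, json; print(json.dumps({'chars': len(sys.stdin.read())}))",
]

if __name__ == "__main__":
    payload = "x" * 100_000  # roughly 100KB of text
    print(f"median latency: {time_command(STAND_IN, payload) * 1000:.1f} ms")
```

Swapping `STAND_IN` for the actual CLI command line would give a figure comparable to the latency range quoted above.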

Pros:

  • ✅ Zero dependencies (uses existing CLI)
  • ✅ No server management
  • ✅ Works with CLI-first design (ADR-VSC01)

Cons:

  • ❌ High latency for repeated calls (process spawn overhead)
  • ❌ No session state (must resend context each time)
  • ❌ No streaming support

Verdict: ✅ Viable for v0.1.0 (single-shot operations), plan migration to HTTP for v0.2.0

2.2 HTTP Transport (v0.2.0 - Planned)

Implementation: FastAPI service, JSON over HTTP

Example:

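# Python: FastAPI service
from typing import Optional
from fastapi import FastAPI
from pydantic import BaseModel
from text_object import TextObject, SectionParams

app = FastAPI()

class SectionRequest(BaseModel):
    """Illustrative wrapper so text and params travel in one JSON body."""
    text: str
    params: Optional[SectionParams] = None

@app.post("/section")
async def section_text(request: SectionRequest) -> TextObject:
    # ... TNH Scholar sectioning logic builds text_object from request.text
    return text_object

// VS Code Extension (TypeScript)
async function sectionText(text: string): Promise<TextObject> {
  const response = await fetch('http://localhost:8000/section', {
    method: 'POST',
    body: JSON.stringify({ text }),
    headers: { 'Content-Type': 'application/json' }
  });
  if (!response.ok) {
    throw new Error(`Sectioning failed: HTTP ${response.status}`);
  }
  return await response.json();
}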

Benchmarks (estimated):

  • Latency: ~50-100ms (HTTP roundtrip, no process spawn)
  • Throughput: 10-20 req/sec (single process)
  • Streaming: Supported via Server-Sent Events (SSE)
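To make the SSE option concrete, the sketch below shows the wire format itself: each event is an optional `event:` line, one `data:` line per line of payload, and a blank line as terminator. The event names `section` and `done` are assumptions for illustration. With FastAPI, a generator like `stream_sections` would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`.

```python
import json
from typing import Iterable, Iterator, Optional


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Encode one Server-Sent Event as text."""
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    # Multi-line payloads need one "data:" line per line of text.
    lines.extend(f"data: {line}" for line in data.split("\n"))
    # A blank line marks the end of the event.
    return "\n".join(lines) + "\n\n"


def stream_sections(sections: Iterable[dict]) -> Iterator[str]:
    """Yield one SSE event per section, then a final 'done' event."""
    for section in sections:
        yield format_sse(json.dumps(section), event="section")
    yield format_sse("{}", event="done")


if __name__ == "__main__":
    for chunk in stream_sections([{"title": "Introduction"}, {"title": "Body"}]):
        print(chunk, end="")
```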

Pros:

  • ✅ Lower latency (persistent process)
  • ✅ Session state (can maintain context across calls)
  • ✅ Streaming support (e.g., incremental AI completions)
  • ✅ Familiar patterns (REST, OpenAPI spec generation)

Cons:

  • ❌ Requires server management (startup, shutdown, port conflicts)
  • ❌ More complex deployment (process management)

Verdict: ✅ Recommended for v0.2.0+ (persistent operations, streaming)
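The port-conflict concern listed in the cons can be reduced by letting the operating system choose the port instead of hard-coding 8000. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and how the chosen port would be communicated to the extension is left open.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def is_port_open(port: int, host: str = "127.0.0.1", timeout: float = 0.2) -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    port = find_free_port()
    print(f"free port: {port}, already in use: {is_port_open(port)}")
```

There is a small window between discovering the port and the server binding to it, so a production launcher would probably retry on failure or have the server bind to port 0 itself and report the result.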

2.3 Language Server Protocol (LSP) - Future

Relevance: TNH Scholar's text-centric features (sectioning, translation) align with LSP's domain

Example LSP Features:

  • Go to Definition: Jump to section header from reference
  • Find References: Find all mentions of a concept across corpus
  • Code Actions: "Section this text", "Translate to Vietnamese"
  • Diagnostics: "Section title missing", "Inconsistent numbering"

Implementation (sketch):

# Python: LSP server (using pygls 1.x)
from lsprotocol.types import CodeAction, CodeActionParams, Command
from pygls.server import LanguageServer
from text_object import TextObject

# Placeholder server name and version
server = LanguageServer("tnh-scholar", "v0.1")

@server.feature("textDocument/codeAction")
def code_actions(params: CodeActionParams):
    # Offer "Section Text" action
    return [CodeAction(
        title="Section Text",
        command=Command(title="Section Text", command="tnh.sectionText"),
    )]

@server.command("tnh.sectionText")
def section_text_command(args):
    # ... TNH Scholar sectioning logic
    return TextObject(...)

Pros:

  • ✅ Deep VS Code integration (native features)
  • ✅ Standardized protocol (LSP is well-documented)
  • ✅ Rich editor features (definitions, references, diagnostics)

Cons:

  • ❌ LSP is text-centric (less suitable for audio/video processing)
  • ❌ Higher implementation complexity (protocol compliance)

Verdict: 🔍 Investigate for v1.0+ (text-only features), not a replacement for HTTP

2.4 Model Context Protocol (MCP) - v2.0+

Relevance: MCP aligns with TNH Scholar's GenAI service and agent workflows

Example MCP Integration:

// VS Code Extension: MCP client
import { Client } from "@modelcontextprotocol/sdk/client/index.js";

const client = new Client({
  name: "tnh-scholar",
  version: "1.0.0"
});
// await client.connect(transport);  // transport setup omitted in this sketch

// Use TNH Scholar's GenAI service as an MCP tool
const result = await client.callTool({
  name: "tnh_translate",
  arguments: {
    text: "Hello world",
    target_language: "vi"
  }
});

Pros:

  • ✅ Agent-native protocol (aligns with GenAI service)
  • ✅ Tool composition (chain TNH Scholar tools with external agents)
  • ✅ Future-proof (MCP is emerging standard for AI workflows)

Cons:

  • ❌ Immature protocol (still evolving)
  • ❌ Young tooling ecosystem (official TypeScript and Python SDKs exist, but their APIs are still changing)

Verdict: 🔮 Monitor for v2.0+, not viable for v0.1.0-v1.0

Transport Progression Recommendation

v0.1.0 (Q1 2025)  v0.2.0 (Q2 2025)  v1.0.0 (Q4 2025)  v2.0.0 (2026+)
     CLI      →      HTTP      →      HTTP + LSP  →   HTTP + LSP + MCP
   (Batch)      (Persistent)      (Rich editing)    (Agent workflows)
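A progression like this is easier to follow if callers never depend on a specific transport. The sketch below shows one possible seam: a small `Transport` protocol with a single `call` method, plus an in-process test double. All names here are hypothetical, and the extension side would need the equivalent interface in TypeScript; Python is used only to show the shape.

```python
import json
from typing import Callable, Protocol


class Transport(Protocol):
    """Anything that can carry one request/response exchange."""

    def call(self, operation: str, payload: dict) -> dict: ...


class InProcessTransport:
    """Test double: dispatches to Python callables instead of a CLI or server."""

    def __init__(self, handlers: dict[str, Callable[[dict], dict]]):
        self._handlers = handlers

    def call(self, operation: str, payload: dict) -> dict:
        # Round-trip through JSON so the double behaves like a real wire format.
        request = json.loads(json.dumps(payload))
        return json.loads(json.dumps(self._handlers[operation](request)))


def section_text(transport: Transport, text: str) -> dict:
    """Caller code: unaware of whether CLI, HTTP, or a test double is in use."""
    return transport.call("section", {"text": text})


if __name__ == "__main__":
    fake = InProcessTransport({"section": lambda req: {"chars": len(req["text"])}})
    print(section_text(fake, "line1\nline2"))
```

A CLI-backed and an HTTP-backed class implementing the same `call` method could then be swapped at the v0.2.0 boundary without touching callers.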

3. Data Model Ownership Strategies

Strategy 1: Python-First (Recommended)

Approach: Python is source of truth, TypeScript is generated

Workflow:

[Python Models (Pydantic)]
         ↓ (Code generation)
[TypeScript Interfaces]
         ↓ (Runtime validation with Zod)
[VS Code Extension]

Pros:

  • ✅ Single source of truth (Python)
  • ✅ Python developers never touch TypeScript types
  • ✅ Type safety guaranteed by generation + Zod validation
  • ✅ Aligns with TNH Scholar's Python-centric architecture

Cons:

  • ❌ TypeScript developers can't add UI-specific fields (must go through Python)
  • ❌ Build-time dependency (must regenerate on model changes)

Mitigation: Use TypeScript extension interfaces for UI-specific state

// Generated (don't edit)
export interface TextObject { /* ... */ }

// UI-specific extension (manual)
export interface TextObjectUI extends TextObject {
  isExpanded: boolean;  // UI state only
  decorations: MonacoDecoration[];
}
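For the build-time dependency noted in the cons, a staleness check can catch a forgotten regeneration in CI. One possible approach, sketched below, stamps the generated file with a fingerprint of the JSON Schema it was built from and compares that stamp on later runs. The header format and function names are assumptions for illustration; pydantic-to-typescript does not provide this itself.

```python
import hashlib
import json

HEADER_PREFIX = "// schema-fingerprint: "


def schema_fingerprint(schema: dict) -> str:
    """Hash a canonical JSON rendering so key order does not matter."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def stamp(generated_ts: str, schema: dict) -> str:
    """Prepend the fingerprint header to freshly generated TypeScript."""
    return f"{HEADER_PREFIX}{schema_fingerprint(schema)}\n{generated_ts}"


def is_stale(stamped_ts: str, schema: dict) -> bool:
    """True when the file was generated from a different schema."""
    first_line = stamped_ts.split("\n", 1)[0]
    return first_line != HEADER_PREFIX + schema_fingerprint(schema)


if __name__ == "__main__":
    schema_v1 = {"title": "TextObject", "properties": {"language": {"type": "string"}}}
    generated = stamp("export interface TextObject { language: string; }", schema_v1)
    schema_v2 = {**schema_v1, "properties": {**schema_v1["properties"], "x": {"type": "integer"}}}
    print(is_stale(generated, schema_v1), is_stale(generated, schema_v2))
```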

Strategy 2: Schema-First (Alternative)

Approach: JSON Schema is source of truth, both Python and TypeScript validate against it

Workflow:

[JSON Schema (YAML)]
         ↓
[Python Models (datamodel-code-generator)]
[TypeScript Interfaces (json-schema-to-typescript)]

Pros:

  • ✅ Language-agnostic source of truth
  • ✅ Both sides can evolve independently (as long as schema is valid)

Cons:

  • ❌ Extra abstraction layer (schema → code)
  • ❌ Requires schema-first development (less Pythonic)
  • ❌ Custom Pydantic validators (arbitrary Python functions) can't be expressed in JSON Schema; only declarative constraints such as ge=1 carry over

Verdict: ❌ Not recommended for TNH Scholar (Python-first culture)

Strategy 3: Dual-Native (Not Recommended)

Approach: Maintain parallel Python and TypeScript implementations

Cons:

  • ❌ High maintenance burden (manual sync)
  • ❌ Risk of drift (Python and TypeScript types diverge)
  • ❌ No automation benefits

Verdict: ❌ Avoid unless absolutely necessary


4. Runtime Responsibility Boundaries

Python (TNH Scholar Core):

  • ✅ AI processing (GenAI service, transcription, diarization)
  • ✅ Data validation (Pydantic models)
  • ✅ Business rules (sectioning logic, translation pipelines)
  • ✅ File I/O (read/write text, audio, video)

TypeScript (VS Code Extension):

  • ✅ UI state management (expanded sections, selection state)
  • ✅ Monaco editor integration (decorations, actions, commands)
  • ✅ User interaction (clicks, keyboard shortcuts, context menus)
  • ✅ VS Code API calls (workspace, window, editor)

Gray Area: Data Transformation

Example: Converting TextObject to Monaco editor ranges

Option A: Python Exports Monaco-Compatible Format

class SectionRange(BaseModel):
    start_line: int  # 1-based (Monaco uses 1-based)
    end_line: int    # 1-based, inclusive

    def to_monaco_range(self) -> dict:
        """Export Monaco-compatible range."""
        return {
            "startLineNumber": self.start_line,
            "endLineNumber": self.end_line,
            "startColumn": 1,
            "endColumn": 1
        }

Option B: TypeScript Handles All Monaco Mapping

// TypeScript maps generic SectionRange → Monaco IRange
function toMonacoRange(range: SectionRange): monaco.IRange {
  return {
    startLineNumber: range.start,
    endLineNumber: range.end,
    startColumn: 1,
    endColumn: Number.MAX_VALUE
  };
}

Recommendation: Option A (Python exports Monaco-compatible format)

  • Rationale: Keeps Monaco coupling explicit in Python (aligns with ADR-AT03.2)
  • Trade-off: Slightly couples Python to UI framework, but maintains clarity

5. Monaco Editor Integration Depth

Current Approach (ADR-AT03.2): Monaco Alignment

Strategy: Design Python models to match Monaco's data structures

Example: NumberedText line numbering uses 1-based indexing (Monaco's convention)

Pros:

  • ✅ Zero translation in TypeScript (Python → JSON → Monaco directly)
  • ✅ Clear mental model (Python devs understand Monaco expectations)
  • ✅ Fewer moving parts (no translation layer to maintain)

Cons:

  • ❌ Couples Python to UI framework (mitigated by domain model purity)
  • ❌ If Monaco changes, Python models must adapt

Recommendation: ✅ Continue Monaco alignment for TNH Scholar

  • Rationale: Benefits (zero translation) outweigh costs (minor coupling)
  • Mitigation: Keep domain models pure, only add Monaco helpers (e.g., to_monaco_range())

Alternative Approach: Translation Layer

Strategy: Python exports generic JSON, TypeScript maps to Monaco

Example:

# Python: Generic 0-based indexing
class SectionRange(BaseModel):
    start: int  # 0-based
    end: int    # 0-based, exclusive

// TypeScript: Translate to Monaco (1-based, inclusive)
function toMonacoRange(range: SectionRange): monaco.IRange {
  return {
    startLineNumber: range.start + 1,  // 0→1 based
    endLineNumber: range.end,          // Exclusive→inclusive
    startColumn: 1,
    endColumn: Number.MAX_VALUE
  };
}

Cons:

  • ❌ Extra translation layer (more code, more bugs)
  • ❌ Mental model mismatch (Python devs think 0-based, Monaco is 1-based)

Verdict: ❌ Not recommended for TNH Scholar
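The arithmetic behind that translation layer is small but easy to get wrong, which is part of the case against it. As a sketch of the two conventions: a 0-based, end-exclusive range `[start, end)` covers the same lines as the 1-based, inclusive range `[start + 1, end]`. The function names below are illustrative only.

```python
def to_one_based_inclusive(start: int, end: int) -> tuple[int, int]:
    """Convert 0-based, end-exclusive [start, end) to 1-based, inclusive."""
    if not 0 <= start < end:
        raise ValueError(f"invalid 0-based range: [{start}, {end})")
    # The start shifts by one; the exclusive end already equals the inclusive end.
    return start + 1, end


def to_zero_based_exclusive(start_line: int, end_line: int) -> tuple[int, int]:
    """Convert 1-based, inclusive [start_line, end_line] to 0-based, end-exclusive."""
    if not 1 <= start_line <= end_line:
        raise ValueError(f"invalid 1-based range: [{start_line}, {end_line}]")
    return start_line - 1, end_line


if __name__ == "__main__":
    lines = ["line1", "line2", "line3"]
    zero_based = (0, 2)                        # lines[0:2], i.e. line1 and line2
    one_based = to_one_based_inclusive(*zero_based)
    print(one_based, lines[zero_based[0]:zero_based[1]])
```

Both conventions describe `end - start` lines in the 0-based form and `end_line - start_line + 1` in the 1-based form, which makes a quick sanity check when reviewing conversions.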


6. Real-World Examples

Case Study: Jupyter (Python ↔ JavaScript)

Architecture:

  • Python kernel (IPython) communicates via ZeroMQ
  • JavaScript frontend (JupyterLab) consumes JSON messages
  • Key Pattern: Message protocol (JSON) is versioned and documented

Lessons:

  • ✅ Explicit protocol versioning prevents breaking changes
  • ✅ Python side owns protocol definition
  • ✅ TypeScript side validates messages (runtime checks)

Case Study: VS Code Python Extension

Architecture:

  • Python Language Server (Pylance) uses LSP
  • TypeScript extension consumes LSP messages
  • Key Pattern: Standardized protocol (LSP) decouples implementation

Lessons:

  • ✅ LSP is battle-tested for text-centric features
  • ✅ Protocol compliance ensures interoperability

7. Key Findings Summary

Type Safety

  • ✅ pydantic-to-typescript is production-ready and suitable for TNH Scholar
  • ✅ Roundtrip (Python → JSON → TypeScript) works reliably with Zod validation
  • ⚠️ Pydantic validators require manual TypeScript equivalents (Zod)

Transport Evolution

  • ✅ CLI (v0.1.0): Viable for single-shot operations
  • ✅ HTTP (v0.2.0+): Recommended for persistent operations and streaming
  • 🔍 LSP (v1.0+): Investigate for text-centric features (definitions, references)
  • 🔮 MCP (v2.0+): Monitor for agent workflows (not ready yet)

Data Model Ownership

  • ✅ Python-first is recommended (Pydantic → TypeScript generation)
  • ❌ Schema-first adds unnecessary abstraction
  • ❌ Dual-native is too high maintenance

Runtime Boundaries

  • ✅ Python owns AI processing, validation, business rules
  • ✅ TypeScript owns UI state, Monaco integration, user interaction
  • ✅ Gray area (data transformation): Python exports Monaco-compatible format (ADR-AT03.2 approach)

Monaco Integration

  • ✅ Continue Monaco alignment (Python models match Monaco conventions)
  • ✅ Mitigation: Keep domain models pure, add Monaco helpers as needed

8. Next Steps: Phase 2 (Prototype & Validate)

Prototype Goals

  1. Walking Skeleton:
       • Python: TextObject with SectionObject and SectionRange
       • Auto-generate TypeScript interfaces with pydantic-to-typescript
       • VS Code extension: Deserialize JSON → map to Monaco editor

  2. Schema Evolution Test:
       • Add field to TextObject (e.g., creation_timestamp)
       • Regenerate TypeScript
       • Test backward compatibility (v1 JSON still deserializes)

  3. Benchmarking:
       • CLI transport: Measure latency for 10KB, 100KB, 1MB text files
       • HTTP transport: Compare latency and throughput vs CLI
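The schema evolution test in the goals above could take roughly the following form. This is a sketch using a standard-library dataclass as a stand-in for the real Pydantic `TextObject` (an assumption so the example runs without dependencies): the new `creation_timestamp` field is optional with a default, so JSON written before the field existed still loads.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class TextObjectStandIn:
    """Stand-in for TextObject; not the real TNH Scholar model."""
    language: str
    sections: list
    # New field: optional with a default, so older JSON remains valid.
    creation_timestamp: Optional[str] = None


def load_text_object(json_str: str) -> TextObjectStandIn:
    data = json.loads(json_str)
    known = {f.name for f in fields(TextObjectStandIn)}
    # Ignore unknown keys so payloads from newer versions also load.
    return TextObjectStandIn(**{k: v for k, v in data.items() if k in known})


V1_JSON = '{"language": "en", "sections": []}'
V2_JSON = '{"language": "en", "sections": [], "creation_timestamp": "2025-12-12T00:00:00Z"}'

if __name__ == "__main__":
    print(load_text_object(V1_JSON))
    print(load_text_object(V2_JSON))
```

A required new field with no default would make the first call fail, which is the regression this kind of test is meant to catch.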

Success Criteria

  • ✅ TypeScript types auto-generated with <5% manual intervention
  • ✅ Roundtrip reliability: 100% for basic types, 95%+ for complex types
  • ✅ CLI latency: <500ms for 100KB files
  • ✅ HTTP latency: <100ms for 100KB files (persistent server)

9. Recommendations

Immediate Actions (Phase 2)

  1. Set up pydantic-to-typescript in TNH Scholar build pipeline:
       • Install: pip install pydantic-to-typescript
       • Add build script: scripts/generate-typescript-types.py
       • Output: vscode-extension/src/generated/types.ts

  2. Build walking skeleton:
       • Python: Export TextObject, SectionObject, SectionRange
       • Generate TypeScript interfaces
       • VS Code extension: Deserialize and map to Monaco

  3. Benchmark CLI vs HTTP:
       • Measure latency for realistic workloads
       • Document findings in Phase 2 report

Strategic Recommendations

  1. Adopt Python-first code generation (Pydantic → TypeScript)
  2. Continue Monaco alignment (Python models match Monaco conventions)
  3. Plan HTTP migration for v0.2.0 (persistent server, streaming)
  4. Investigate LSP for v1.0+ (text-centric features)
  5. Version models explicitly (semantic versioning, migration paths)

10. Open Questions

  1. How to handle complex Python types (e.g., NumberedText with custom logic)?
       • Option: Custom serializers (.model_dump() override)
       • Option: Separate transport models (e.g., NumberedTextTransport)

  2. Should we expose Python classes directly to TypeScript (via FFI)?
       • Likely not viable (Pyodide rejected in ADR-VSC01)
       • Alternative: Protocol Buffers for binary serialization?

  3. How to test TypeScript types without manual assertions?
       • Use Zod for runtime validation (catches deserialization errors)
       • Use TypeScript compiler for static type checking
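To illustrate the separate transport model option from the first question, the sketch below defines a flat, JSON-friendly `NumberedTextTransport` and converts to and from it. The real `NumberedText` API is not shown in this document, so a plain string stands in for it, and the field names `lines` and `start` are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class NumberedTextTransport:
    """Hypothetical wire format: plain lines plus the number of the first line."""
    lines: list
    start: int = 1


def to_transport(text: str, start: int = 1) -> NumberedTextTransport:
    """Flatten text into a structure with no custom logic attached."""
    return NumberedTextTransport(lines=text.split("\n"), start=start)


def from_transport(payload: dict) -> str:
    """Rebuild the original text from a decoded payload."""
    return "\n".join(payload["lines"])


if __name__ == "__main__":
    wire = json.dumps(asdict(to_transport("line1\nline2")))
    print(wire)
    print(from_transport(json.loads(wire)))
```

The trade-off is an extra model per complex type, in exchange for generated TypeScript interfaces that contain only plain data.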

Conclusion

Python-first code generation with pydantic-to-typescript offers the best path forward for TNH Scholar's VS Code integration:

  • ✅ Type safety across boundaries
  • ✅ Maintainable (single source of truth in Python)
  • ✅ VS Code-friendly (clean TypeScript interfaces)
  • ✅ Evolution-ready (versioning + migration paths)

Next: Proceed to Phase 2 (Prototype & Validate) to build a walking skeleton and validate these findings with real TNH Scholar models.


Status: Phase 1 Complete (Draft)
Next Review: 2025-12-19 (Phase 2 kickoff)