Skip to content

TextObject Original Design

Legacy TextObject design notes capturing the original sectioning models, metadata strategy, and validation approach.

Overview

TextObject provides structured text handling with explicit section boundaries and metadata support, built on Pydantic models for validation.

Core Components

LogicalSection

class LogicalSection(BaseModel):
    title: str
    start_line: int
    end_line: int

TextObject

Primary container managing:

  • Section boundaries
  • Language information
  • Dublin Core metadata
  • NumberedText content

Key Design Points

Section Management

  • Explicit start and end lines for sections
  • Validation ensures no overlaps or gaps
  • Ordered storage by line number

Content Integration

  • Immutable NumberedText reference
  • Line number integrity maintained
  • Section boundaries must match content

Metadata

  • Dublin Core based structure
  • Optional/required field handling
  • Flexible additional metadata support

Implementation Notes

Validation Requirements

  • Sections must not overlap
  • No gaps between sections allowed
  • Line numbers must exist in content
  • Language codes accepted but not validated

API Considerations

  • Pydantic models ensure clean serialization
  • Clear validation errors
  • Type safety throughout