ADR-TR03: Standardizing Timestamps to Milliseconds¶
Normalizes every transcription provider to emit integer millisecond timestamps for precision and consistency.
- Status: Proposed
- Date: 2025-05-01
Context¶
Our transcription service currently uses a mix of timestamp formats across different providers:
- AssemblyAI returns timestamps in milliseconds, which we then convert to seconds
- OpenAI Whisper returns timestamps in seconds
This inconsistency creates several issues:
- We need conversion logic in the AssemblyAI adapter
- We use floating-point numbers to represent seconds, potentially leading to precision issues
- The inconsistent API contract creates potential confusion for consumers of the service
- The mixed format approach is more error-prone and less maintainable
Decision Drivers¶
- Precision Requirements: Accurate timestamps are critical for subtitle synchronization, speaker diarization, and audio analysis
- Data Type Consistency: Integer representation is more reliable than floating-point for precise timing
- Provider API Formats: Different providers use different time unit standards
- Downstream Compatibility: Impact on existing consumers of the transcription service
- Performance Considerations: Integer operations are typically faster than floating-point operations
Proposed Decision¶
We will standardize all timestamps in the transcription service to use milliseconds as the base unit, represented as integers, for the following reasons:
- Milliseconds provide sufficient precision for all anticipated use cases
- Integer representation avoids floating-point precision errors
- AssemblyAI already uses milliseconds, reducing conversion needs
- Millisecond precision is the common standard in media processing applications
Design Impacts¶
Interface Changes¶
The TranscriptionService interface will be updated to specify that all timestamps should be in milliseconds:
def transcribe(self, audio_file, options) -> Dict[str, Any]:
"""
...
Returns:
Dictionary containing transcription results with standardized keys:
- ...
- words: List of words with timing information (timestamps in milliseconds)
- utterances: List of utterances by speaker (timestamps in milliseconds)
- audio_duration: Duration of audio in milliseconds
- ...
"""
Service Implementation Changes¶
- AssemblyAITranscriptionService:
- Remove timestamp conversions from milliseconds to seconds
- Update documentation to reflect millisecond timestamps
-
Keep raw API response format unchanged (already uses milliseconds)
-
WhisperTranscriptionService:
- Convert timestamps from seconds to milliseconds
- Update documentation to reflect millisecond timestamps
-
For API responses in seconds, multiply values by 1000 and convert to integers
-
Format Converter:
- Update handling of timestamps in format conversion logic
- Ensure SRT and VTT generation use correct millisecond timing
Standardized Timestamp Format¶
All timestamp-related fields will follow this standard:
- Use integer values representing milliseconds
- Use consistent field names across all services (
start_ms,end_ms,duration_ms) - Always include timestamp units in field names for clarity
Consequences¶
Positive¶
- Consistent Data Type: Integer timestamps avoid floating-point precision issues
- Reduced Conversion Logic: Less conversion code in the AssemblyAI adapter
- Higher Precision: Millisecond precision is suitable for all anticipated use cases
- Clearer API Contract: Standardized format creates consistency for consumers
- Future Compatibility: Easier integration with additional transcription services
- Alignment with Industry Standards: Media processing typically uses milliseconds
Negative¶
- Breaking Changes: Existing code that consumes timestamps in seconds will need updates
- Increased Values: Millisecond values are 1000x larger than second values
- Initial Complexity: Additional adapter code needed during transition
Neutral¶
- Storage Impact: Integer milliseconds may use slightly more memory than floating-point seconds, but the difference is negligible
- Computational Impact: Integer operations are typically faster than floating-point, potentially offsetting the larger values
Implementation Plan¶
We will follow a phased approach to minimize disruption:
Phase 1: Interface Documentation (2 days)¶
- Update the interface documentation to specify millisecond timestamps
- Add deprecation notices for second-based timestamps
- Document the migration path for consumers
Phase 2: Service Implementation (1 week)¶
- Update the
AssemblyAITranscriptionServiceto remove conversions to seconds - Update the
WhisperTranscriptionServiceto output milliseconds - Update the format converter timestamp handling
- Add comprehensive tests for timestamp handling
Phase 3: Compatibility Layer (1 week)¶
- Add helper methods for clients to convert between timestamp formats
- Create adapter functions to maintain backward compatibility where needed
- Provide utility functions for common timestamp operations
Phase 4: Full Migration (2 weeks)¶
- Update all internal consumers to use millisecond timestamps
- Remove compatibility helpers for second-based timestamps
- Finalize documentation and test coverage
Alternatives Considered¶
1. Standardize on Seconds¶
Pros: - More human-readable values - Smaller numeric values - Already implemented in Whisper service
Cons: - Requires conversion from AssemblyAI's native milliseconds - Floating-point precision issues - Less common in media processing systems
2. Nanosecond Precision¶
Pros: - Higher precision than milliseconds - Future-proof for high-precision applications
Cons: - Unnecessary precision for current use cases - Very large integer values - Not supported by our current providers natively
3. Maintain Mixed Format¶
Pros: - No immediate changes needed - Each provider uses its native format
Cons: - Inconsistent API contract - More complex error handling - Higher cognitive load for developers