# ADR-TR02: Optimized SRT Generation Design
Leverages provider-native subtitle generation so SRT output no longer requires a bespoke converter.
- Status: Proposed
- Date: 2025-05-01
## Context
After researching OpenAI Whisper and AssemblyAI, we found that both services can generate SRT output directly. Our current design included a separate `TranscriptionFormatConverter` utility for format conversion, but we can streamline the implementation by using these native capabilities.
## Decision Drivers
- Direct support for SRT generation in both API services
- Performance considerations when generating subtitles
- Consistency of outputs across providers
- Maintenance complexity
- API efficiency and cost optimization
## Proposed Decision
We will optimize SRT generation by:
- Using provider-specific direct SRT output capabilities when available
- Overriding the `transcribe_to_format()` method in each provider implementation to take advantage of native format generation
- Maintaining the fallback to the general `TranscriptionFormatConverter` for scenarios where direct generation isn't available
- Simplifying the implementation to reduce unnecessary code paths
## Design Details
### Service Interface Updates
The `TranscriptionService` base class will remain mostly unchanged, with the existing methods:

```python
def transcribe(self, audio_file, options) -> Dict[str, Any]
def get_result(self, job_id) -> Dict[str, Any]
def export_format(self, result, format_type, options) -> str
def transcribe_to_format(self, audio_file, format_type, transcription_options, format_options) -> str
```
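For context, the base-class `transcribe_to_format()` that providers override might look like the following minimal sketch. The converter's `convert()` signature and the `"text"` handling are assumptions for illustration, not the actual implementation:

```python
from typing import Any, Dict, Optional


class TranscriptionFormatConverter:
    """Stand-in for the general-purpose converter (behavior assumed)."""

    def convert(self, result: Dict[str, Any], format_type: str,
                options: Dict[str, Any]) -> str:
        if format_type == "text":
            return result.get("text", "")
        raise NotImplementedError(format_type)


class TranscriptionService:
    """Base class; provider subclasses implement transcribe()."""

    def transcribe(self, audio_file, options: Dict[str, Any]) -> Dict[str, Any]:
        raise NotImplementedError

    def export_format(self, result: Dict[str, Any], format_type: str,
                      options: Optional[Dict[str, Any]] = None) -> str:
        # Default path: delegate to the general converter.
        return TranscriptionFormatConverter().convert(result, format_type, options or {})

    def transcribe_to_format(self, audio_file, format_type: str = "srt",
                             transcription_options: Optional[Dict[str, Any]] = None,
                             format_options: Optional[Dict[str, Any]] = None) -> str:
        # Generic two-step flow: transcribe, then convert. Providers override
        # this method when their API can emit the target format directly.
        result = self.transcribe(audio_file, transcription_options or {})
        return self.export_format(result, format_type, format_options)
```

Keeping this generic flow in the base class is what makes the provider overrides below purely an optimization: any subclass that does not override the method still produces output via the converter.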
### Provider-Specific Optimizations
#### OpenAI Whisper
For Whisper, we'll use the API's native `response_format` parameter:
```python
def transcribe_to_format(self, audio_file, format_type="srt",
                         transcription_options=None, format_options=None):
    if format_type in ["srt", "vtt"]:
        options = transcription_options or {}
        options["response_format"] = format_type
        result = self.transcribe(audio_file, options)
        if "subtitle_content" in result:
            return result["subtitle_content"]
        # Fall back to the raw result if the API returned the subtitle text directly
        if isinstance(result.get("raw_result"), str):
            return result["raw_result"]
    # Fall back to the general converter
    return super().transcribe_to_format(audio_file, format_type,
                                        transcription_options, format_options)
```
#### AssemblyAI
For AssemblyAI, we'll use the dedicated subtitles endpoint:
```python
def transcribe_to_format(self, audio_file, format_type="srt",
                         transcription_options=None, format_options=None):
    if format_type in self.SUBTITLE_FORMATS:
        audio_url = self.upload_file(audio_file)
        transcript_id = self.start_transcription(audio_url, transcription_options)
        self.poll_for_completion(transcript_id)
        return self.get_subtitles(transcript_id, format_type, format_options)
    # Fall back to the general converter
    return super().transcribe_to_format(audio_file, format_type,
                                        transcription_options, format_options)
```
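The `poll_for_completion` step in the snippet above is likely a simple polling loop. A hedged sketch follows, with the status checker injected as a callable so the loop can be unit-tested without network calls; the function name, status strings, and defaults are assumptions, not the real implementation:

```python
import time


def poll_for_completion(get_status, transcript_id, interval=3.0, timeout=600.0):
    """Poll until the transcript reaches a terminal status.

    get_status(transcript_id) is expected to return a status string such as
    "queued", "processing", "completed", or "error" (values assumed).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status(transcript_id)
        if status == "completed":
            return
        if status == "error":
            raise RuntimeError(f"transcription {transcript_id} failed")
        time.sleep(interval)
    raise TimeoutError(f"transcription {transcript_id} did not complete in time")
```

Injecting `get_status` keeps the provider's HTTP client out of the loop, which simplifies the "testing complexity" concern noted under Consequences.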
### Format Converter Simplification
The `TranscriptionFormatConverter` will remain available as a fallback, but we'll deprioritize its implementation since most SRT generation will use the native capabilities. We'll focus on the most essential format conversion features:
- Basic text-only output
- Simple SRT generation for cases where native support is unavailable
- Minimal formatting options
## Consequences
### Advantages
- **Performance**: Direct API generation of SRT files avoids a redundant local conversion pass
- **Quality**: Native SRT generation will produce higher-quality subtitles with proper timing
- **Maintenance**: Less code to maintain in the format converter
- **Cost**: Potentially lower API costs by avoiding redundant processing
- **Accuracy**: Provider-specific optimizations will better handle edge cases
### Disadvantages
- **Provider Coupling**: Tighter coupling to provider-specific API capabilities
- **Testing Complexity**: Need to test both direct generation and fallback paths
- **Configuration Management**: More provider-specific options to document and manage
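The testing concern above can be contained by exercising both paths against stubs rather than live APIs. A hedged sketch with entirely hypothetical stand-in classes:

```python
class StubConverter:
    """Stand-in for TranscriptionFormatConverter."""

    def convert(self, result, format_type, options):
        return "converted-" + format_type


class StubService:
    """Stand-in for a provider service with a native-output toggle."""

    SUBTITLE_FORMATS = {"srt", "vtt"}

    def __init__(self, native=True):
        self.native = native
        self.converter = StubConverter()

    def transcribe_to_format(self, audio_file, format_type="srt",
                             transcription_options=None, format_options=None):
        if self.native and format_type in self.SUBTITLE_FORMATS:
            return "native-" + format_type  # direct provider output
        return self.converter.convert({}, format_type, format_options or {})


# Direct path: the provider emits SRT itself.
assert StubService(native=True).transcribe_to_format("a.mp3") == "native-srt"
# Fallback path: the converter handles the request instead.
assert StubService(native=False).transcribe_to_format("a.mp3") == "converted-srt"
```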
## Implementation Plan
- Update the Whisper service implementation to use the `response_format` parameter
- Enhance the AssemblyAI implementation to use the subtitles endpoint
- Simplify the `TranscriptionFormatConverter` to focus on fallback scenarios
- Update the integration tests to verify both direct and fallback paths
- Update documentation to reflect the optimized approach
## References
- OpenAI Whisper API response_format parameter
- AssemblyAI Subtitles endpoint
- Original SRT Generation ADR