ADR-TR04: AssemblyAI Service Implementation Improvements¶
Modernizes the AssemblyAI integration by adopting the official SDK, exposing advanced options, and hardening error handling.
- Status: Proposed
- Date: 2025-05-01
Context¶
The current implementation of assemblyai_service.py provides basic functionality but fails to leverage many powerful features of the AssemblyAI API. After a thorough review of the API documentation and capabilities, we've identified several limitations in the current implementation:
- Limited Configuration Options: The implementation hardcodes many options, including defaulting to English language.
- Missing Advanced Features: No support for many audio intelligence features like sentiment analysis, entity detection, etc.
- Inefficient Implementation: Current implementation doesn't use the official SDK and contains redundant code.
- Lack of Error Handling: Minimal error handling for common failure scenarios.
- Missing Regional Support: No support for EU region endpoints for data residency compliance.
- Hardcoded Paths: API endpoints and configuration options are hardcoded.
- Incomplete Documentation: Documentation doesn't cover all available options.
Decision Drivers¶
- API Feature Coverage: Desire to expose as much of AssemblyAI's feature set as possible through our service layer.
- Maintainability: Reduce code complexity and leverage the official SDK.
- Type Safety: Improve type annotations for better IDE support and error checking.
- Flexibility: Enable configuration of all major API parameters.
- Error Recovery: Improve handling of common failure scenarios.
- Performance: Optimize API interactions.
- Developer Experience: Make the service intuitive and well-documented.
Proposed Decision¶
We will significantly refactor the AssemblyAI service implementation to address these limitations using the following approaches:
- Use Official SDK: Replace low-level HTTP calls with the official AssemblyAI Python SDK where appropriate.
- Enhanced Configuration Model: Implement a comprehensive configuration model that exposes all major API options.
- Complete Feature Coverage: Add support for all major AssemblyAI features.
- Improved Error Handling: Implement better error handling with specific exception types.
- Regional Support: Add EU region support with configuration options.
- Callback Improvements: Add support for all callback patterns.
Design Details¶
1. Configuration Model¶
Create a nested configuration model that exposes all major API options:
@dataclass
class AssemblyAIConfig:
"""Configuration for AssemblyAI service."""
# Base configuration
api_key: Optional[str] = None
base_url: str = "https://api.assemblyai.com/v2" # or EU endpoint
# Connection configuration
polling_interval: int = 3 # seconds
max_polling_time: int = 300 # seconds
# Core transcription configuration
model: Literal["best", "nano"] = "best"
language_code: Optional[str] = None
language_detection: bool = False
language_detection_confidence_threshold: float = 0.5
# Text formatting options
format_text: bool = True
punctuate: bool = True
disfluencies: bool = False
filter_profanity: bool = False
# Speaker options
speaker_labels: bool = False
speakers_expected: Optional[int] = None
# Audio channel options
multichannel: bool = False
# Accuracy enhancement options
word_boost: Optional[List[str]] = None
boost_param: Optional[Literal["low", "default", "high"]] = None
custom_spelling: Optional[Dict[str, str]] = None
# Audio intelligence configuration
auto_chapters: bool = False
auto_highlights: bool = False
entity_detection: bool = False
iab_categories: bool = False
sentiment_analysis: bool = False
summarization: bool = False
# Callback options
webhook_url: Optional[str] = None
webhook_auth_header_name: Optional[str] = None
webhook_auth_header_value: Optional[str] = None
2. Service Interface Update¶
Update the service interface to use the configuration model and support all major features:
class AssemblyAITranscriptionService(TranscriptionService):
"""Enhanced AssemblyAI implementation of the TranscriptionService interface."""
def __init__(self, config: Optional[AssemblyAIConfig] = None):
"""Initialize with comprehensive configuration options."""
self.config = config or AssemblyAIConfig()
self._initialize_sdk()
def _initialize_sdk(self) -> None:
"""Initialize the AssemblyAI SDK with current configuration."""
import assemblyai as aai
# Set API key and base URL
aai.settings.api_key = self.config.api_key or os.getenv("ASSEMBLYAI_API_KEY")
if not aai.settings.api_key:
raise ValueError("AssemblyAI API key is required")
# Set base URL for regional support
if "eu" in self.config.base_url:
aai.settings.base_url = "https://api.eu.assemblyai.com/v2"
# Initialize transcriber
self.transcriber = aai.Transcriber()
def transcribe(
self,
audio_file: Union[Path, BinaryIO],
options: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
"""
Transcribe audio with full feature support.
Args:
audio_file: Path or file-like object
options: Override configuration options
Returns:
Standardized transcription result
"""
# Create transcription config by merging base config with options
import assemblyai as aai
# Prepare configuration
tx_config = self._prepare_transcription_config(options)
# Handle different input types
file_path = self._get_file_path(audio_file)
# Perform transcription
transcript = self.transcriber.transcribe(file_path, config=tx_config)
# Convert to standardized format
return self._standardize_result(transcript)
3. Subtitle Generation Enhancement¶
Improve subtitle generation capabilities:
def get_subtitles(
self,
transcript_id: str,
format_type: str = "srt",
chars_per_caption: int = 42,
speaker_labels: bool = False
) -> str:
"""
Get subtitles directly from AssemblyAI.
Args:
transcript_id: ID of completed transcript
format_type: Format type ("srt" or "vtt")
chars_per_caption: Maximum characters per caption
speaker_labels: Include speaker labels in subtitle
Returns:
Formatted subtitle content
"""
# Direct SDK method if available
if hasattr(self.transcript, f"export_subtitles_{format_type.lower()}"):
method = getattr(self.transcript, f"export_subtitles_{format_type.lower()}")
return method(chars_per_caption=chars_per_caption)
# Fallback to API call
params = {
"chars_per_caption": chars_per_caption,
"speaker_labels": speaker_labels
}
endpoint = f"{self.config.base_url}/transcript/{transcript_id}/{format_type.lower()}"
response = requests.get(
endpoint,
params=params,
headers={"authorization": aai.settings.api_key}
)
response.raise_for_status()
return response.text
4. Audio Intelligence Methods¶
Add support for audio intelligence features:
def get_intelligence_features(
self,
transcript: Any
) -> Dict[str, Any]:
"""
Extract intelligence features if enabled in configuration.
Args:
transcript: AssemblyAI transcript object
Returns:
Dictionary with available intelligence features
"""
features = {}
# Add sentiment analysis
if hasattr(transcript, "sentiment_analysis") and transcript.sentiment_analysis:
features["sentiment_analysis"] = transcript.sentiment_analysis
# Add auto chapters
if hasattr(transcript, "chapters") and transcript.chapters:
features["chapters"] = transcript.chapters
# Add entity detection
if hasattr(transcript, "entities") and transcript.entities:
features["entities"] = transcript.entities
# Add auto highlights
if hasattr(transcript, "auto_highlights") and transcript.auto_highlights:
features["highlights"] = transcript.auto_highlights
# Add topic classification
if hasattr(transcript, "iab_categories") and transcript.iab_categories:
features["topics"] = {
"results": transcript.iab_categories.results,
"summary": transcript.iab_categories.summary
}
return features
5. Region Support¶
Add EU region support with configuration detection:
def _is_eu_region(self) -> bool:
"""Detect if service is configured for EU region."""
return "eu" in self.config.base_url.lower()
def _get_appropriate_base_url(self) -> str:
"""Get the appropriate base URL based on configuration."""
if self._is_eu_region():
return "https://api.eu.assemblyai.com/v2"
return "https://api.assemblyai.com/v2"
6. Webhook Support¶
Add webhook support for asynchronous processing:
def _configure_webhook(self, config: Dict[str, Any]) -> Dict[str, Any]:
"""
Configure webhook parameters if enabled.
Args:
config: Current configuration dictionary
Returns:
Updated configuration with webhook parameters
"""
if self.config.webhook_url:
config["webhook_url"] = self.config.webhook_url
if self.config.webhook_auth_header_name and self.config.webhook_auth_header_value:
config["webhook_auth_header_name"] = self.config.webhook_auth_header_name
config["webhook_auth_header_value"] = self.config.webhook_auth_header_value
return config
Consequences¶
Advantages¶
- Complete API Coverage: Access to all AssemblyAI features through our service layer.
- Improved Maintainability: Less code through SDK usage and better organization.
- Type Safety: Enhanced type annotations for better IDE support.
- Configuration Flexibility: All API options available through configuration.
- Better Error Handling: Specific exception types for common failures.
- Regional Compliance: EU region support for data residency requirements.
Disadvantages¶
- Migration Effort: Existing code using the service will need updates.
- SDK Dependency: Introduces dependency on the official SDK.
- Complexity: More configuration options increase initial learning curve.
- Version Tracking: Need to track SDK version compatibility.
Implementation Plan¶
- Core Service Update: Replace HTTP calls with SDK usage.
- Configuration Model: Implement comprehensive configuration model.
- Feature Implementation: Add support for all major features.
- Testing: Add tests for new functionality.
- Documentation: Update documentation with new features and examples.
- Migration Guide: Create guide for transitioning from old implementation.