ADR-016: API Provider Abstraction¶

Date: 2025-11-07 Status: Accepted Context: Phase 3 Unit 4 - LLM Provider Implementation

Context¶

Inkwell uses LLMs to extract structured information from podcast transcripts. Multiple LLM providers are available (Claude, Gemini), each with different: - APIs and SDKs - Pricing models - Capabilities (JSON mode, function calling, etc.) - Quality characteristics

We need a way to: 1. Support multiple providers with consistent interface 2. Make it easy to add new providers 3. Allow users to choose provider based on needs (quality vs. cost) 4. Estimate costs before making API calls

Decision¶

We will create an abstract BaseExtractor class that defines the interface for all LLM providers.

All provider implementations (ClaudeExtractor, GeminiExtractor, etc.) must:

Inherit from BaseExtractor
Implement required abstract methods:
async def extract() - Perform extraction
def estimate_cost() - Calculate cost estimate
def supports_structured_output() - Report JSON mode support
Can override optional methods:
build_prompt() - Prompt rendering (default Jinja2 implementation provided)
_count_tokens() - Token estimation (default 4 chars/token approximation)

Rationale¶

Why Abstract Base Class?¶

Alternatives considered: - Protocol (structural typing) - Duck typing (no interface) - Strategy pattern with separate classes

Decision: ABC

Pros: - ✅ Explicit interface contract - ✅ Runtime validation (can't instantiate without implementing required methods) - ✅ IDE support and type checking - ✅ Clear documentation of requirements

Cons: - ❌ More verbose than Protocol - ❌ Inheritance (composition might be cleaner)

Verdict: Benefits outweigh drawbacks for this use case.

Why These Methods?¶

async def extract(template, transcript, metadata) -> str - Primary functionality: perform extraction - Async for non-blocking API calls - Returns raw string (parsing happens elsewhere) - Takes template, transcript, and metadata

def estimate_cost(template, transcript_length) -> float - Users need to know costs before committing - Returns USD cost estimate - Allows cost-aware template selection

def supports_structured_output() -> bool - Some providers have native JSON mode - Others need prompting for JSON - Enables optimization decisions

Why Not Include Parsing?¶

Parsing happens in extraction engine (Unit 5), not in provider implementation.

Reasons: - Parsing logic is provider-agnostic - Easier to test separately - Provider focuses on API communication - Separation of concerns

Provider Selection Strategy¶

Current approach: Manual selection via config/parameter

Future considerations: - Auto-selection based on template requirements - Fallback chain (try Claude, fall back to Gemini) - A/B testing for quality comparison

Implementation¶

BaseExtractor (Abstract)¶

class BaseExtractor(ABC):
    @abstractmethod
    async def extract(
        self,
        template: ExtractionTemplate,
        transcript: str,
        metadata: dict[str, Any]
    ) -> str:
        """Extract content using template."""
        pass

    @abstractmethod
    def estimate_cost(
        self,
        template: ExtractionTemplate,
        transcript_length: int
    ) -> float:
        """Estimate cost in USD."""
        pass

    @abstractmethod
    def supports_structured_output(self) -> bool:
        """Whether provider has native JSON mode."""
        pass

    def build_prompt(self, template, transcript, metadata) -> str:
        """Build prompt from template (default Jinja2)."""
        # Default implementation provided
        pass

ClaudeExtractor¶

Pricing (Nov 2024): - Input: $3.00/M tokens - Output: $15.00/M tokens

Features: - High quality (best for quotes, precise data) - Native JSON mode (response_format) - Expensive but accurate

Use cases: - Quote extraction (precision critical) - Complex structured data - When budget allows

GeminiExtractor¶

Pricing (Nov 2024): - Input: $0.075/M tokens (<128K) - Input: $0.15/M tokens (>128K) - Output: $0.30/M tokens

Features: - Good quality (85-90% of Claude quality) - Native JSON mode (response_mime_type) - 40x cheaper than Claude - Long context (1M tokens)

Use cases: - Summaries - General extraction - Cost-sensitive scenarios - Long transcripts

Cost Comparison¶

Example: 10K word transcript (40K tokens), 1K token output

Provider	Input Cost	Output Cost	Total	Relative
Claude	$0.12	$0.015	$0.135	41x
Gemini	$0.003	$0.0003	$0.003	1x

Gemini is ~41x cheaper for typical workloads.

Error Handling¶

Error Hierarchy¶

ExtractionError (base)
├── ProviderError (API errors, rate limits, auth)
├── ValidationError (invalid output, schema mismatch)
└── TemplateError (template rendering issues)

Retry Strategy¶

Not implemented in Unit 4. Future work: - Exponential backoff for rate limits - Automatic fallback to alternate provider - Circuit breaker for persistent failures

Testing Strategy¶

Unit tests: - Mock API responses - Test error handling - Verify cost estimation - Validate prompt building

Integration tests (Unit 9): - Real API calls with test keys - E2E extraction flow - Cost tracking

Future Enhancements¶

1. Additional Providers¶

Easy to add: - OpenAI GPT-4 - Cohere - Open-source models (Llama via Ollama)

2. Provider Auto-Selection¶

def select_provider(template: ExtractionTemplate) -> BaseExtractor:
    if template.model_preference == "claude":
        return ClaudeExtractor()
    elif template.expected_format == "json" and template.max_tokens > 2000:
        return ClaudeExtractor()  # Complex structured data
    else:
        return GeminiExtractor()  # Default to cheaper

3. Caching & Rate Limiting¶

class RateLimitedExtractor(BaseExtractor):
    def __init__(self, inner: BaseExtractor, max_rpm: int):
        self.inner = inner
        self.rate_limiter = RateLimiter(max_rpm)

    async def extract(self, ...):
        async with self.rate_limiter:
            return await self.inner.extract(...)

4. Cost Tracking¶

class CostTrackingExtractor(BaseExtractor):
    def __init__(self, inner: BaseExtractor):
        self.inner = inner
        self.total_cost = 0.0

    async def extract(self, ...):
        estimated_cost = self.inner.estimate_cost(...)
        result = await self.inner.extract(...)
        self.total_cost += estimated_cost
        return result

Consequences¶

Positive¶

✅ Easy to add new providers (implement 3 methods) ✅ Consistent interface across providers ✅ Cost transparency for users ✅ Provider-agnostic template system ✅ Testable with mocks

Negative¶

❌ Abstraction overhead (extra layer) ❌ Lowest common denominator (can't use provider-specific features easily) ❌ Cost estimation is approximate (actual costs may vary)

Neutral¶

Async all the way (required for good performance)
Provider selection is manual (could be smarter)

ADR-013: LLM Provider Abstraction - Initial decision
Research: LLM Extraction Comparison - Quality vs. cost analysis
Unit 4 Devlog - Implementation details

Revision History¶

2025-11-07: Initial ADR (Phase 3 Unit 4)