ADR-018: Markdown Output Format¶

Date: 2025-11-07 Status: Accepted Context: Phase 3 Unit 6 - Markdown Output System

Context¶

Extracted content needs to be formatted as readable markdown files that users can: - Read in any markdown viewer - Import into Obsidian/Notion/Roam - Search and reference - Link between notes

We need to decide: 1. Markdown structure and formatting 2. Frontmatter format (YAML/TOML/JSON) 3. Template-specific formatting 4. Obsidian compatibility

Decision¶

We will generate markdown files with YAML frontmatter and template-specific formatting.

Structure:

---
template: template-name
podcast: Podcast Name
episode: Episode Title
date: 2025-11-07
url: https://...
extracted_with: gemini
cost_usd: 0.01
tags:
  - podcast
  - inkwell
  - quotes
---

# Content Heading

Content formatted according to template type...

Rationale¶

Why YAML Frontmatter?¶

Alternatives considered: 1. TOML frontmatter 2. JSON frontmatter 3. No frontmatter (content only) 4. Custom format

Decision: YAML frontmatter

Pros: - ✅ Standard in Obsidian, Jekyll, Hugo - ✅ Human-readable - ✅ Supports lists and nested data - ✅ Most markdown tools support it - ✅ Easy to parse with PyYAML

Cons: - ❌ Whitespace-sensitive (minor)

Verdict: YAML is the de facto standard for markdown frontmatter.

Frontmatter Fields¶

Essential fields: - template: Template name (for filtering/organization) - podcast: Podcast name (for grouping) - episode: Episode title (for identification) - date: Generation date (for sorting)

Optional fields: - url: Episode URL (for reference) - extracted_with: Provider used (cache/gemini/claude) - cost_usd: Extraction cost (for tracking) - tags: Obsidian-style tags (for categorization)

Template-Specific Formatting¶

Different content types need different formatting:

Quotes:

# Quotes

## Quote 1

> The actual quote text

**Speaker:** John Doe
**Timestamp:** 12:34

Why blockquotes? Standard markdown convention for quotes.

Concepts:

# Key Concepts

## Concept Name

Explanation of the concept

**Context:** Where it was discussed

Tools/Books:

# Tools & Technologies

| Tool | Category | Context |
|------|----------|---------|
| Python | language | Backend |

Why tables? Structured data is clearer in table format.

Summary:

# Summary

Summary text in markdown format...

## Key Takeaways

- Point 1
- Point 2

Obsidian Compatibility¶

Tags format: - tags: [podcast, inkwell, quotes] - YAML array format - Works in Obsidian tag pane - Clickable tags in preview

Wikilinks: - Could add: [[Podcast Name]] linking - Decision: NOT implemented (user can add manually) - Rationale: Don't want to assume user's vault structure

Backlinks: - Episode files naturally backlink via filename - Works automatically in Obsidian

Implementation¶

MarkdownGenerator Class¶

class MarkdownGenerator:
    def generate(self, result: ExtractionResult, metadata: dict) -> str:
        """Generate markdown from extraction result."""
        parts = []

        # Frontmatter
        frontmatter = self._generate_frontmatter(result, metadata)
        parts.append(frontmatter)

        # Content
        content = self._format_content(result)
        parts.append(content)

        return "\n\n".join(parts)

Format Dispatch¶

def _format_content(self, result: ExtractionResult) -> str:
    content = result.content

    if content.format == "json":
        return self._format_json_content(result.template_name, content)
    elif content.format == "markdown":
        return content.data["text"]  # Pass-through
    elif content.format == "yaml":
        return self._format_yaml_content(content)
    else:  # text
        return content.data["text"]

Template-Specific Formatters¶

Each known template type has a custom formatter: - _format_quotes() - Blockquotes with metadata - _format_concepts() - Headings with explanations - _format_tools() - Markdown table - _format_books() - List with details - _format_generic_json() - JSON code block (fallback)

Usage¶

Basic Generation¶

generator = MarkdownGenerator()

result = ExtractionResult(
    template_name="summary",
    content=ExtractedContent(...),
    cost_usd=0.01,
    provider="gemini"
)

markdown = generator.generate(result, episode_metadata)

Without Frontmatter¶

markdown = generator.generate(
    result,
    episode_metadata,
    include_frontmatter=False
)

Use case: Concatenating multiple outputs into single file.

Examples¶

Example 1: Quote Extraction¶

---
template: quotes
podcast: The Test Podcast
episode: Episode 42
date: 2025-11-07
url: https://example.com/ep42
extracted_with: claude
cost_usd: 0.12
tags:
  - podcast
  - inkwell
  - quotes
---

# Quotes

## Quote 1

> Focus is the key to productivity

**Speaker:** Cal Newport
**Timestamp:** 15:30

## Quote 2

> Deep work matters in a distracted world

**Speaker:** Cal Newport
**Timestamp:** 22:15

Example 2: Summary¶

---
template: summary
podcast: Tech Talk
episode: AI in 2024
date: 2025-11-07
extracted_with: gemini
cost_usd: 0.003
tags:
  - podcast
  - inkwell
  - summary
---

# Summary

This episode explores the current state of AI technology in 2024,
focusing on large language models, their capabilities, and limitations.

## Key Takeaways

- LLMs are powerful but not AGI
- Focus on practical applications
- Ethics remain important

Example 3: Tools Table¶

---
template: tools-mentioned
podcast: Dev Podcast
episode: Modern Stack
date: 2025-11-07
extracted_with: gemini
cost_usd: 0.002
tags:
  - podcast
  - inkwell
  - tools
---

# Tools & Technologies Mentioned

| Tool | Category | Context |
|------|----------|---------|
| Python | language | Backend development |
| React | framework | Frontend UI |
| Docker | platform | Containerization |

Design Decisions¶

Decision 1: Template Name in Frontmatter¶

Decision: Include template: name field

Rationale: - Users can filter by template type - Useful for bulk operations - Clear provenance

Decision 2: Provider and Cost in Frontmatter¶

Decision: Include extracted_with and cost_usd

Rationale: - Transparency about extraction source - Cost tracking - Debug information (was it cached?)

Trade-off: Exposes implementation details, but users appreciate transparency.

Decision 3: Separate Files per Template¶

Decision: Each template → separate markdown file

Rationale: - ✅ Easier to navigate - ✅ Better for Obsidian (one note per concept) - ✅ Can link between files - ❌ More files to manage

Alternative: Single file with all extractions - Harder to navigate - Obsidian works better with atomic notes

Verdict: Separate files is better UX.

Decision 4: Markdown Pass-Through for Summary¶

Decision: If template outputs markdown, use it as-is

Rationale: - LLM already formatted it well - Don't want to impose structure - Respect LLM's judgment

Decision 5: Generic JSON Fallback¶

Decision: Unknown templates → JSON code block

Rationale: - Safe default - Preserves all data - Users can see raw structure - Better than error

Consequences¶

Positive¶

✅ Human-readable output ✅ Obsidian-compatible out of the box ✅ Searchable and linkable ✅ Template-specific formatting improves UX ✅ Frontmatter enables filtering and organization ✅ Easy to customize (extend formatters)

Negative¶

❌ YAML frontmatter slightly increases file size ❌ Template-specific formatters need maintenance ❌ Not all tools support frontmatter (rare)

Neutral¶

Separate file per template (design choice)
Blockquotes for quotes (standard convention)
Tables for structured data (good for some tools, not all)

Future Enhancements¶

1. Custom Formatters¶

Allow users to define custom formatters:

generator.register_formatter("custom-template", custom_formatter_fn)

2. Wikilink Generation¶

Automatically add Obsidian wikilinks:

Discussed in [[Episode 42]] from [[The Test Podcast]]

Trade-off: Assumes vault structure.

3. Dataview Integration¶

Add Dataview-compatible frontmatter:

dataview:
  speakers: [John, Jane]
  topics: [AI, ML]

4. Export Formats¶

Support other formats: - HTML export - PDF generation - JSON export

5. Template Inheritance¶

Share formatting between similar templates:

class QuotesFormatter(BaseFormatter):
    def format(self, data):
        # Shared quote formatting

Testing Strategy¶

Unit tests: - Frontmatter generation - Template-specific formatters - Edge cases (empty data, missing fields) - Unicode handling - Full generation pipeline

Manual testing: - Open in Obsidian - Verify tag navigation - Check search functionality - Test wikilinks (if added)

Unit 6 Devlog - Implementation details
Unit 2: Output Models - EpisodeMetadata, OutputFile

Revision History¶

2025-11-07: Initial ADR (Phase 3 Unit 6)