This American Life Podcast Transcripts

by Mao et al. (academic)

Long-form podcast ASR + speaker-role corpus.

TL;DR

Best for long-form podcast ASR + speaker-role classification (host/guest/caller). Pricing: research-only.

Category: Open source
License: (not listed)
Stars: (not listed)
Last push: (not listed)
Pricing: research-only
Platforms: GitHub

What it is

TAL Corpus aligns This American Life podcast episodes with their official transcripts. Researchers fetch the audio and alignments locally; the transcripts themselves are not redistributed. Use is restricted to research.

Best for: Long-form podcast ASR + speaker-role classification (host/guest/caller).
Watch out for: The transcripts are copyrighted by This American Life, the pipeline is for research use only, and audio is downloaded via the podcast feed. Cite: Mao et al., ASRU 2019.

Install / use

git clone https://github.com/zaidalyafeai/tal-corpus  # alignment scripts
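Because audio is downloaded via the podcast feed, a first step is usually to list the audio URLs the feed exposes. The following is a minimal sketch of that step using only the Python standard library, not the repository's own script. It assumes a standard RSS 2.0 feed with enclosure elements; the sample feed, its URLs, and the list_enclosures helper are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed RSS 2.0 feed used only for illustration.
# A real run would read the feed XML from the podcast's feed URL instead.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Podcast</title>
    <item>
      <title>Episode 1</title>
      <enclosure url="https://example.com/audio/1.mp3" type="audio/mpeg" length="123"/>
    </item>
    <item>
      <title>Episode 2 (no audio)</title>
    </item>
  </channel>
</rss>"""


def list_enclosures(feed_xml: str) -> list[tuple[str, str]]:
    """Return (episode title, audio URL) pairs for items that carry an enclosure."""
    root = ET.fromstring(feed_xml)
    episodes = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        enclosure = item.find("enclosure")
        # Skip items without a downloadable audio attachment.
        if enclosure is not None and enclosure.get("url"):
            episodes.append((title, enclosure.get("url")))
    return episodes


for title, url in list_enclosures(SAMPLE_FEED):
    print(title, url)
```

Parsing the feed separately from downloading keeps the audio local to each researcher, in line with the corpus not redistributing copyrighted material.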

Features

Speaker diarization: Yes
Word-level timestamps: No
Streaming / real-time: No
Languages supported: 1
HIPAA eligible: No
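With speaker diarization but no word-level timestamps, the natural unit of work is an utterance with a start time, an end time, and a speaker role. The sketch below shows one way such records could be held and summarized. The Utterance fields, the role labels, and the sample values are assumptions made for illustration, not the corpus's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Utterance:
    """Hypothetical utterance-level record; field names are assumptions."""
    start: float  # seconds from the start of the episode
    end: float    # seconds from the start of the episode
    role: str     # e.g. "host", "guest", or "caller"
    text: str


def speaking_time_by_role(utterances: list[Utterance]) -> dict[str, float]:
    """Sum utterance durations (in seconds) for each speaker role."""
    totals: dict[str, float] = defaultdict(float)
    for u in utterances:
        totals[u.role] += u.end - u.start
    return dict(totals)


# Made-up sample data, not taken from any real episode.
sample = [
    Utterance(0.0, 4.0, "host", "placeholder text"),
    Utterance(4.0, 10.0, "guest", "placeholder text"),
    Utterance(10.0, 12.5, "host", "placeholder text"),
]

print(speaking_time_by_role(sample))
```

A per-role summary like this is a quick sanity check on class balance before training a speaker-role classifier.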

This American Life Podcast Transcripts vs Whipscribe

Feature | This American Life Podcast Transcripts | Whipscribe
Category | Open source | Transcription APIs
Pricing | research-only | free beta
Speaker diarization | Yes | Yes
Word timestamps | No | Yes
Streaming | No | No
Languages | 1 | 99
Platforms | GitHub | Web, API, MCP

Alternatives to This American Life Podcast Transcripts

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, it accepts a URL or an uploaded file and returns a transcript in seconds.