TED-LIUM 3
by LIUM (Le Mans University)
452h of TED talk audio + transcripts — the canonical lecture-style ASR benchmark.
TL;DR
Best for lecture-style, monologue English ASR and for testing robustness beyond read audiobooks. Pricing: free.
Category
Open source
License
CC BY-NC-ND 3.0 (non-commercial)
Stars
—
Last push
—
Pricing
free
Platforms
HuggingFace, OpenSLR
What it is
TED-LIUM Release 3 is a corpus of 452 hours of TED talks aligned with their transcripts. It is a standard alternative to LibriSpeech for benchmarking English ASR, offering monologue prosody, varied accents, and live-stage audio rather than read audiobooks. License: CC BY-NC-ND 3.0 (non-commercial).
Best for: Lecture-style + monologue English ASR; testing robustness beyond read audiobooks.
Watch out for: CC BY-NC-ND 3.0 · NON-COMMERCIAL ONLY · single-speaker monologue bias · English only. Cite: Hernandez et al., SPECOM 2018.
Install / use
from datasets import load_dataset
ds = load_dataset('LIUM/tedlium', 'release3')
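Once the corpus is loaded, a benchmark run usually ends with a word error rate (WER) over reference and hypothesis transcripts. The following is a minimal sketch in pure Python, not the scoring pipeline used by any particular paper. The function names `edit_distance` and `corpus_wer` are hypothetical, and the normalization (lowercasing plus whitespace splitting) is an assumption; published results typically apply a more careful text normalizer.

```python
from typing import Iterable, List, Tuple


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    # prev[j] holds the distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus-level WER: total word errors divided by total reference words."""
    errors = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        # Assumed normalization: lowercase and split on whitespace only.
        ref = reference.lower().split()
        hyp = hypothesis.lower().split()
        errors += edit_distance(ref, hyp)
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("references contain no words")
    return errors / ref_words
```

Here `pairs` would be (reference transcript, model output) tuples collected while iterating over a test split. Summing errors across the corpus before dividing weights long utterances more heavily than averaging per-utterance WER, which is the more common convention for reporting.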
Features
| Feature | TED-LIUM 3 |
|---|---|
| Speaker diarization | No |
| Word-level timestamps | Yes |
| Streaming / real-time | No |
| Languages supported | 1 |
| HIPAA eligible | No |
TED-LIUM 3 vs Whipscribe
| Feature | TED-LIUM 3 | Whipscribe |
|---|---|---|
| Category | Open source | Transcription APIs |
| Pricing | free | free beta |
| Speaker diarization | No | Yes |
| Word timestamps | Yes | Yes |
| Streaming | No | No |
| Languages | 1 | 99 |
| Platforms | HuggingFace, OpenSLR | Web, API, MCP |
Alternatives to TED-LIUM 3
Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, you can paste a URL or upload a file and get a transcript back in seconds.