TED-LIUM 3

by LIUM (Le Mans University)

452 hours of TED talk audio with transcripts: the canonical lecture-style ASR benchmark.

TL;DR

Best for lecture-style and monologue English ASR, and for testing robustness beyond read-audiobook speech. Pricing: free.

Category: Open source
License: CC BY-NC-ND 3.0
Pricing: free
Platforms: HuggingFace, OpenSLR

What it is

TED-LIUM Release 3 is 452 hours of TED talks aligned with transcripts. It is a standard alternative to LibriSpeech for benchmarking English ASR, with monologue prosody, varied accents, and live-stage audio. License: CC BY-NC-ND 3.0 (non-commercial).

Best for: Lecture-style and monologue English ASR; testing robustness beyond read audiobooks.
Watch out for: CC BY-NC-ND 3.0 license (non-commercial use only) · single-speaker monologue bias · English only. Cite: Hernandez et al., SPECOM 2018.
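
Because the corpus is used as an ASR benchmark, results on it are normally reported as word error rate (WER). The following is a minimal sketch of that metric, not an official scoring script: the function name `word_error_rate` is hypothetical, and the normalization (lowercasing and whitespace splitting) is a simplifying assumption. Published numbers usually rely on a dedicated scoring tool with more careful text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,                         # deletion
                curr[j - 1] + 1,                     # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

WER can exceed 1.0 when the hypothesis contains many inserted words, so it is best read as an error ratio rather than a percentage bounded at 100.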

Install / use

# Requires: pip install datasets
from datasets import load_dataset

ds = load_dataset('LIUM/tedlium', 'release3')
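
Downloading the full 452 hours just to look at a few utterances is rarely necessary. The sketch below streams a handful of samples instead. It makes several assumptions that are not stated on this page: that a `validation` split exists, that each sample exposes `text` and `speaker_id` fields, and that your installed `datasets` version can still load this repository. The helper names `describe_sample` and `preview` are hypothetical.

```python
from itertools import islice


def describe_sample(sample):
    """Summarize one sample dict; the field names are assumptions."""
    text = sample.get("text", "")
    words = len(text.split())
    speaker = sample.get("speaker_id", "?")
    # Truncate the transcript so each summary stays on one line.
    return f"{speaker}: {words} words: {text[:60]}"


def preview(n=3, split="validation"):
    """Stream the first n samples rather than downloading the whole corpus."""
    # Imported lazily so describe_sample works without the library installed.
    from datasets import load_dataset

    ds = load_dataset("LIUM/tedlium", "release3", split=split, streaming=True)
    return [describe_sample(s) for s in islice(ds, n)]


if __name__ == "__main__":
    for line in preview():
        print(line)
```

With `streaming=True` the call returns an iterable dataset, so samples are fetched as you iterate and `islice` stops the transfer after `n` items.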

Features

Speaker diarization: No
Word-level timestamps: Yes
Streaming / real-time: No
Languages supported: 1
HIPAA eligible: No

TED-LIUM 3 vs Whipscribe

Feature | TED-LIUM 3 | Whipscribe
Category | Open source | Transcription APIs
Pricing | free | free beta
Speaker diarization | No | Yes
Word timestamps | Yes | Yes
Streaming | No | No
Languages | 1 | 99
Platforms | HuggingFace, OpenSLR | Web, API, MCP

Alternatives to TED-LIUM 3

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, you can submit a URL or upload a file and get a transcript back in seconds.