Multilingual LibriSpeech (MLS)

by Meta AI / Facebook AI Research

About 50k hours of read audiobook speech across 8 European languages, most of it English.

TL;DR

Best for: multilingual ASR training and cross-language transfer baselines beyond English. Pricing: free.

Category: Open source
License: CC BY 4.0
Pricing: free
Platforms: HuggingFace, OpenSLR

What it is

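Multilingual LibriSpeech (MLS) extends LibriSpeech to 8 European languages, using read audiobooks from LibriVox. English dominates with about 44.7k hours; the other seven languages add roughly 6k hours combined, ranging from about 100 to 2,000 hours each, for a total of around 50k hours. License: CC BY 4.0.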

Best for: Multilingual ASR training; cross-language transfer baselines beyond English.
Watch out for: CC BY 4.0 · read audiobook (LibriVox) bias · 8 languages only (en, de, nl, fr, es, it, pt, pl) · public-domain books only. Cite: Pratap et al., Interspeech 2020.
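
Because English dominates the hours, sampling utterances in proportion to raw duration would leave the smallest languages almost unseen during multilingual training. One common mitigation is temperature-scaled sampling. The sketch below is illustrative and not something MLS itself prescribes: the function name `sampling_weights` is hypothetical, and the hour counts are rounded placeholders taken from the ranges quoted above, not official per-language figures.

```python
def sampling_weights(hours, temperature=0.5):
    """Per-language sampling probabilities, p_i proportional to (h_i / total) ** temperature.

    temperature=1.0 reproduces sampling proportional to hours;
    smaller values flatten the distribution toward uniform.
    """
    total = sum(hours.values())
    scaled = {lang: (h / total) ** temperature for lang, h in hours.items()}
    norm = sum(scaled.values())
    return {lang: s / norm for lang, s in scaled.items()}


# Placeholder hour counts (assumption): English, one language near the top
# of the 100-2,000 h range, and one near the bottom.
hours = {"en": 44_700, "de": 2_000, "pl": 100}

proportional = sampling_weights(hours, temperature=1.0)
smoothed = sampling_weights(hours, temperature=0.5)
```

With a temperature below 1, the low-resource language is drawn more often than its share of hours, at the cost of seeing proportionally less English. The right temperature depends on the model and task.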

Install / use

from datasets import load_dataset
# The second argument selects the language config (German here).
ds = load_dataset('facebook/multilingual_librispeech', 'german')
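
For working with more than one language, a thin wrapper around the call above can help. This is a minimal sketch under stated assumptions: `MLS_CONFIGS`, `config_for`, and `load_mls` are hypothetical names, and only the `'german'` config name comes from the snippet above. The other config names are assumed to follow the same lowercase pattern and should be checked against the dataset card. `streaming=True` is a standard `load_dataset` option that iterates over examples without downloading the whole split first.

```python
# Two-letter code -> config name. Only "german" is confirmed by the text;
# the remaining names are assumptions to verify against the dataset card.
MLS_CONFIGS = {
    "de": "german",
    "nl": "dutch",
    "fr": "french",
    "es": "spanish",
    "it": "italian",
    "pt": "portuguese",
    "pl": "polish",
    "en": "english",
}


def config_for(code: str) -> str:
    """Translate a two-letter language code into an (assumed) MLS config name."""
    try:
        return MLS_CONFIGS[code.lower()]
    except KeyError:
        raise ValueError(f"MLS has no language {code!r}") from None


def load_mls(code: str, split: str = "train", streaming: bool = True):
    """Load one MLS language; streams by default to avoid a full download."""
    # Imported lazily so the helpers above work without the library installed.
    from datasets import load_dataset

    return load_dataset(
        "facebook/multilingual_librispeech",
        config_for(code),
        split=split,
        streaming=streaming,
    )
```

Calling `next(iter(load_mls("de")))` should return the first example as a dictionary. Printing its keys is a safer way to discover the field names than assuming them, since they have differed between dataset versions.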

Features

Speaker diarization: No
Word-level timestamps: No
Streaming / real-time: No
Languages supported: 8
HIPAA eligible: No

Multilingual LibriSpeech (MLS) vs Whipscribe

Feature | Multilingual LibriSpeech (MLS) | Whipscribe
Category | Open source | Transcription APIs
Pricing | free | free beta
Speaker diarization | No | Yes
Word timestamps | No | Yes
Streaming | No | No
Languages | 8 | 99
Platforms | HuggingFace, OpenSLR | Web, API, MCP

Alternatives to Multilingual LibriSpeech (MLS)

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or upload a file and you'll have a transcript in seconds.