OpenSLR (catalog)

by Daniel Povey (JHU CLSP)

Open Speech and Language Resources — the index of 130+ free speech corpora.

TL;DR

Best for discovering open speech corpora: many widely used academic datasets have an SLR-N ID. Pricing: free.

Category: Open source
License: Varies per resource (see each SLR-N page)
Pricing: Free
Platforms: Web

What it is

OpenSLR is the canonical index of 130+ free speech and language datasets, hosted by Daniel Povey. LibriSpeech, MUSAN, RIRS_NOISES, AISHELL, TED-LIUM, MLS, and dozens of low-resource corpora live here under SLR-N IDs.
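Because every corpus is addressed by its number, a data-prep script often needs little more than a name-to-ID table. A minimal Python sketch, covering only four of the corpora named above whose IDs are referenced by Kaldi recipes (LibriSpeech is SLR-12, MUSAN SLR-17, RIRS_NOISES SLR-28, AISHELL-1 SLR-33). The landing-page pattern `https://www.openslr.org/<N>/` reflects the site's current layout and is an assumption; `KNOWN_IDS` and `page_for` are hypothetical names.

```python
# Hypothetical helper: map a few well-known corpus names to their OpenSLR IDs.
# IDs are the ones referenced by Kaldi recipes; verify them on the site before relying on them.
KNOWN_IDS = {
    "librispeech": 12,
    "musan": 17,
    "rirs_noises": 28,
    "aishell-1": 33,
}

BASE = "https://www.openslr.org"  # assumption: current site root


def page_for(name: str) -> str:
    """Return the OpenSLR landing page for a corpus name (case- and whitespace-insensitive)."""
    key = name.strip().lower()
    if key not in KNOWN_IDS:
        raise KeyError(f"unknown corpus {name!r}; look it up on {BASE}/")
    return f"{BASE}/{KNOWN_IDS[key]}/"


if __name__ == "__main__":
    print(page_for("LibriSpeech"))  # https://www.openslr.org/12/
```

The landing page is where the file list and license for that resource are published, so it is the natural thing to link to from a dataset config.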

Best for: Discovering open speech corpora, since many widely used academic datasets have an SLR-N ID.
Watch out for: Licenses are set per resource (CC BY, Apache-2.0, or custom terms), so check each SLR-N page before use. The site is hosted as a free convenience catalog by Daniel Povey.
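One way to make the per-page license check repeatable is to copy each license string into a manifest by hand and refuse to assemble a training set that includes anything outside an allowlist. A minimal Python sketch; the manifest format, the allowlist, and `blocked_resources` are all hypothetical, and only the LibriSpeech entry (CC BY 4.0, as listed on its SLR-12 page) is filled in, so record the rest from their own pages.

```python
# Hypothetical license gate: MANIFEST is maintained by hand from each SLR-N page.
ALLOWED_LICENSES = {"CC BY 4.0", "Apache-2.0"}  # example policy; adjust to your use case

# SLR ID -> license string copied from https://www.openslr.org/<N>/
MANIFEST = {
    12: "CC BY 4.0",  # LibriSpeech
}


def blocked_resources(ids, manifest=MANIFEST, allowed=ALLOWED_LICENSES):
    """Return the IDs that are unrecorded or whose license is outside the allowlist."""
    blocked = []
    for slr_id in ids:
        license_name = manifest.get(slr_id)
        # An ID with no recorded license is treated as not allowed until someone checks it.
        if license_name is None or license_name not in allowed:
            blocked.append(slr_id)
    return blocked


if __name__ == "__main__":
    problems = blocked_resources([12, 17])
    if problems:
        print("check the license on these SLR pages first:", problems)
```

Treating unknown entries as blocked is deliberate: a new corpus cannot slip into a build until its license has been read and recorded.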

Install / use

https://www.openslr.org/  # browse SLR-1 through SLR-130+
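Beyond browsing, a resource's archives can be fetched directly. A minimal standard-library Python sketch, assuming the layout the site currently uses (archives under `https://www.openslr.org/resources/<N>/`); `archive_url` and `download` are hypothetical helper names, and the file name (here LibriSpeech's `dev-clean.tar.gz`) must be copied from the resource page.

```python
import shutil
import urllib.request
from pathlib import Path

BASE = "https://www.openslr.org"  # assumption: current site root (mirrors also exist)


def archive_url(slr_id: int, filename: str) -> str:
    """Build the download URL for one archive of resource SLR-<slr_id>."""
    if slr_id < 1:
        raise ValueError("SLR IDs start at 1")
    if not filename or "/" in filename:
        raise ValueError("pass a bare archive name, e.g. 'dev-clean.tar.gz'")
    return f"{BASE}/resources/{slr_id}/{filename}"


def download(slr_id: int, filename: str, dest_dir: str = ".") -> Path:
    """Stream one archive into dest_dir; skip the transfer if the file is already there."""
    dest = Path(dest_dir) / filename
    if dest.exists():
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    tmp = dest.parent / (dest.name + ".part")
    with urllib.request.urlopen(archive_url(slr_id, filename)) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)  # copies in chunks, never holds the whole archive in memory
    tmp.replace(dest)  # expose the final name only after the transfer finished
    return dest


if __name__ == "__main__":
    # LibriSpeech is SLR-12; dev-clean is one of its smaller archives.
    print(archive_url(12, "dev-clean.tar.gz"))
    # download(12, "dev-clean.tar.gz", "data")  # uncomment to actually fetch it
```

Writing to a `.part` file first means an interrupted download is never mistaken for a complete archive on the next run.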

Features

Speaker diarization: No
Word-level timestamps: No
Streaming / real-time: No
Languages supported: 70
HIPAA eligible: No

OpenSLR (catalog) vs Whipscribe

Feature              | OpenSLR (catalog) | Whipscribe
Category             | Open source       | Transcription APIs
Pricing              | Free              | Free beta
Speaker diarization  | No                | Yes
Word timestamps      | No                | Yes
Streaming            | No                | No
Languages            | 70                | 99
Platforms            | Web               | Web, API, MCP

Alternatives to OpenSLR (catalog)

Whipscribe is a managed faster-whisper + whisperX service.