ML-SUPERB

by Academic consortium (CMU + NTU + JHU + others)

Multilingual SUPERB — 143 languages × multiple tasks for self-supervised speech models.

TL;DR

Best for probing multilingual self-supervised speech models (XLS-R, MMS, mHuBERT) across LID + ASR. Pricing: free.

Category: Open source
License: Apache-2.0 (toolkit); data licenses vary per corpus
Pricing: free
Platforms: GitHub, HuggingFace

What it is

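ML-SUPERB is the multilingual extension of SUPERB. It covers 143 languages across language identification (LID) and automatic speech recognition (ASR) tasks, and is designed for probing self-supervised (SSL) speech encoders. The toolkit is licensed under Apache-2.0; licenses for the component datasets vary.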

Best for: Probing multilingual self-supervised speech models (XLS-R, MMS, mHuBERT) across LID + ASR.
Watch out for: The toolkit is Apache-2.0, but the underlying data licenses vary per corpus (Common Voice, MLS, Babel, etc.), and the Babel subset is academic-only. Cite: Shi et al., Interspeech 2023.
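
For readers new to "probing": SUPERB-style benchmarks typically keep the SSL encoder frozen and train only a small head on top, often fed by a learnable softmax-weighted sum of the encoder's layer outputs. The sketch below illustrates that weighted layer sum in plain Python. It is an assumption about the usual protocol, not code from the ML-SUPERB toolkit, and the names `softmax`, `weighted_layer_sum`, `features`, and `mixed` are hypothetical.

```python
import math


def softmax(logits):
    """Turn raw per-layer scores into weights that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_sum(layer_features, layer_logits):
    """Mix frozen encoder layers into one feature sequence.

    layer_features: list of layers, each a list of frames, each frame a
        list of floats (all layers must share the same shape).
    layer_logits: one score per layer; in a real probe these would be
        learned while the encoder itself stays frozen (assumption).
    """
    if len(layer_features) != len(layer_logits):
        raise ValueError("need exactly one logit per layer")
    weights = softmax(layer_logits)
    num_frames = len(layer_features[0])
    dim = len(layer_features[0][0])
    mixed = [[0.0] * dim for _ in range(num_frames)]
    for weight, layer in zip(weights, layer_features):
        for t, frame in enumerate(layer):
            for d, value in enumerate(frame):
                mixed[t][d] += weight * value
    return mixed


# Toy input: 2 layers, 2 frames, 2 dimensions (made-up numbers).
features = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[3.0, 2.0], [2.0, 3.0]],
]
# Equal logits give equal weights, so the result is the layer average.
mixed = weighted_layer_sum(features, [0.0, 0.0])
print(mixed)
```

In an actual probe, the mixed features would feed a lightweight downstream head (for example, a classifier for LID), and only the layer scores and that head would be trained.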

Install / use

git clone https://github.com/s3prl/s3prl  # ML-SUPERB benchmark suite

Features

Speaker diarization: No
Word-level timestamps: No
Streaming / real-time: No
Languages supported: 143
HIPAA eligible: No

ML-SUPERB vs Whipscribe

Feature | ML-SUPERB | Whipscribe
Category | Open source | Transcription APIs
Pricing | free | free beta
Speaker diarization | No | Yes
Word timestamps | No | Yes
Streaming | No | No
Languages | 143 | 99
Platforms | GitHub, HuggingFace | Web, API, MCP

Alternatives to ML-SUPERB

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, it accepts a URL or an uploaded file and returns a transcript.