MLCommons Speech

by MLCommons

The MLCommons Speech working group is the group behind the People's Speech dataset and the MLPerf speech benchmarks.

TL;DR

Best for industry-standard ASR training benchmarks and permissively licensed data releases. Pricing: free.

Category: Open source
Pricing: free
Platforms: Web

What it is

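MLCommons, the consortium that grew out of the MLPerf benchmarking effort, runs the speech working group that published People's Speech (about 30,000 hours of English speech under CC-BY and CC-BY-SA licenses) and the MLPerf ASR benchmarks.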

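Best for: industry-standard ASR training benchmarks and permissively licensed data releases.
Watch out for: mixed licensing (Apache-2.0 and CC-BY-SA; the share-alike terms of CC-BY-SA carry over to derivative works), and governance by an industry consortium.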
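
ASR benchmarks like the ones mentioned above are typically scored by word error rate (WER): the word-level edit distance between a reference transcript and the system output, divided by the number of reference words. This page does not describe how MLPerf scores submissions, so the following is only a minimal sketch of the metric, assuming whitespace tokenization and no text normalization. The function name `word_error_rate` is hypothetical and this is not the MLPerf reference implementation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Assumption: words are separated by whitespace and compared exactly
    (no lowercasing or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` counts one deleted word out of six reference words, giving a WER of 1/6. Real evaluations usually normalize casing and punctuation first, so scores from this sketch are not comparable to published benchmark numbers.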

Install / use

https://mlcommons.org/datasets/peoples-speech/

Features

Speaker diarization: No
Word-level timestamps: No
Streaming / real-time: No
Languages supported: 1
HIPAA eligible: No

MLCommons Speech vs Whipscribe

Feature | MLCommons Speech | Whipscribe
Category | Open source | Transcription APIs
Pricing | free | free beta
Speaker diarization | No | Yes
Word timestamps | No | Yes
Streaming | No | No
Languages | 1 | 99
Platforms | Web | Web, API, MCP

Alternatives to MLCommons Speech

Whipscribe is a managed faster-whisper + whisperX service. It is an option if you want transcripts without running your own infrastructure: you submit a URL or an audio file and get a transcript back in seconds.