
YODAS

by CMU / WAVLab

~500k hours of YouTube speech across 100+ languages with CC-licensed subtitles.

TL;DR

Best for massive multilingual self-supervised pretraining; ASR pretraining for low-resource languages. Pricing: research-only.
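Because per-language hours in a corpus like this are highly skewed, multilingual pretraining recipes often rebalance languages with temperature (exponent) sampling, where each language l is drawn with probability proportional to (n_l / N)^α. The sketch below is a generic illustration of that technique under stated assumptions, not something YODAS itself provides; `language_sampling_probs`, `sample_languages`, and the hour counts in the example are hypothetical.

```python
import random
from typing import Dict, List


def language_sampling_probs(hours_per_lang: Dict[str, float], alpha: float = 0.5) -> Dict[str, float]:
    """Return p_l proportional to (n_l / N) ** alpha (hypothetical helper).

    alpha = 1.0 keeps the raw data distribution; alpha < 1.0 flattens it,
    so low-resource languages are sampled more often than their share of hours.
    """
    total = sum(hours_per_lang.values())
    if total <= 0:
        raise ValueError("need a positive total number of hours")
    weights = {lang: (h / total) ** alpha for lang, h in hours_per_lang.items()}
    norm = sum(weights.values())
    return {lang: w / norm for lang, w in weights.items()}


def sample_languages(probs: Dict[str, float], k: int, seed: int = 0) -> List[str]:
    """Draw k language codes, e.g. one per training batch (hypothetical helper)."""
    rng = random.Random(seed)
    langs = list(probs)
    return rng.choices(langs, weights=[probs[lang] for lang in langs], k=k)


# Hypothetical hour counts: with alpha = 0.5, "sw" rises from 10% of the
# raw data to 25% of sampled batches.
probs = language_sampling_probs({"en": 900.0, "sw": 100.0}, alpha=0.5)
batches = sample_languages(probs, k=8)
```

α is a tuning knob: values below 1 trade some high-resource data for better coverage of low-resource languages, which is the use case listed above.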

What it is

YODAS (YouTube-Oriented Dataset for Audio and Speech) provides ~500k hours of speech across 140+ languages with CC-licensed subtitles, harvested from YouTube. It is billed as the largest open multilingual speech corpus to date.

Best for: Massive multilingual self-supervised pretraining; ASR pretraining for low-resource languages.
Watch out for: subtitles are CC BY 3.0, but the underlying YouTube videos are individually licensed · video access is subject to YouTube's Terms of Service · per-language quality varies widely. Cite: Li et al., ASRU 2023.
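Since subtitle quality varies so much by language, it is usually worth screening utterances before training. One simple heuristic (a generic technique, not YODAS's own filtering pipeline) rejects segments whose characters-per-second rate is implausible, which catches empty, truncated, or badly aligned captions. The `Utterance` type, the `plausible_alignment` helper, and the 2 to 30 characters-per-second bounds are all hypothetical and would need tuning per language and script (CJK text, for example, packs far fewer characters into each second of speech).

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """Hypothetical record: one subtitle segment and its audio length."""
    text: str
    duration_s: float


def plausible_alignment(utt: Utterance, min_cps: float = 2.0, max_cps: float = 30.0) -> bool:
    """Keep an utterance only if its characters-per-second rate is in range."""
    if utt.duration_s <= 0:
        return False  # zero or negative duration: broken segment
    chars = len(utt.text.strip())
    if chars == 0:
        return False  # empty caption: nothing to train on
    cps = chars / utt.duration_s
    return min_cps <= cps <= max_cps


# Example: keep only segments that pass the check.
segments = [Utterance("hello world this is a test", 2.0), Utterance("hi", 10.0)]
kept = [u for u in segments if plausible_alignment(u)]
```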

Install / use

from datasets import load_dataset
ds = load_dataset('espnet/yodas', 'en000')  # 'en000' is one of the English subsets
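Once a subset is loaded, a quick sanity check is to total the audio you actually received. This sketch assumes each record carries an `audio` dict with `array` and `sampling_rate` keys, the decoded form of the `datasets` Audio feature in older releases (newer releases may return a decoder object instead); `total_hours` is a hypothetical helper.

```python
from itertools import islice
from typing import Any, Iterable, Mapping, Optional


def total_hours(records: Iterable[Mapping[str, Any]], limit: Optional[int] = None) -> float:
    """Sum audio duration in hours over the first `limit` records (all if None)."""
    seconds = 0.0
    for rec in islice(records, limit):
        audio = rec["audio"]
        # duration in seconds = number of samples / samples per second
        seconds += len(audio["array"]) / audio["sampling_rate"]
    return seconds / 3600.0
```

With `ds` from the call above, `total_hours(ds["train"], limit=100)` would cover the first 100 utterances, assuming the subset exposes a `train` split.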

Features

Speaker diarization	No
Word-level timestamps	Yes
Streaming / real-time	No
Languages supported	140
HIPAA eligible	No

YODAS vs Whipscribe

Feature	YODAS	Whipscribe
Category	Open source	Transcription APIs
Pricing	research-only	free beta
Speaker diarization	No	Yes
Word timestamps	Yes	Yes
Streaming	No	No
Languages	140	99
Platforms	HuggingFace	Web, API, MCP

Alternatives to YODAS

OpenAI Whisper

OpenAI

The reference open-source multilingual ASR model from OpenAI.

OSS · MIT ★ 98.1k

whisper.cpp

Georgi Gerganov

C/C++ port of Whisper that runs on hardware ranging from a Raspberry Pi to Apple Silicon.

OSS · MIT ★ 48.8k

faster-whisper

SYSTRAN

Up to 4× faster than reference Whisper at the same accuracy, using CTranslate2; a practical choice for production.

OSS · MIT ★ 22.3k

Whipscribe is a managed faster-whisper + whisperX service: upload a file or paste a URL to get a transcript without running your own infrastructure.