YODAS2
by CMU / WAVLab
Refresh of YODAS with long-form audio and per-language sharding, totaling 422k hours.
TL;DR
Best for long-form ASR training (full videos are preserved) and cross-lingual alignment. Pricing: research-only.
Category
Open source
Pricing
research-only
Platforms
HuggingFace
What it is
YODAS2 restructures YODAS to provide full-video audio with segment-level metadata (versus 30-second clips in v1), which enables long-form ASR training. It covers 149 languages and 422k hours of audio.
Best for: Long-form ASR training (full videos preserved); cross-lingual alignment.
Watch out for: Same constraints as YODAS v1: YouTube-sourced content with CC BY 3.0 subtitles. Cite: Li et al., 2024.
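Because each record keeps the full video together with segment-level metadata, consecutive segments can be regrouped into longer training windows. The following is a minimal sketch under stated assumptions: the `start`, `end`, and `text` keys and the `pack_segments` helper are hypothetical names chosen for illustration, not a confirmed YODAS2 schema, and the 30-second default is only an example value.

```python
from typing import Dict, List


def _merge(group: List[Dict]) -> Dict:
    """Collapse a run of consecutive segments into a single window."""
    return {
        "start": group[0]["start"],
        "end": group[-1]["end"],
        "text": " ".join(seg["text"] for seg in group),
    }


def pack_segments(segments: List[Dict], max_window: float = 30.0) -> List[Dict]:
    """Greedily merge time-ordered segments into windows of at most
    `max_window` seconds. A single segment longer than the limit is
    kept as its own window rather than being split.

    Assumes each segment is a dict with hypothetical keys
    'start' (seconds), 'end' (seconds), and 'text'.
    """
    windows: List[Dict] = []
    current: List[Dict] = []
    for seg in segments:
        # Close the current window if adding this segment would exceed the limit.
        if current and seg["end"] - current[0]["start"] > max_window:
            windows.append(_merge(current))
            current = []
        current.append(seg)
    if current:
        windows.append(_merge(current))
    return windows
```

With v1-style fixed clips this regrouping would not be possible, since the surrounding context of each clip is lost. The window length is a free parameter and can be raised for models that accept longer inputs.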
Install / use
from datasets import load_dataset
ds = load_dataset('espnet/yodas2', 'en000')  # 'en000' names one per-language shard
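At 422k hours, downloading a whole shard before inspecting it may be impractical, so streaming can be a useful alternative. The sketch below makes several assumptions: config names follow the pattern seen in `'en000'` (a language code plus a three-digit shard index), the split is called `train`, and the helpers `shard_config` and `stream_shard` are hypothetical names introduced here. Depending on the installed `datasets` version, the call may need extra arguments that are not shown.

```python
from typing import Iterator


def shard_config(lang: str, index: int) -> str:
    """Build a config name such as 'en000'.

    Assumption: configs are '<language code><3-digit shard index>',
    inferred only from the 'en000' example.
    """
    if not 0 <= index <= 999:
        raise ValueError("shard index must be between 0 and 999")
    return f"{lang}{index:03d}"


def stream_shard(lang: str, index: int = 0) -> Iterator[dict]:
    """Lazily iterate over one shard without downloading it in full."""
    # Imported here so shard_config works even without `datasets` installed.
    from datasets import load_dataset

    ds = load_dataset(
        "espnet/yodas2",
        shard_config(lang, index),
        split="train",   # assumed split name
        streaming=True,  # yields records on demand
    )
    return iter(ds)
```

A typical use would be `first = next(stream_shard("en", 0))` followed by inspecting `first.keys()` to see which fields the records actually carry.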
Features
| Feature | YODAS2 |
|---|---|
| Speaker diarization | No |
| Word-level timestamps | Yes |
| Streaming / real-time | No |
| Languages supported | 149 |
| HIPAA eligible | No |
YODAS2 vs Whipscribe
| Feature | YODAS2 | Whipscribe |
|---|---|---|
| Category | Open source | Transcription APIs |
| Pricing | research-only | free beta |
| Speaker diarization | No | Yes |
| Word timestamps | Yes | Yes |
| Streaming | No | No |
| Languages | 149 | 99 |
| Platforms | HuggingFace | Web, API, MCP |
Alternatives to YODAS2
Whipscribe is a managed faster-whisper and whisperX service. If you want transcripts without running infrastructure, you can paste a URL or upload a file and get a transcript in seconds.