YODAS2

by CMU / WAVLab

Refresh of YODAS with long-form audio and per-language sharding: 422k hours.

TL;DR

Best for long-form ASR training (full videos preserved); cross-lingual alignment. Pricing: research-only.

Category: Open source
Pricing: research-only
Platforms: HuggingFace

What it is

YODAS2 restructures YODAS to keep full-video audio with segment-level metadata (versus 30-second clips in v1), which enables long-form ASR training. It covers 149 languages and 422k hours.

Best for: Long-form ASR training (full videos preserved); cross-lingual alignment.
Watch out for: Same YouTube + CC BY 3.0 subtitle constraint as YODAS v1. Cite: Li et al., 2024.
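
Because full videos are kept alongside segment-level metadata, short training clips can be recovered from a long recording when needed. The following is a minimal sketch of that idea, not the dataset's own tooling. It assumes each record provides a waveform, a sampling rate, and parallel lists of segment start and end times in seconds. The function name `cut_segments` and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def cut_segments(
    samples: Sequence[float],
    sampling_rate: int,
    starts: Sequence[float],
    ends: Sequence[float],
) -> List[Tuple[float, float, Sequence[float]]]:
    """Slice one long-form waveform into per-segment clips.

    `starts` and `ends` are segment boundaries in seconds (assumed layout).
    """
    clips = []
    for start, end in zip(starts, ends):
        if end <= start:
            continue  # skip empty or malformed segments
        # Convert seconds to sample indices and clamp to the waveform bounds.
        lo = max(0, int(round(start * sampling_rate)))
        hi = min(len(samples), int(round(end * sampling_rate)))
        clips.append((start, end, samples[lo:hi]))
    return clips


# Toy stand-in for one video: 3 seconds of silence at 10 samples per second.
audio = [0.0] * 30
clips = cut_segments(audio, 10, starts=[0.0, 1.5], ends=[1.0, 3.0])
print([len(c[2]) for c in clips])  # prints [10, 15]
```

For long-form training the waveform would be used whole; slicing like this is only useful when a model or evaluation needs utterance-sized inputs.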

Install / use

from datasets import load_dataset
ds = load_dataset('espnet/yodas2', 'en000')  # 'en000' selects one English shard
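
With a corpus of this size, it may be preferable to stream a shard rather than download it in full. The sketch below uses the `streaming=True` option of `load_dataset`. The helper names `shard_name` and `stream_shard` are hypothetical, and the naming pattern (language code followed by a three-digit index) is an assumption generalized from the single `en000` example above.

```python
def shard_name(lang: str, index: int) -> str:
    """Build a config name such as 'en000'.

    Assumed pattern: language code + zero-padded 3-digit shard index.
    """
    return f"{lang}{index:03d}"


def stream_shard(lang: str = "en", index: int = 0):
    """Open one shard lazily instead of downloading it in full."""
    # Imported here so shard_name() stays usable without the library installed.
    from datasets import load_dataset

    return load_dataset("espnet/yodas2", shard_name(lang, index), streaming=True)


print(shard_name("en", 0))  # prints en000
```

Calling `stream_shard()` needs network access and returns a lazily iterated dataset object, so records are read by looping over a split rather than by indexing.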

Features

Speaker diarization: No
Word-level timestamps: Yes
Streaming / real-time: No
Languages supported: 149
HIPAA eligible: No

YODAS2 vs Whipscribe

Feature | YODAS2 | Whipscribe
Category | Open source | Transcription APIs
Pricing | research-only | free beta
Speaker diarization | No | Yes
Word timestamps | Yes | Yes
Streaming | No | No
Languages | 149 | 99
Platforms | HuggingFace | Web, API, MCP

Alternatives to YODAS2

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or upload a file and you'll have a transcript in seconds.