YODAS2

by CMU / WAVLab

Refresh of YODAS with long-form audio + per-language sharding — 422k hours.

TL;DR

Best for long-form ASR training (full videos are preserved) and cross-lingual alignment. Pricing: research-only.

What it is

YODAS2 restructures YODAS to ship full-video audio with segment-level metadata, where v1 shipped 30-second clips. This makes long-form ASR training possible. It covers 149 languages and 422k hours.

Best for: Long-form ASR training (full videos preserved); cross-lingual alignment.
Watch out for: Same YouTube + CC BY 3.0 subtitle constraint as YODAS v1. Cite: Li et al., 2024.

Install / use

from datasets import load_dataset

# 'en000' is the config name of one per-language shard
ds = load_dataset('espnet/yodas2', 'en000')
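Because YODAS2 keeps full videos with segment-level metadata, a common first step is cutting a long recording back into utterance-sized clips. The sketch below shows one way to do that on plain Python data. The field names (`text`, `start`, `end`), the assumption that timestamps are in seconds, and the helper `slice_utterances` are all hypothetical; check the dataset card for the actual schema before relying on them. It runs on a synthetic recording so it works offline.

```python
from typing import Iterator, Sequence, Tuple, TypedDict


class Utterance(TypedDict):
    # Hypothetical field names; the real YODAS2 schema may differ.
    text: str
    start: float  # assumed: seconds from the start of the video
    end: float    # assumed: seconds from the start of the video


def slice_utterances(
    samples: Sequence[float],
    sampling_rate: int,
    utterances: Sequence[Utterance],
) -> Iterator[Tuple[str, Sequence[float]]]:
    """Yield (text, audio_samples) for each utterance in a long-form recording."""
    total = len(samples)
    for utt in utterances:
        # Convert timestamps to sample indices and clamp them to the recording.
        lo = max(0, int(round(utt["start"] * sampling_rate)))
        hi = min(total, int(round(utt["end"] * sampling_rate)))
        if hi > lo:  # skip empty or inverted spans
            yield utt["text"], samples[lo:hi]


# Synthetic 3-second "recording" at 10 Hz so the example needs no download.
demo_audio = [float(i) for i in range(30)]
demo_utts = [
    Utterance(text="hello", start=0.0, end=1.0),
    Utterance(text="world", start=1.5, end=3.5),  # end runs past the audio
]
clips = list(slice_utterances(demo_audio, 10, demo_utts))
```

With a real record, `samples` and `sampling_rate` would come from the decoded audio and `utterances` from the per-video segment list, whatever those fields turn out to be called. Clamping the indices keeps a subtitle timestamp that overruns the audio from raising or yielding an empty clip.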

Features

Speaker diarization	No
Word-level timestamps	Yes
Streaming / real-time	No
Languages supported	149
HIPAA eligible	No

YODAS2 vs Whipscribe

Feature	YODAS2	Whipscribe
Category	Open source	Transcription APIs
Pricing	research-only	free beta
Speaker diarization	No	Yes
Word timestamps	Yes	Yes
Streaming	No	No
Languages	149	99
Platforms	HuggingFace	Web, API, MCP

Alternatives to YODAS2

OpenAI Whisper

OpenAI

The reference open-source multilingual ASR model from OpenAI.

OSS · MIT ★ 98.1k

whisper.cpp

Georgi Gerganov

C/C++ port of Whisper — runs on anything, from a Raspberry Pi to Apple Silicon.

OSS · MIT ★ 48.8k

faster-whisper

SYSTRAN

4× faster than reference Whisper using CTranslate2 — production sweet spot.

OSS · MIT ★ 22.3k

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or drop a file and you'll have a transcript in seconds.