CoVoST 2

by Meta AI

Speech-translation corpus built on Common Voice, covering 21 X→en and 15 en→X language pairs.

TL;DR

Best for many-to-one and one-to-many speech translation benchmarks at scale (2880h total). Pricing: free.

Category: Open source
License: CC0 1.0
Pricing: free
Platforms: HuggingFace, GitHub

What it is

CoVoST 2 extends Common Voice with translation targets: 21 source languages → English plus English → 15 target languages, about 2880 hours of speech in total. It is a widely used benchmark for multilingual and low-resource speech translation. License: CC0.

Best for: Many-to-one and one-to-many speech translation benchmarks at scale (2880h total).
Watch out for: CC0 1.0 applies because the audio comes from Common Voice · translation quality varies per pair · low-resource pairs have under 100h of speech. Cite: Wang et al., 2020 (arXiv:2007.10310).
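
The 21 + 15 split described above can be enumerated as a short sketch. The language codes below are written from memory of the CoVoST 2 paper and the `<src>_<tgt>` naming seen in the `en_de` config, so treat both lists as assumptions and check them against the dataset card before relying on them. `config_names` is a hypothetical helper, not part of any library.

```python
# Assumed language codes (verify against the dataset card before use).
X_TO_EN = ["fr", "de", "es", "ca", "it", "ru", "zh-CN", "pt", "fa", "et", "mn",
           "nl", "tr", "ar", "sv-SE", "lv", "sl", "ta", "ja", "id", "cy"]
EN_TO_X = ["de", "ca", "zh-CN", "fa", "et", "mn", "tr", "ar", "sv-SE", "lv",
           "sl", "ta", "ja", "id", "cy"]


def config_names():
    """Return one '<src>_<tgt>' name per translation direction."""
    into_english = [f"{src}_en" for src in X_TO_EN]
    from_english = [f"en_{tgt}" for tgt in EN_TO_X]
    return into_english + from_english


def all_languages():
    """Return the set of distinct languages, counting English once."""
    return set(X_TO_EN) | set(EN_TO_X) | {"en"}


if __name__ == "__main__":
    print(len(config_names()), "directions,", len(all_languages()), "languages")
```

Under these assumptions every en→X target also appears as an X→en source, which is why the corpus counts 22 languages rather than 37.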

Install / use

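from datasets import load_dataset
ds = load_dataset('facebook/covost2', 'en_de', data_dir='path/to/common_voice/en')  # data_dir: manually downloaded Common Voice audio for the source language (placeholder path)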
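
Once a pair is loaded, each record can be reduced to a (source transcript, translation) tuple. The sketch below assumes the field names `sentence`, `translation`, and `audio` from the HuggingFace dataset card; `to_pair` and `load_pairs` are hypothetical helpers, and the `data_dir` argument is a local path you supply. Depending on your `datasets` version, script-based datasets such as this one may need extra flags or may not load at all, so this is an outline rather than a guaranteed recipe.

```python
def to_pair(example):
    """Map one record to (source transcript, target translation)."""
    return example["sentence"].strip(), example["translation"].strip()


def load_pairs(config, data_dir, split="validation"):
    """Load one language pair and return its text pairs (audio is dropped).

    Assumes `data_dir` holds the manually downloaded Common Voice release
    for the source language of `config`.
    """
    from datasets import load_dataset  # imported lazily so to_pair works alone

    ds = load_dataset("facebook/covost2", config, data_dir=data_dir, split=split)
    ds = ds.remove_columns(["audio"])  # skip audio decoding when only text is needed
    return [to_pair(example) for example in ds]


if __name__ == "__main__":
    # Stand-in record shaped like the assumed schema; no download required.
    fake = {"sentence": " Hello there. ", "translation": "Hallo zusammen."}
    print(to_pair(fake))
```

Dropping the `audio` column first keeps iteration fast when you only want reference text, for example to score a translation system's output.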

Features

Speaker diarization: No
Word-level timestamps: No
Streaming / real-time: No
Languages supported: 22
HIPAA eligible: No

CoVoST 2 vs Whipscribe

Feature              CoVoST 2             Whipscribe
Category             Open source          Transcription APIs
Pricing              free                 free beta
Speaker diarization  No                   Yes
Word timestamps      No                   Yes
Streaming            No                   No
Languages            22                   99
Platforms            HuggingFace, GitHub  Web, API, MCP

Alternatives to CoVoST 2

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, you can submit a URL or upload a file and get a transcript back in seconds.