MUSAN

by JHU CLSP

109h corpus of music + speech + noise — augmentation backbone for ASR/SV.

TL;DR

Best for data augmentation (additive noise, music) for ASR + speaker verification training. Pricing: free.

Category
Open source
License
CC BY 4.0
Pricing
free
Platforms
OpenSLR

What it is

MUSAN (OpenSLR-17) is a widely used augmentation corpus for speech systems: about 109 hours of separate speech, music, and noise recordings, released under CC BY 4.0.
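One common use of the speech portion, for example in Kaldi's x-vector recipes, is synthetic babble: several speech clips are overlaid into one background track. The sketch below assumes the clips are already loaded as mono float arrays at a shared sample rate. `make_babble` is a hypothetical helper, not part of any MUSAN tooling, and normalizing each talker to unit RMS is an illustrative choice.

```python
import numpy as np


def make_babble(speech_clips, length, n_speakers, rng):
    """Overlay `n_speakers` randomly chosen speech clips into one babble track.

    Hypothetical helper: each clip is a 1-D float array at a shared sample
    rate. Chosen clips are looped or trimmed to `length` samples and scaled
    to unit RMS before summing, so no single talker dominates the mix.
    """
    if n_speakers > len(speech_clips):
        raise ValueError("not enough clips for the requested speaker count")
    # Pick distinct clips so the same talker is not stacked on itself.
    chosen = rng.choice(len(speech_clips), size=n_speakers, replace=False)
    babble = np.zeros(length)
    for i in chosen:
        clip = np.asarray(speech_clips[i], dtype=float)
        # Loop short clips, then cut to the target length.
        clip = np.tile(clip, int(np.ceil(length / len(clip))))[:length]
        rms = np.sqrt(np.mean(clip ** 2))
        if rms > 0:
            babble += clip / rms
    return babble
```

The resulting track can then be added to a target utterance at a chosen SNR, like any other noise source.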

Best for: Data augmentation (additive noise, music) for ASR + speaker verification training.
Watch out for: CC BY 4.0 requires attribution, so cite Snyder et al., 2015. The split is uneven (about 60h speech, 42h music, 6h noise). Source material was curated to be permissively licensed.
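The core augmentation step is adding a MUSAN noise or music clip to clean speech at a target signal-to-noise ratio. A minimal sketch, assuming both signals are mono float arrays at the same sample rate; `mix_at_snr` is a hypothetical name. The noise is scaled by sqrt(P_speech / (P_noise · 10^(SNR/10))) so the mixture hits the requested SNR in dB.

```python
import numpy as np


def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add `noise` to `clean` so the speech-to-noise ratio equals `snr_db`.

    Assumes mono 1-D float arrays at the same sample rate. The noise is
    looped (tiled) if shorter than `clean`, then trimmed to its length.
    """
    if clean.ndim != 1 or noise.ndim != 1:
        raise ValueError("expected mono 1-D signals")
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]

    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    if noise_power == 0:
        # Silent noise clip: nothing to add.
        return clean.copy()
    # Choose the scale so 10*log10(clean_power / scaled_noise_power) == snr_db.
    scale = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + scale * noise
```

In a training pipeline the SNR is typically drawn per utterance from a range rather than fixed; the range to use is a tuning decision, not something the corpus prescribes.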

Install / use

https://www.openslr.org/17/
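After downloading and extracting the archive, it helps to index the files by category before sampling augmentation sources. A sketch, assuming the archive unpacks to a `musan/` directory with `music/`, `noise/`, and `speech/` subfolders that hold `.wav` files in nested source folders; `index_musan` and `sample_file` are hypothetical helper names.

```python
from __future__ import annotations

import random
from pathlib import Path

# Top-level MUSAN categories (assumed directory names).
CATEGORIES = ("music", "noise", "speech")


def index_musan(root: str | Path) -> dict[str, list[Path]]:
    """Map each category to a sorted list of the .wav files under it.

    Files may sit in nested source folders, so the search is recursive.
    """
    root = Path(root)
    index = {}
    for category in CATEGORIES:
        category_dir = root / category
        # A missing category folder yields an empty list instead of an error.
        if category_dir.is_dir():
            index[category] = sorted(category_dir.rglob("*.wav"))
        else:
            index[category] = []
    return index


def sample_file(
    index: dict[str, list[Path]], category: str, rng: random.Random | None = None
) -> Path:
    """Pick one indexed file from `category` uniformly at random."""
    files = index.get(category, [])
    if not files:
        raise ValueError(f"no .wav files indexed for category {category!r}")
    rng = rng if rng is not None else random.Random()
    return rng.choice(files)
```

Passing a seeded `random.Random` to `sample_file` makes the choice of augmentation sources reproducible across runs.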

Features

Speaker diarization: No
Word-level timestamps: No
Streaming / real-time: No
Languages supported: n/a (audio corpus, not a transcription tool)
HIPAA eligible: No

MUSAN vs Whipscribe

Feature | MUSAN | Whipscribe
Category | Open source | Transcription APIs
Pricing | free | free beta
Speaker diarization | No | Yes
Word timestamps | No | Yes
Streaming | No | No
Languages | n/a | 99
Platforms | OpenSLR | Web, API, MCP

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, you can submit a URL or upload a file and have a transcript in seconds.