Open source transcription tools

Self-hostable transcription engines and desktop apps you can run yourself, with source you can read and modify.

357 tools · updated 2026-05-15
OpenAI Whisper
OpenAI

The reference open-source multilingual ASR model from OpenAI.

OSS · MIT ★ 98.1k
whisper.cpp
Georgi Gerganov

C/C++ port of Whisper — runs on anything, from a Raspberry Pi to Apple Silicon.

OSS · MIT ★ 48.8k
faster-whisper
SYSTRAN

Up to 4× faster than reference Whisper via CTranslate2; a common production choice.

OSS · MIT ★ 22.3k
whisperX
Max Bain

Faster-whisper + forced alignment + speaker diarization in one pipeline.

OSS · BSD‑2‑Clause ★ 21.4k
insanely-fast-whisper
Vaibhav Srivastav

CLI that transcribes 150 minutes of audio in ~98 seconds on an A100.

OSS · Apache‑2.0 ★ 12.4k
stable-ts
jianfch

Whisper with stabilised timestamps — more accurate word-level timing.

OSS · MIT ★ 2.2k
WhisperKit
Argmax

Swift Whisper for Apple Silicon — CoreML, ANE, Metal. Now part of the Argmax Open-Source SDK (v1.0.0, May 2026) alongside SpeakerKit + TTSKit.

OSS · MIT ★ 6.0k
distil-whisper
Hugging Face

Distilled Whisper: 6× faster, 49% smaller, within 1% WER of the teacher.

OSS · MIT ★ 4.1k
SeamlessM4T
Meta AI

Meta's speech-to-text + speech-to-speech + text-to-speech model, 100 languages.

OSS · license unspecified ★ 11.8k
Vosk
Alpha Cephei

Lightweight offline speech recognition for 20+ languages, runs on a Raspberry Pi.

OSS · Apache‑2.0 ★ 14.6k
Buzz
Chidi Williams

Cross-platform desktop app for Whisper — open-source MacWhisper alternative.

OSS · MIT ★ 18.8k
Tortoise TTS
neonbjb

Open-source TTS model with strong prosody — slow on CPU.

OSS · free
Coqui TTS
Coqui

Open-source TTS toolkit with multi-language voice models.

OSS · free
Whisper JAX
Sanchit Gandhi (HuggingFace)

70x faster Whisper on TPUs via JAX + Flax + batching.

OSS · free
MLX Whisper
Apple ML Research

Whisper inference on Apple Silicon via Apple's MLX framework.

OSS · free
whisper-rs
tazz4843

Idiomatic Rust bindings for whisper.cpp.

OSS · free
Const-me Whisper
Const-me

Whisper running on Windows via DirectCompute / GPGPU.

OSS · free
Whisper Standalone (Purfview)
Purfview

Single-EXE Whisper for Windows + Linux, no dependencies.

OSS · free
whisper-ctranslate2
Softcatalà

Command-line Whisper using CTranslate2 — closest match to openai/whisper CLI.

OSS · free
pywhispercpp
abdeladim-s

Python bindings for whisper.cpp with a simple iterator API.

OSS · free
Whisper-WebUI
jhj0517

Gradio web UI bundling faster-whisper + diarization + translation.

OSS · free
WhisperLive
Collabora

Real-time Whisper transcription over WebSockets.

OSS · free
WhisperFusion
Collabora

Ultra-low-latency speech→LLM pipeline: WhisperLive + Mistral + TensorRT.

OSS · free
WhisperBot
Collabora

WhisperFusion's voice-chat reference app.

OSS · free
whisper_streaming
ÚFAL (Charles University)

Academic real-time Whisper streaming with LocalAgreement-2.

OSS · free
whisper-timestamped
LINTO / Linagora

Word-level timestamps for OpenAI Whisper without retraining.

OSS · free
whisper-diarization
Mahmoud Ashraf

Whisper + NeMo MSDD diarization pipeline.

OSS · free
faster-whisper-server
fedirz

OpenAI-compatible /v1/audio/transcriptions endpoint over faster-whisper.

OSS · free
whisper-asr-webservice
Ahmet Öner

Dockerized Whisper REST API with multiple backends.

OSS · free
WhisperS2T
shashikg

Optimized batched Whisper engine with VAD + dynamic batching.

OSS · free
whisper-playground
Sahar Mor

Mic-in-browser → real-time Whisper transcription demo.

OSS · free
LiveWhisper
Nikorasu

Always-listening hot-mic Whisper transcriber.

OSS · free
generate-subtitles
mayeaux

Single-page web UI to generate subtitles via Whisper.

OSS · free
WhisperSpeech
WhisperSpeech / Collabora

Whisper inverted into a TTS model; also used as an ASR-aware training-data tool.

OSS · free
Echogarden
Echogarden Project

Easy-to-use speech toolkit: TTS, STT, alignment, language detection.

OSS · free
Voice-Pro
abus-aikorea

One-click Whisper + diarization + voice cloning Gradio app.

OSS · free
CrisperWhisper
Nyra Health

Whisper variant tuned for verbatim transcription (fillers, stutters, false starts) with precise word-level timestamps.

OSS · free
NVIDIA NeMo
NVIDIA

Toolkit + model zoo behind Canary, Parakeet, Conformer, FastConformer.

OSS · free
Seamless (SeamlessM4T family)
Meta AI

Meta's multilingual speech-translation + transcription foundation suite.

OSS · free
Fairseq
Meta AI

Meta's seq-to-seq toolkit — home of wav2vec, HuBERT, XLS-R, MMS.

OSS · free
HuggingFace Transformers (Audio)
Hugging Face

One API for Whisper, Wav2Vec2, HuBERT, XLS-R, SeamlessM4T, Parakeet.

OSS · free
HuggingFace Optimum
Hugging Face

ONNX + TensorRT + OpenVINO acceleration for Transformers ASR models.

OSS · free
HuggingFace Accelerate
Hugging Face

Multi-GPU / mixed-precision launcher for any PyTorch ASR training script.

OSS · free
HuggingFace Datasets
Hugging Face

Streaming loader for Common Voice, LibriSpeech, GigaSpeech, FLEURS.

OSS · free
HuggingFace PEFT
Hugging Face

LoRA / adapters for parameter-efficient Whisper fine-tuning.

OSS · free
SpeechBrain
SpeechBrain consortium

PyTorch toolkit for ASR, speaker, diarization, enhancement.

OSS · free
ESPnet
ESPnet community

End-to-end speech toolkit: ASR, TTS, ST, speaker, separation.

OSS · free
Kaldi
Kaldi community

The classic C++ HMM/DNN speech recognition toolkit.

OSS · free
k2
k2-fsa

Differentiable FSA/FST library written in C++/CUDA and integrated with PyTorch.

OSS · free
icefall
k2-fsa

ASR recipes (Conformer / Zipformer / Pruned Transducer) for k2 + sherpa.

OSS · free
Sherpa
k2-fsa

Production server for k2/icefall + Whisper models (PyTorch).

OSS · free
sherpa-onnx
k2-fsa

ONNX-runtime ASR: Whisper, Zipformer, Paraformer on every platform.

OSS · free
sherpa-ncnn
k2-fsa

ASR on NCNN — Android-friendly, CPU-only, no FP support needed.

OSS · free
Coqui STT
Coqui

Successor to Mozilla DeepSpeech from Coqui; no longer actively developed.

OSS · free
Mozilla DeepSpeech
Mozilla

Mozilla's original open RNN + CTC engine, modelled on Baidu's Deep Speech; archived but historic.

OSS · free
WeNet
wenet-e2e

Production-first E2E ASR — U2++ Conformer, streaming + offline.

OSS · free
Athena STT
ATHENA team

End-to-end speech recognition toolkit by ATHENA-OPEN-SOURCE.

OSS · free
PocketSphinx
CMU Sphinx

Lightweight CMU Sphinx engine for embedded keyword spotting.

OSS · free
CMU Sphinx-4
CMU Sphinx

The classic Java speech engine from CMU.

OSS · free
PaddleSpeech
Baidu

Baidu's all-in-one speech toolkit on PaddlePaddle.

OSS · free
PaddlePaddle DeepSpeech
Baidu

The DeepSpeech-style recipes inside PaddleSpeech.

OSS · free
Julius
Julius project

Lightweight Japanese-focused open ASR: a two-pass LVCSR decoder built on N-gram language models and HMM acoustic models.

OSS · free
Montreal Forced Aligner
MontrealCorpusTools

Word-level alignment via Kaldi for 100+ languages.

OSS · free
OpenSeq2Seq
NVIDIA

NVIDIA's TF1 framework — historical home of Jasper + QuartzNet.

OSS · free
RETURNN
RWTH Aachen

RWTH's flexible neural network training framework for ASR research.

OSS · free
FunASR
Alibaba DAMO Academy

Alibaba DAMO's Paraformer / SenseVoice / Whisper toolkit.

OSS · free
Reverb (Rev.com OSS)
Rev.com

Rev's open WFST-decoded ASR + diarization stack.

OSS · free
fstalign
Rev.com

Rev.com's WER + alignment scoring tool over WFSTs.

OSS · free
PaddlePaddle Parakeet (TTS)
Baidu

Baidu's TTS half — included for end-to-end voice pipelines.

OSS · free
pyannote.audio
pyannote / Hervé Bredin

The reference open diarization + speaker embedding toolkit.

OSS · free
pyannote-audio (legacy)
hbredin

Hervé Bredin's personal mirror of pyannote.audio.

OSS · free
Silero VAD
snakers4 / Silero

Tiny, accurate voice-activity-detection model — runs on CPU.

OSS · free
py-webrtcvad
wiseman

Python bindings for Google's WebRTC VAD.

OSS · free
diart
Juan Manuel Coria

Streaming speaker diarization on top of pyannote.

OSS · free
simple_diarizer
cvqluu

A minimal pyannote / SpeechBrain diarization wrapper.

OSS · free
Resemblyzer
Resemble AI

Speaker-verification embeddings from a small generalist encoder.

OSS · free
Moonshine
Useful Sensors

Tiny English ASR optimized for resource-constrained devices.

OSS · free
Moonshine (project mirror)
Moonshine AI

Mirror of Useful Sensors' Moonshine releases.

OSS · free
Transformers.js
Hugging Face

Run Whisper / wav2vec2 entirely in the browser via ONNX Runtime Web.

OSS · free
Transformers.js (xenova mirror)
Joshua Lochner

Original transformers.js repo by Joshua Lochner (pre-merge into HF).

OSS · free
Apple MLX
Apple

Apple's array framework — runs Whisper, Phi, Llama on Apple Silicon.

OSS · free
MLX Swift
Apple

Swift bindings for MLX — embed Whisper in iOS/macOS apps.

OSS · free
MLX Swift Examples
Apple

Reference Swift apps for MLX, including Whisper.

OSS · free
MLX Examples
Apple

Python MLX examples — Whisper, Llama, Stable Diffusion.

OSS · free
MLX Data
Apple

Audio + image data loaders for MLX training.

OSS · free
HuggingFace Candle
Hugging Face

Minimalist Rust ML framework with Whisper support.

OSS · free
ggml
ggml.ai

The tensor library underneath llama.cpp + whisper.cpp.

OSS · free
whisper.cpp (ggml-org)
ggml.ai

The new ggml-org home of whisper.cpp.

OSS · free
llama.cpp
ggml.ai

GGUF runtime — runs many ASR forks (whisper, parakeet, qwen-audio).

OSS · free
ONNX Runtime
Microsoft

Microsoft's cross-platform inference runtime for ONNX-exported Whisper.

OSS · free
ONNX
Linux Foundation

Open exchange format used by every ASR optimizer.

OSS · free
OpenVINO
Intel

Intel's CPU/iGPU/NPU inference toolkit — Whisper-tuned.

OSS · free
OpenVINO Notebooks
Intel

Reference notebooks including Whisper + SeamlessM4T export.

OSS · free
Intel Extension for PyTorch (IPEX)
Intel

BF16/AMX speedups for Whisper PyTorch inference on Intel CPUs.

OSS · free
vLLM
vLLM Project

High-throughput inference engine — supports Whisper / Llava / Qwen-Audio.

OSS · free
SGLang
SGLang Project

Structured generation runtime — supports Qwen-Audio / Phi-Multimodal.

OSS · free
NVIDIA TensorRT-LLM
NVIDIA

NVIDIA's optimized inference for Whisper, Canary, Parakeet on Triton.

OSS · free
NVIDIA FasterTransformer
NVIDIA

Legacy NVIDIA inference engine — predecessor to TensorRT-LLM.

OSS · free
HF Text Generation Inference (TGI)
Hugging Face

Production inference server — runs audio-multimodal LLMs.

OSS · free
TensorFlowASR
TensorSpeech

TensorFlow 2 end-to-end ASR — Conformer, ContextNet, DeepSpeech2.

OSS · free
MASR (Mandarin Streaming ASR)
yeyupiaoling

Streaming Conformer + DeepSpeech2 in PyTorch for Mandarin.

OSS · free
AudioClassification-Pytorch
yeyupiaoling

Companion audio-classification training repo for MASR.

OSS · free
speech_recognition (Uberi)
Uberi

Multi-backend Python speech-recognition library.

OSS · free
WeSpeaker
wenet-e2e

Production-style speaker embedding + verification toolkit.

OSS · free
WeSep
wenet-e2e

Open speech-separation toolkit aligned with WeNet ASR.

OSS · free
Flashlight
Meta AI

Meta's C++ ML library, the home of wav2letter.

OSS · free
Flashlight Sequence
Meta AI

Standalone CTC / sequence decoders from Flashlight.

OSS · free
wav2letter++
Meta AI

Meta's original fast convolutional ASR system.

OSS · free
conformer (sooftware)
sooftware

Reference PyTorch implementation of the Conformer architecture.

OSS · free
Speech-Transformer (sooftware)
sooftware

Reference Speech-Transformer in PyTorch.

OSS · free
Microsoft UniLM
Microsoft

Home of WavLM, HuBERT++, Speech-T5, BEATs, VALL-E.

OSS · free
Microsoft SpeechT5
Microsoft

Unified speech-text Transformer (ASR + TTS + VC).

OSS · free
Microsoft Recognizers-Text
Microsoft

Post-processing for ASR: numbers, dates, units in 20+ languages.

OSS · free
NVIDIA Riva Python Clients
NVIDIA

Open clients for Riva — NVIDIA's commercial ASR/TTS server.

OSS · free
openai-python
OpenAI

Reference SDK — covers the Whisper + Realtime audio endpoints.

OSS · free
LinTO Platform Stack
LINAGORA

Open conversational-AI stack with self-hosted ASR + NLP.

OSS · free
LinTO Transcription Service
LINAGORA

Production transcription microservice powering the LinTO stack.

OSS · free
fairseq2 (via seamless_communication)
Meta AI

Modular successor to fairseq used by Seamless models.

OSS · free
HuggingFace LightEval
Hugging Face

Eval harness — includes WER evaluations for ASR.

OSS · free
HuggingFace Audio Course
Hugging Face

Free open course on audio ML, including Whisper fine-tuning.

OSS · free
SpeechColab Leaderboard
SpeechColab

Open ASR leaderboard (LibriSpeech, GigaSpeech, AISHELL).

OSS · free
StreamSpeech
ICT-NLP

Simultaneous speech-to-speech translation with streaming ASR.

OSS · free
Parler-TTS
Hugging Face

Open TTS — relevant when pairing ASR with read-back TTS.

OSS · free
Quivr
QuivrHQ

OSS 'second brain' that ingests transcripts via Whisper.

OSS · free
UniAudio
yangdongchao / CUHK

Unified audio foundation model (Codec + LM) — handles ASR.

OSS · free
DeepSpeed
Microsoft

Distributed Whisper / Conformer training at scale.

OSS · free
DeepSpeed (deepspeedai mirror)
DeepSpeed AI

The deepspeedai-org home of DeepSpeed.

OSS · free
DeepSpeed-MII
Microsoft

Microsoft's inference-side companion to DeepSpeed.

OSS · free
Microsoft Olive
Microsoft

Model-optimization toolchain — Whisper ONNX/QNN/DirectML targets.

OSS · free
JAX (Google)
Google

Underlying framework for whisper-jax and TPU ASR research.

OSS · free
JAX (JAX-ML org)
JAX-ML

JAX's new home under the JAX-ML org.

OSS · free
Flax
Google

JAX neural-net library used by whisper-jax.

OSS · free
SentencePiece
Google

Subword tokenizer used by SeamlessM4T, Canary, and many ASR toolkits (Whisper itself uses a byte-level BPE vocabulary instead).

OSS · free
TensorFlow
Google

The framework underlying TensorFlowASR + many older recipes.

OSS · free
TensorFlow Text
Google

Text ops + tokenizers integrated with TF ASR pipelines.

OSS · free
TensorFlow Lingvo
Google

Google's research-grade TF framework — original Conformer code.

OSS · free
NVIDIA Megatron-LM
NVIDIA

Tensor-parallel training — used for Speech-LLM scaling.

OSS · free
NVIDIA Apex
NVIDIA

Mixed-precision / fused ops library used in NeMo training.

OSS · free
cuDNN Frontend
NVIDIA

C++ / Python API for cuDNN — speeds up custom ASR kernels.

OSS · free
CUTLASS
NVIDIA

High-performance CUDA matrix kernels used by Whisper engines.

OSS · free
ColossalAI
HPC-AI Tech

Open distributed framework — supports Whisper LoRA fine-tunes.

OSS · free
MLC-LLM
MLC AI

Compile + deploy LLMs (and Whisper) to phones / browsers / WebGPU.

OSS · free
MLflow
LF AI / Databricks

Track / serve Whisper experiments and model registry.

OSS · free
lm-evaluation-harness
EleutherAI

Eval harness now covering audio-LLM benchmarks.

OSS · free
Piper
Rhasspy

Fast neural TTS for Home Assistant — pairs with Whisper.

OSS · free
Mozilla TTS
Mozilla

Mozilla's archived TTS — historical reference.

OSS · free
NVIDIA Tacotron2
NVIDIA

Reference Tacotron2 + WaveGlow stack from NVIDIA.

OSS · free
NVIDIA WaveGlow
NVIDIA

Flow-based vocoder companion to Tacotron2.

OSS · free
NVIDIA Mellotron
NVIDIA

Multispeaker prosody TTS — historical NVIDIA release.

OSS · free
IMS Toucan
Universität Stuttgart IMS

Multilingual TTS toolkit from Stuttgart IMS.

OSS · free
IMS Toucan (lowercase mirror)
Universität Stuttgart IMS

Alternate-case mirror of IMS Toucan.

OSS · free
VITS
jaywalnut310

Reference E2E TTS — building block for voice-agent loops.

OSS · free
Glow-TTS
jaywalnut310

Flow-based parallel TTS reference.

OSS · free
Suno Bark
Suno AI

Transformer-based generative audio / TTS.

OSS · free
ChatTTS
2noise

Conversational TTS — voice agent companion to Whisper.

OSS · free
Fish Speech
fishaudio

Open zero-shot voice cloning + TTS.

OSS · free
Bert-VITS2
fishaudio

VITS2 + BERT prosody TTS — companion to Whisper.

OSS · free
StyleTTS2
yl4579

Style-conditioned TTS — pairs with Whisper for narration apps.

OSS · free
StyleTTS
yl4579

Original StyleTTS — predecessor of StyleTTS2.

OSS · free
F5-TTS
SWivid

Flow-matching TTS — open and fast.

OSS · free
MetaVoice 1B
MetaVoice

Open zero-shot voice cloning TTS.

OSS · free
GPT-SoVITS
RVC Boss

Few-shot voice cloning — companion to Whisper-cloned datasets.

OSS · free
RVC WebUI
RVC Project

Retrieval-based Voice Conversion web UI; pairs with Whisper alignment.

OSS · free
AudioCraft
Meta AI

Meta's audio-generation stack (MusicGen, AudioGen, EnCodec).

OSS · free
EnCodec
Meta AI

Neural audio codec — used by SeamlessM4T + many speech-LMs.

OSS · free
EnCodec (capitalized mirror)
Meta AI

Mirror of facebookresearch/encodec.

OSS · free
textlesslib
Meta AI

Speech-without-text framework from Meta.

OSS · free
AudioMAE
Meta AI

Masked-Autoencoder pretrain for audio — feeds downstream ASR.

OSS · free
Descript Audio Codec
Descript

High-quality neural audio codec — alternative to EnCodec.

OSS · free
audiotools
Descript

Audio data tooling library that pairs with DAC.

OSS · free
HuggingFace LeRobot
Hugging Face

Open robotics — includes spoken-command ASR demos.

OSS · free
HuggingFace Diffusers
Hugging Face

Generative-audio diffusion — paired with Whisper for content pipelines.

OSS · free
SetFit
Hugging Face

Few-shot text classifier — useful for post-transcript tagging.

OSS · free
paper-qa
Future-House

RAG over PDFs / transcripts — downstream ASR consumer pattern.

OSS · free
GPT-NeoX
EleutherAI

Training framework for large speech-LMs.

OSS · free
OpenCLIP
mlfoundations

Open CLIP — companion vision encoder in multimodal ASR research.

OSS · free
Salesforce CodeT5
Salesforce

Code-generation T5 — used in voice-coding agents on top of Whisper.

OSS · free
Salesforce CTRL
Salesforce

Conditional-LM — historical companion to speech-text research.

OSS · free
NeMo Guardrails
NVIDIA

Safety layer often paired with Whisper voice agents.

OSS · free
Transformers4Rec
NVIDIA-Merlin

Sequence models — companion to spoken-search recommender pipelines.

OSS · free
Pai Megatron Patch
Alibaba

Alibaba's patched Megatron — used for Paraformer scale-up.

OSS · free
Google Snappy
Google

Compression library used by ASR data pipelines.

OSS · free
google-research monorepo
Google Research

Catch-all for Google ASR papers (USM, BigSSL, Conformer).

OSS · free
Google seq2seq
Google

Historical TF1 seq2seq — early Listen-Attend-Spell era.

OSS · free
llm.c
Andrej Karpathy

Andrej Karpathy's bare-metal C training code — reference for compact ASR.

OSS · free
GFPGAN
TencentARC

Face restoration — often paired with Whisper subtitle pipelines.

OSS · free
AnimateDiff
guoyww

Stable-Diffusion animation — used with Whisper subs in content pipelines.

OSS · free
Llama2-Code-Interpreter
SeungyounShin

Voice-coding agent example over Whisper.

OSS · free
torch-harmonics
NVIDIA

Spherical signal transforms — used in advanced ASR research.

OSS · free
whisper-ctranslate2 (Softcatalà mirror)
Softcatalà

Capitalized-name mirror of whisper-ctranslate2.

OSS · free
faster-whisper (guillaumekln legacy)
Guillaume Klein

Pre-SYSTRAN home of faster-whisper.

OSS · free
llama.cpp (ggml-org)
ggml.ai

The ggml-org-hosted mirror of llama.cpp.

OSS · free
WeNet (capitalized mirror)
wenet-e2e

Capitalized-name mirror of wenet.

OSS · free
oTranscribe
Elliot Bentley

Free browser-based manual transcription tool — keyboard-shortcut transcript editor.

OSS · free
Talon Dictation Models
Talon Voice community

Open dictation engines used by the Talon Voice community.

OSS · free
Vibe
Thomas Beling

Open-source desktop transcription and dictation app built on Whisper.

OSS · free
AI4Bharat IndicConformer
AI4Bharat / IIT Madras

Open-source Indic ASR models from IIT Madras' AI4Bharat lab — 22 scheduled Indian languages.

OSS · free
Mozilla Common Voice
Mozilla Foundation

Mozilla Common Voice — public-domain multilingual speech corpus that powers many regional STT models.

OSS · free
Meta MMS
Meta AI

Meta Massively Multilingual Speech — open-source ASR for 1,100+ languages.

OSS · free
ASR-IL
Israeli AI consortium

Israeli national Hebrew ASR — research models from the Israeli AI consortium.

OSS · free
AI4D African Language Dataset
AI4D Africa

AI4D Africa — multilingual African speech datasets and ASR baselines.

OSS · free
Khipu Andean ASR
Khipu / Americas NLP

Khipu community — open-source Andean Spanish, Quechua, and Aymara speech research.

OSS · free
VinAI / VinBigData ASR
VinAI Research

VinAI Research — Vietnamese-language ASR and speech research from the Vingroup AI arm.

OSS · free
Khmer ASR (EKS Labs)
Cambodian research community

Khmer-language speech recognition research for the Cambodian market.

OSS · free
Typhoon ASR (Thai)
SCB 10X

Typhoon — Thai-language LLM and ASR initiative from SCB 10X.

OSS · free
Mesolitica Malay ASR
Mesolitica

Mesolitica — Bahasa Malaysia and Bahasa Indonesia speech research checkpoints.

OSS · free
Georgian ASR (TSU)
Tbilisi State University

Tbilisi State University Georgian speech recognition research.

OSS · free
Armenian ASR (Yerevann)
Yerevann

Yerevann research lab Armenian speech recognition checkpoints.

OSS · free
Turkish ASR (Boğaziçi / METU)
Turkish academic community

Open-source Turkish-language ASR checkpoints from Turkish university labs.

OSS · free
Kencorpus Swahili ASR
Kencorpus consortium

Kencorpus / Maseno — Kenyan Swahili and English code-switch speech dataset and baselines.

OSS · free
IIIT-Hyderabad Indic Speech
IIIT Hyderabad

IIIT-Hyderabad speech lab — academic Indian-language ASR datasets and checkpoints.

OSS · free
IIT Madras Speech Lab
IIT Madras

IIT Madras speech group — academic Indian-language ASR research and AI4Bharat home.

OSS · free
IIT Bombay Speech
IIT Bombay

IIT Bombay speech group — Indian-language ASR research and Bhashini contributions.

OSS · free
Akylai Kyrgyz ASR
Akylai community

Akylai project — Kyrgyz-language voice assistant and ASR research.

OSS · free
ISSAI Kazakh ASR
ISSAI / Nazarbayev University

Institute of Smart Systems and AI (Nazarbayev University) — Kazakh-language ASR research.

OSS · free
Telugu Speech Corpus
Indian academic community

Open Telugu-language speech corpora and models for SE-Indian transcription.

OSS · free
Tamil Open ASR
Tamil open-source community

Community-published Tamil-language ASR models and corpora.

OSS · free
BNLP Bangla ASR
Bengali NLP community

Bengali-language ASR datasets and models from the BNLP / Bengali NLP community.

OSS · free
L3Cube Marathi ASR
L3Cube / Pune

L3Cube Pune — Marathi-language NLP and speech research releases.

OSS · free
KB-Whisper Swedish
Kungliga Biblioteket

Kungliga Biblioteket (National Library of Sweden) Whisper fine-tunes for Swedish.

OSS · free
NB-Whisper Norwegian
Nasjonalbiblioteket

Norwegian National Library Whisper fine-tunes for Bokmål and Nynorsk.

OSS · free
CUHK Cantonese ASR
CUHK Speech Group

Chinese University of Hong Kong — Cantonese speech research and open checkpoints.

OSS · free
Pipecat
Daily.co

Open-source framework for voice and multimodal conversational AI agents.

OSS · free
LiveKit Agents
LiveKit

Open-source framework for building realtime AI voice agents on LiveKit's WebRTC stack.

OSS · free
Rasa Voice
Rasa

Open-source conversational AI framework with voice channel integration.

OSS · free
Botpress Voice
Botpress

Open-core conversational AI platform with voice channels.

OSS · free · see vendor pricing
5ire Voice
5ire

Open-source desktop client routing voice to LLM voice agents.

OSS · free
Willow
Willow

Open-source privacy-respecting voice assistant for home automation.

OSS · free
TEN Framework
Agora

Open-source framework by Agora for building realtime multimodal voice AI agents.

OSS · free
Moshi
Kyutai

Kyutai's open speech-to-speech foundation model and demo voice agent.

OSS · free
Vocode (OSS)
Vocode

Open-source Python library for building real-time voice-LLM applications.

OSS · free
Coqui XTTS-v2
Coqui (community fork)

Open-weights multilingual voice cloning from 6 seconds of audio — 17 languages.

OSS · free
Tortoise-TTS-Fast
Community (152334H)

Performance fork of Tortoise — quality kept, latency 5-10x lower.

OSS · free
Open edX (Self-Hosted)
Axim Collaborative

Self-hosted open-source MOOC platform with caption-track support.

OSS · free
LibriSpeech
Vassil Panayotov / Daniel Povey / JHU CLSP

1000h read English audiobook corpus — the canonical ASR benchmark since 2015.

OSS · free
Libri-Light
Meta AI / Facebook AI Research

60k hours of unlabeled English audiobook audio for self-supervised pretraining.

OSS · free
Mozilla Common Voice
Mozilla Foundation

Crowd-sourced multilingual speech corpus — 30k+ hours across 130 languages.

OSS · free
TED-LIUM 3
LIUM (Le Mans University)

452h of TED talk audio + transcripts — the canonical lecture-style ASR benchmark.

OSS · free
VoxPopuli
Meta AI / Facebook AI Research

400k hours of European Parliament speeches in 23 EU languages.

OSS · free
Multilingual LibriSpeech (MLS)
Meta AI / Facebook AI Research

About 50k hours of read audiobook speech across 8 European languages, 44.5k of them English.

OSS · free
MuST-C
FBK (Fondazione Bruno Kessler)

TED-based English→X speech translation corpus across 14 target languages.

OSS · free
CoVoST 2
Meta AI

Common Voice-based speech-translation corpus — 21 X→en + 15 en→X language pairs.

OSS · free
FLEURS
Google Research

Few-shot multilingual evaluation across 102 languages — n-way parallel speech.

OSS · free
ML-SUPERB
Academic consortium (CMU + NTU + JHU + others)

Multilingual SUPERB — 143 languages × multiple tasks for self-supervised speech models.

OSS · free
SUPERB
Academic consortium (NTU + CMU + JHU + Meta)

Speech processing Universal PERformance Benchmark — 10 English speech tasks.

OSS · free
GigaSpeech
SpeechColab (consortium)

10,000h English ASR corpus — audiobook + podcast + YouTube blend, multiple subsets.

OSS · free · research-only
GigaSpeech 2
SpeechColab

30,000h multilingual evolution of GigaSpeech — Thai, Indonesian, Vietnamese launch.

OSS · free · research-only
The People's Speech
MLCommons

30,000h CC-BY-licensed English ASR corpus — Internet-Archive sourced.

OSS · free
YODAS
CMU / WAVLab

500kh of YouTube speech across 100+ languages with CC-licensed subtitles.

OSS · free · research-only
YODAS2
CMU / WAVLab

Refresh of YODAS with long-form audio + per-language sharding — 422k hours.

OSS · free · research-only
SPGISpeech
Kensho Technologies (S&P Global)

5000h of professionally-transcribed earnings-call audio — financial-domain ASR.

OSS · free · research-only
Earnings-22
Rev.com / Rev.ai

119h earnings-call ASR test set: 125 calls from companies based in 27 countries.

OSS · free
AMI Meeting Corpus
Idiap / Edinburgh / Brno

100h multi-microphone meeting recordings with diarization + speaker labels.

OSS · free
ICSI Meeting Corpus
ICSI Berkeley

72h research-meeting recordings — diarization and meeting-ASR alternative to AMI.

OSS · free
CHiME-6
CHiME Challenge organizers

Real-world dinner-party recordings — far-field ASR + diarization in noise.

OSS · free · research-only
CHiME-7 / CHiME-8 DASR
CHiME Challenge organizers

Distant-mic ASR challenge — multi-channel meeting transcription frontier.

OSS · free · research-only
VoxCeleb 1
Oxford VGG

100k utterances of celebrity speech from YouTube — speaker recognition benchmark.

OSS · free · research-only
VoxCeleb 2
Oxford VGG

1M utterances of celebrity speech — scaled-up speaker recognition corpus.

OSS · free · research-only
VoxConverse
Oxford VGG

50h audio-visual diarization corpus — wild YouTube speakers in conversation.

OSS · free · research-only
DIHARD III
LDC / DIHARD organizers

Hard diarization-in-the-wild challenge — 11 domains from courtrooms to maps.

OSS · paid
Switchboard-1
LDC

260h of conversational US English telephone speech — historical ASR benchmark.

OSS · paid
Fisher English
LDC

2000h of telephone conversations — scaled-up successor to Switchboard.

OSS · paid
CallHome English
LDC

60h of unscripted home-telephone conversations — diarization + ASR benchmark.

OSS · paid
Wall Street Journal (WSJ)
LDC

80h of read newspaper sentences — foundational read-speech ASR corpus from 1992.

OSS · paid
TIMIT
LDC / NIST / Texas Instruments + MIT

Phonetically-balanced 5h read-speech corpus from 1986 — phoneme recognition benchmark.

OSS · paid
AISHELL-1
Beijing Shell Shell Technology

178h Mandarin read-speech corpus — open Chinese ASR baseline.

OSS · free
AISHELL-2
Beijing Shell Shell Technology

1000h Mandarin read-speech corpus — scaled-up successor.

OSS · free · research-only
AISHELL-4
Beijing Shell Shell Technology

120h Mandarin meeting corpus — multi-speaker conference-room scenarios.

OSS · free
KsponSpeech
AI Hub Korea / ETRI

1000h Korean spontaneous-speech corpus — the open KR ASR baseline.

OSS · free · research-only
ReazonSpeech
Reazon Holdings

Japanese ASR corpus — 35k hours of TV recordings with captions.

OSS · free · research-only
JTubeSpeech
Saruwatari Lab (U. Tokyo)

Japanese-speech-from-YouTube corpus — open ASR scaling beyond Reazon.

OSS · free · research-only
JVS Corpus
Saruwatari Lab (U. Tokyo)

30h Japanese versatile multi-speaker corpus — TTS + speaker-modeling baseline.

OSS · free · research-only
VCTK
Edinburgh CSTR

44h multi-speaker English corpus — 109 speakers across global accents for TTS.

OSS · free
LJ Speech
Keith Ito

24h single-speaker English audiobook corpus — the canonical TTS baseline.

OSS · free
IEMOCAP
USC SAIL Lab

12h dyadic emotional speech corpus — the gold-standard SER benchmark.

OSS · free · research-only
RAVDESS
Ryerson University

Audio-visual emotional speech + song corpus — open SER benchmark.

OSS · free
MELD
SenticNet group / NUS

Multimodal emotion corpus from Friends TV show — conversational emotion recognition.

OSS · free · research-only
CREMA-D
CMU / Penn

7442 audio-visual emotional speech clips from 91 actors — open SER corpus.

OSS · free
MUSAN
JHU CLSP

109h corpus of music + speech + noise — augmentation backbone for ASR/SV.

OSS · free
RIRs and Noises (SLR28)
JHU CLSP

Room impulse responses + isotropic noises — reverberation augmentation set.

OSS · free
OpenSLR (catalog)
Daniel Povey (JHU CLSP)

Open Speech and Language Resources — the index of 130+ free speech corpora.

OSS · free
VoxLingua107
Tallinn University of Technology

6.6kh language-identification corpus — 107 languages from YouTube.

OSS · free
Fluent Speech Commands
Fluent.ai

30h spoken-language-understanding corpus — intent classification benchmark.

OSS · free · research-only
Google Speech Commands
Google / TensorFlow

1s keyword-spotting corpus — 35 single-word commands, ~100k utterances.

OSS · free
Spoken Wikipedia Corpora
University of Hamburg

Long-form Wikipedia audiobook recordings in English / German / Dutch — ~1000h.

OSS · free
MGB Challenge
BBC + academic consortium

BBC broadcast-media ASR + diarization challenge — multi-year evaluation series.

OSS · free · research-only
This American Life Podcast Transcripts
Mao et al. (academic)

Long-form podcast ASR + speaker-role corpus.

OSS · free · research-only
Spotify Podcast Dataset (100K)
Spotify Research

100k English podcast episodes with metadata; TREC podcast evaluation corpus.

OSS · free · research-only
PRESTO
Google Research

Multilingual conversational SLU dataset — 6 languages with disfluencies + code-switching.

OSS · free
VoxTube
ID R&D

5kh weakly-labeled multilingual speaker-recognition corpus from YouTube, 50 languages.

OSS · free · research-only
Yesno (SLR-1)
OpenSLR

Toy 60-utterance Hebrew corpus — the Kaldi 'hello world' dataset.

OSS · free
AI4Bharat IndicVoices
AI4Bharat (IIT Madras)

16kh Indic-language ASR corpus across 22 Indian languages.

OSS · free
Kathbath
AI4Bharat (IIT Madras)

1684h read-speech ASR benchmark across 12 Indian languages.

OSS · free
IndicSUPERB
AI4Bharat (IIT Madras)

Indic-language version of SUPERB — 12 languages × 6 speech tasks.

OSS · free
Shrutilipi
AI4Bharat (IIT Madras)

6457h Indic-language ASR corpus from All India Radio news broadcasts.

OSS · free
Russian Open STT
Silero

20kh Russian ASR corpus — the largest open Russian-language speech dataset.

OSS · free
VoxForge
VoxForge community

Crowd-sourced multilingual read-speech corpus — the open-source pre-Common-Voice corpus.

OSS · free
VIVOS
AILAB VNU-HCM

15h Vietnamese read-speech ASR corpus — the open Vietnamese ASR baseline.

OSS · free
THAI-SER
VISTEC / NECTEC

36h Thai emotional-speech corpus — the open Thai SER + ASR baseline.

OSS · free
Open ASR Leaderboard
HuggingFace

HuggingFace ASR leaderboard — public WER + RTFx across 8 English test sets.

OSS · free
Papers With Code · Speech Recognition
Papers With Code / Meta AI

Aggregated ASR leaderboards across 100+ benchmarks + papers + code.

OSS · free
AI Hub Korea
NIA (Korean National Information Society Agency)

Korean government open-data hub for speech + NLP corpora — 30+ speech datasets.

OSS · free · research-only
NIST SRE Series
NIST

NIST Speaker Recognition Evaluation — the canonical SV/SD benchmark series.

OSS · free / paid
NIST OpenSAT
NIST

Open Speech Analytic Technologies — noise-robust ASR + KWS + SAD challenge.

OSS · free / paid
Europarl-ST
MLLP / UPV

Speech-translation corpus from European Parliament across 9 languages.

OSS · free
IARPA Babel
IARPA / LDC

Low-resource multilingual ASR + KWS corpora — 25+ languages from telephony.

OSS · free / paid
JHU CLSP
Johns Hopkins University

Johns Hopkins Center for Language and Speech Processing — Kaldi + LibriSpeech + Sherpa origins.

OSS · free
Brno BUT Speech
Brno University of Technology

Brno University of Technology speech group — DIHARD + x-vector + WeSpeaker origins.

OSS · free
Edinburgh CSTR
University of Edinburgh

Centre for Speech Technology Research — VCTK + Merlin TTS + Festival origins.

OSS · free
CMU LTI
Carnegie Mellon University

Carnegie Mellon Language Technologies Institute — Sphinx + ESPnet + YODAS origins.

OSS · free
MIT SLS
MIT CSAIL

MIT Spoken Language Systems Group — TIMIT + Galaxy + Jupiter origins.

OSS · free
NTU Speech Processing Lab
National Taiwan University

National Taiwan University Speech Lab — S3PRL + SUPERB origins.

OSS · free
Meta FAIR Speech
Meta AI / FAIR

Meta AI speech research — wav2vec 2.0 + HuBERT + MMS + Seamless origins.

OSS · free
Google Speech Research
Google Research

Google Research Speech — USM + Chirp + AudioPaLM + FLEURS origins.

OSS · free
NVIDIA Speech AI
NVIDIA

NVIDIA Speech Research — NeMo + Canary + Parakeet + Riva origins.

OSS · free
AI4Bharat
IIT Madras

IIT Madras Indic AI lab — IndicVoices + Kathbath + IndicSUPERB + IndicWav2Vec.

OSS · free
Inria MULTISPEECH
Inria Nancy

Inria Nancy speech research team — diarization + speech enhancement leaders.

OSS · free
LIMSI / LISN / CNRS
CNRS / Paris-Saclay

French national speech-tech lab — TC-STAR + Quaero + ELRA-LDC origins.

OSS · free
RWTH i6
RWTH Aachen University

RWTH Aachen i6 group — RASR toolkit + IWSLT speech translation history.

OSS · free
ICSI Berkeley
ICSI / UC Berkeley

International Computer Science Institute — ICSI Meeting Corpus + Aurora origins.

OSS · free
MERL Speech
Mitsubishi Electric Research Labs

Mitsubishi Electric Research Labs Speech Group — CHiME + speech-enhancement leaders.

OSS · free
MLCommons Speech
MLCommons

MLCommons Speech working group — People's Speech + MLPerf speech benchmarks.

OSS · free
IWSLT Speech Translation
IWSLT organizers (academic consortium)

International Workshop on Spoken Language Translation — annual ST evaluation.

OSS · free
HuggingFace Datasets · Audio
HuggingFace

Hub of 5000+ audio + speech datasets — the modern catalog after OpenSLR.

OSS · free
Coqui XTTS
Coqui

Open-source multilingual TTS with zero-shot voice cloning.

OSS · free (CPML, non-commercial without separate license)
Bark (Suno)
Suno

Open-source generative audio model from Suno — speech, music, and sound effects.

OSS · free (MIT)
Tortoise TTS
neonbjb

Open-source neural TTS with strong prosody and voice cloning.

OSS · free (Apache-2.0)
OpenVoice (MyShell)
MyShell

MyShell's open-source voice cloning with tone-color extraction.

OSS · free (MIT for V1, commercial-allowed for V2)
MeloTTS
MyShell

High-quality multi-lingual TTS from MyShell — fast and CPU-friendly.

OSS · free (MIT)
VITS
jaywalnut310 (research)

End-to-end TTS with adversarial training — the open-source workhorse.

OSS · free (MIT)
FastSpeech 2
Microsoft Research / community

Non-autoregressive TTS reference implementation — fast and parallelizable.

OSS · free (MIT)
ESPnet TTS
ESPnet

ESPnet's TTS recipes — multi-architecture, multi-language.

OSS · free (Apache-2.0)
Mimic 3 (Mycroft)
Mycroft (archived)

Mycroft's neural TTS — designed for Raspberry Pi voice assistants.

OSS · free (AGPL-3.0)
Larynx
Rhasspy

Rhasspy's earlier TTS engine and the predecessor to Piper, with Tacotron-style models for offline assistants.

OSS · free (MIT)
Piper (Rhasspy)
Rhasspy

Fast, on-device neural TTS optimized for Raspberry Pi 4.

OSS · free (MIT)
Festival Speech Synthesis
University of Edinburgh / CMU

Classic Edinburgh / CMU concatenative TTS — academic reference.

OSS · free (university open-source license)
eSpeak NG
eSpeak NG community

Compact open-source TTS for 100+ languages — the embedded workhorse.

OSS · free (GPL-3.0)
MaryTTS
DFKI

Java-based open-source TTS platform — research and academic deployments.

OSS · free (LGPL)
MBROLA
University of Mons / open source

Diphone-based TTS engine — paired with eSpeak NG for more natural output.

OSS · free (AGPL since 2018)
Tacotron 2
Google / NVIDIA reference

Google's seminal end-to-end TTS architecture — the neural-TTS starting point.

OSS · free (BSD-3-Clause)
Grad-TTS
Huawei Noah's Ark Lab

Diffusion-probabilistic TTS reference implementation.

OSS · free (MIT)
FastPitch
NVIDIA

NVIDIA's parallel TTS architecture with explicit pitch control.

OSS · free (BSD-3-Clause)
Kokoro TTS
hexgrad (research)

Lightweight 82M-param open-source TTS — Apache-2.0, runs on a Raspberry Pi.

OSS · free (Apache-2.0)
Chatterbox TTS
Resemble AI

Resemble AI's open-source emotion-aware TTS, released under the MIT license.

OSS · free (MIT)
WaveNet (reference)
DeepMind / community

DeepMind's seminal 2016 neural-vocoder paper — historical reference only.

OSS · free (community reproductions, varied licenses)
HiFi-GAN (reference)
Jungil Kong (research)

GAN-based neural vocoder reference — fast and high-quality.

OSS · free (MIT)
MARS5
Camb.ai

Camb.ai's open-source MARS5 multilingual TTS reference.

OSS · free (AGPL-3.0)
Amphion
Open Multimedia AI Lab

Open-source toolkit for audio, music, and speech generation.

OSS · free (MIT)
IndexTTS
Bilibili

Bilibili's open-source TTS — Chinese + English bilingual.

OSS · free (Apache-2.0 code, custom weight license)
Mycroft AI
Mycroft AI · OpenVoiceOS community

Open-source voice assistant — community-forked after the original company wound down.

OSS · free
OpenVoiceOS
OpenVoiceOS community

Community continuation of Mycroft — modular open-source voice assistant for Linux + Pi.

OSS · free
Rhasspy
Rhasspy Voice / Nabu Casa

Fully offline voice assistant for Home Assistant — runs on a Raspberry Pi with no cloud.

OSS · free
Home Assistant Assist
Nabu Casa

Home Assistant's first-party voice surface — Rhasspy's successor, integrated into HA core.

OSS · free · optional Nabu Casa cloud $6.50/mo
Leon AI
Leon AI community

Open-source personal assistant — self-hostable, privacy-respecting, modular skills.

OSS · free
Whisper Glasses
Whisper Glasses community

Open-source DIY captioning glasses powered by Whisper — community hardware project.

OSS · free · ~$80 BOM
openWakeWord
openWakeWord contributors

Open-source wake-word engine — community alternative to Porcupine and Snips.

OSS · free
Snowboy
KITT.AI (defunct) · community

Legacy customizable wake-word engine — community-maintained after KITT.AI shutdown.

OSS · free