Open source transcription tools — Whipscribe directory

OpenAI Whisper

OpenAI

The reference open-source multilingual ASR model from OpenAI.

OSS · MIT ★ 98.1k

whisper.cpp

Georgi Gerganov

C/C++ port of Whisper — runs on anything, from a Raspberry Pi to Apple Silicon.

OSS · MIT ★ 48.8k

faster-whisper

SYSTRAN

Up to 4× faster than reference Whisper via CTranslate2; the production sweet spot.

OSS · MIT ★ 22.3k

whisperX

Max Bain

Faster-whisper + forced alignment + speaker diarization in one pipeline.

OSS · BSD-2-Clause ★ 21.4k

insanely-fast-whisper

Vaibhav Srivastav

CLI that transcribes 150 minutes of audio in ~98 seconds on an A100.

OSS · Apache-2.0 ★ 12.4k

stable-ts

jianfch

Whisper with stabilised timestamps — more accurate word-level timing.

OSS · MIT ★ 2.2k

WhisperKit

Argmax

Swift Whisper for Apple Silicon — CoreML, ANE, Metal. Now part of the Argmax Open-Source SDK (v1.0.0, May 2026) alongside SpeakerKit + TTSKit.

OSS · MIT ★ 6.0k

distil-whisper

Hugging Face

Distilled Whisper: 6× faster, 49% smaller, within 1% WER of the teacher.

OSS · MIT ★ 4.1k

SeamlessM4T

Meta AI

Meta's speech-to-text + speech-to-speech + text-to-speech model, 100 languages.

OSS · license unspecified ★ 11.8k

Vosk

Alpha Cephei

Lightweight offline speech recognition for 20+ languages, runs on a Raspberry Pi.

OSS · Apache-2.0 ★ 14.6k

Buzz

Chidi Williams

Cross-platform desktop app for Whisper — open-source MacWhisper alternative.

OSS · MIT ★ 18.8k

Tortoise TTS

neonbjb

Open-source TTS model with strong prosody — slow on CPU.

OSS · free

Coqui TTS

Coqui

Open-source TTS toolkit with multi-language voice models.

OSS · free

Whisper JAX

Sanchit Gandhi (HuggingFace)

70x faster Whisper on TPUs via JAX + Flax + batching.

OSS · free

MLX Whisper

Apple ML Research

Whisper inference on Apple Silicon via Apple's MLX framework.

OSS · free

whisper-rs

tazz4843

Idiomatic Rust bindings for whisper.cpp.

OSS · free

Const-me Whisper

Const-me

Whisper running on Windows via DirectCompute / GPGPU.

OSS · free

Whisper Standalone (Purfview)

Purfview

Single-EXE Whisper for Windows + Linux, no dependencies.

OSS · free

whisper-ctranslate2

Softcatalà

Command-line Whisper using CTranslate2 — closest match to openai/whisper CLI.

OSS · free

pywhispercpp

abdeladim-s

Python bindings for whisper.cpp with a simple iterator API.

OSS · free

Whisper-WebUI

jhj0517

Gradio web UI bundling faster-whisper + diarization + translation.

OSS · free

WhisperLive

Collabora

Real-time Whisper transcription over WebSockets.

OSS · free

WhisperFusion

Collabora

Ultra-low-latency speech→LLM pipeline: WhisperLive + Mistral + TensorRT.

OSS · free

WhisperBot

Collabora

WhisperFusion's voice-chat reference app.

OSS · free

whisper_streaming

ÚFAL (Charles University)

Academic real-time Whisper streaming with LocalAgreement-2.

OSS · free
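
For readers unfamiliar with the LocalAgreement-2 policy named above, here is a minimal sketch of the idea: a word is emitted only once two consecutive hypotheses agree on it. The class and function names are hypothetical and this is not the whisper_streaming code. It assumes each update receives the full word list for all audio so far and that already-committed words are not revised by later hypotheses.

```python
from typing import List


def common_prefix(a: List[str], b: List[str]) -> List[str]:
    """Return the longest shared leading run of words in a and b."""
    shared = []
    for x, y in zip(a, b):
        if x != y:
            break
        shared.append(x)
    return shared


class LocalAgreement2:
    """Hypothetical helper: commit words confirmed by two consecutive hypotheses."""

    def __init__(self) -> None:
        self.committed: List[str] = []  # words already emitted, never revised
        self.previous: List[str] = []   # unconfirmed tail of the last hypothesis

    def update(self, hypothesis: List[str]) -> List[str]:
        """Take the full hypothesis so far and return the newly committed words."""
        # Assumption: the hypothesis begins with the words committed earlier.
        tail = hypothesis[len(self.committed):]
        newly = common_prefix(self.previous, tail)
        self.committed.extend(newly)
        # Keep the still-unconfirmed words for the next comparison.
        self.previous = tail[len(newly):]
        return newly


policy = LocalAgreement2()
first = policy.update(["hello"])
second = policy.update(["hello", "world"])
third = policy.update(["hello", "word", "is"])
fourth = policy.update(["hello", "word", "is", "out"])
```

In this run "hello" is committed on the second update, the unstable "world"/"word" guess is held back, and "word is" is committed only when the fourth hypothesis repeats it. A real streaming system would also trim the audio buffer and track timestamps, which this sketch leaves out.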

whisper-timestamped

LINTO / Linagora

Word-level timestamps for OpenAI Whisper without retraining.

OSS · free

whisper-diarization

Mahmoud Ashraf

Whisper + NeMo MSDD diarization pipeline.

OSS · free

faster-whisper-server

fedirz

OpenAI-compatible /v1/audio/transcriptions endpoint over faster-whisper.

OSS · free

whisper-asr-webservice

Ahmet Öner

Dockerized Whisper REST API with multiple backends.

OSS · free

WhisperS2T

shashikg

Optimized batched Whisper engine with VAD + dynamic batching.

OSS · free

whisper-playground

Sahar Mor

Mic-in-browser → real-time Whisper transcription demo.

OSS · free

LiveWhisper

Nikorasu

Always-listening hot-mic Whisper transcriber.

OSS · free

generate-subtitles

mayeaux

Single-page web UI to generate subtitles via Whisper.

OSS · free

WhisperSpeech

WhisperSpeech / Collabora

Whisper inverted into a TTS; also used as an ASR-aware training-data tool.

OSS · free

Echogarden

Echogarden Project

Easy-to-use speech toolkit: TTS, STT, alignment, language detection.

OSS · free

Voice-Pro

abus-aikorea

One-click Whisper + diarization + voice cloning Gradio app.

OSS · free

CrisperWhisper

Nyra Health

Whisper fine-tuned for verbatim transcription (fillers, false starts) with precise word-level timestamps.

OSS · free

NVIDIA NeMo

NVIDIA

Toolkit + model zoo behind Canary, Parakeet, Conformer, FastConformer.

OSS · free

Seamless (SeamlessM4T family)

Meta AI

Meta's multilingual speech-translation + transcription foundation suite.

OSS · free

Fairseq

Meta AI

Meta's seq-to-seq toolkit — home of wav2vec, HuBERT, XLS-R, MMS.

OSS · free

HuggingFace Transformers (Audio)

Hugging Face

One API for Whisper, Wav2Vec2, HuBERT, XLS-R, SeamlessM4T, Parakeet.

OSS · free

HuggingFace Optimum

Hugging Face

ONNX + TensorRT + OpenVINO acceleration for Transformers ASR models.

OSS · free

HuggingFace Accelerate

Hugging Face

Multi-GPU / mixed-precision launcher for any PyTorch ASR training script.

OSS · free

HuggingFace Datasets

Hugging Face

Streaming loader for Common Voice, LibriSpeech, GigaSpeech, FLEURS.

OSS · free

HuggingFace PEFT

Hugging Face

LoRA / adapters for parameter-efficient Whisper fine-tuning.

OSS · free

SpeechBrain

SpeechBrain consortium

PyTorch toolkit for ASR, speaker, diarization, enhancement.

OSS · free

ESPnet

ESPnet community

End-to-end speech toolkit: ASR, TTS, ST, speaker, separation.

OSS · free

Kaldi

Kaldi community

The classic C++ HMM/DNN speech recognition toolkit.

OSS · free

k2

k2-fsa

FSA/FST framework written from scratch in PyTorch/CUDA.

OSS · free

icefall

k2-fsa

ASR recipes (Conformer / Zipformer / Pruned Transducer) for k2 + sherpa.

OSS · free

Sherpa

k2-fsa

Production server for k2/icefall + Whisper models (PyTorch).

OSS · free

sherpa-onnx

k2-fsa

ONNX-runtime ASR: Whisper, Zipformer, Paraformer on every platform.

OSS · free

sherpa-ncnn

k2-fsa

ASR on NCNN — Android-friendly, CPU-only, no FP support needed.

OSS · free

Coqui STT

Coqui

Successor to Mozilla DeepSpeech, maintained by Coqui.

OSS · free

Mozilla DeepSpeech

Mozilla

The original open RNN + CTC engine from Mozilla; archived but historic.

OSS · free

WeNet

wenet-e2e

Production-first E2E ASR — U2++ Conformer, streaming + offline.

OSS · free

Athena STT

ATHENA team

End-to-end speech recognition toolkit by ATHENA-OPEN-SOURCE.

OSS · free

PocketSphinx

CMU Sphinx

Lightweight CMU Sphinx engine for embedded keyword spotting.

OSS · free

CMU Sphinx-4

CMU Sphinx

The classic Java speech engine from CMU.

OSS · free

PaddleSpeech

Baidu

Baidu's all-in-one speech toolkit on PaddlePaddle.

OSS · free

PaddlePaddle DeepSpeech

Baidu

The DeepSpeech-style recipes inside PaddleSpeech.

OSS · free

Julius

Julius project

Lightweight Japanese-focused open ASR decoder with two-pass N-gram search.

OSS · free

Montreal Forced Aligner

MontrealCorpusTools

Word-level alignment via Kaldi for 100+ languages.

OSS · free

OpenSeq2Seq

NVIDIA

NVIDIA's TF1 framework — historical home of Jasper + QuartzNet.

OSS · free

RETURNN

RWTH Aachen

RWTH's flexible neural network training framework for ASR research.

OSS · free

FunASR

Alibaba DAMO Academy

Alibaba DAMO's Paraformer / SenseVoice / Whisper toolkit.

OSS · free

Reverb (Rev.com OSS)

Rev.com

Rev's open WFST-decoded ASR + diarization stack.

OSS · free

fstalign

Rev.com

Rev.com's WER + alignment scoring tool over WFSTs.

OSS · free

PaddlePaddle Parakeet (TTS)

Baidu

Baidu's TTS half — included for end-to-end voice pipelines.

OSS · free

pyannote.audio

pyannote / Hervé Bredin

The reference open diarization + speaker embedding toolkit.

OSS · free

pyannote-audio (legacy)

hbredin

Hervé Bredin's personal mirror of pyannote.audio.

OSS · free

Silero VAD

snakers4 / Silero

Tiny, accurate voice-activity-detection model — runs on CPU.

OSS · free

py-webrtcvad

wiseman

Python bindings for Google's WebRTC VAD.

OSS · free

diart

Juan Manuel Coria

Streaming speaker diarization on top of pyannote.

OSS · free

simple_diarizer

cvqluu

A minimal pyannote / SpeechBrain diarization wrapper.

OSS · free

Resemblyzer

Resemble AI

Speaker-verification embeddings from a small generalist encoder.

OSS · free

Moonshine

Useful Sensors

Tiny English ASR optimized for resource-constrained devices.

OSS · free

Moonshine (project mirror)

Moonshine AI

Mirror of Useful Sensors' Moonshine releases.

OSS · free

Transformers.js

Hugging Face

Run Whisper / wav2vec2 entirely in the browser via ONNX Runtime Web.

OSS · free

Transformers.js (xenova mirror)

Joshua Lochner

Original transformers.js repo by Joshua Lochner (pre-merge into HF).

OSS · free

Apple MLX

Apple

Apple's array framework — runs Whisper, Phi, Llama on Apple Silicon.

OSS · free

MLX Swift

Apple

Swift bindings for MLX — embed Whisper in iOS/macOS apps.

OSS · free

MLX Swift Examples

Apple

Reference Swift apps for MLX, including Whisper.

OSS · free

MLX Examples

Apple

Python MLX examples — Whisper, Llama, Stable Diffusion.

OSS · free

MLX Data

Apple

Audio + image data loaders for MLX training.

OSS · free

HuggingFace Candle

Hugging Face

Minimalist Rust ML framework with Whisper support.

OSS · free

ggml

ggml.ai

The tensor library underneath llama.cpp + whisper.cpp.

OSS · free

whisper.cpp (ggml-org)

ggml.ai

The new ggml-org home of whisper.cpp.

OSS · free

llama.cpp

ggml.ai

GGUF runtime — runs many ASR forks (whisper, parakeet, qwen-audio).

OSS · free

ONNX Runtime

Microsoft

Microsoft's cross-platform inference runtime for ONNX-exported Whisper.

OSS · free

ONNX

Linux Foundation

Open exchange format used by every ASR optimizer.

OSS · free

OpenVINO

Intel

Intel's CPU/iGPU/NPU inference toolkit — Whisper-tuned.

OSS · free

OpenVINO Notebooks

Intel

Reference notebooks including Whisper + SeamlessM4T export.

OSS · free

Intel Extension for PyTorch (IPEX)

Intel

BF16/AMX speedups for Whisper PyTorch inference on Intel CPUs.

OSS · free

vLLM

vLLM Project

High-throughput inference engine — supports Whisper / Llava / Qwen-Audio.

OSS · free

SGLang

SGLang Project

Structured generation runtime — supports Qwen-Audio / Phi-Multimodal.

OSS · free

NVIDIA TensorRT-LLM

NVIDIA

NVIDIA's optimized inference for Whisper, Canary, Parakeet on Triton.

OSS · free

NVIDIA FasterTransformer

NVIDIA

Legacy NVIDIA inference engine — predecessor to TensorRT-LLM.

OSS · free

HF Text Generation Inference (TGI)

Hugging Face

Production inference server — runs audio-multimodal LLMs.

OSS · free

TensorFlowASR

TensorSpeech

TensorFlow 2 end-to-end ASR — Conformer, ContextNet, DeepSpeech2.

OSS · free

MASR (Mandarin Streaming ASR)

yeyupiaoling

Streaming Conformer + DeepSpeech2 in PyTorch for Mandarin.

OSS · free

AudioClassification-Pytorch

yeyupiaoling

Companion audio-classification training repo for MASR.

OSS · free

speech_recognition (Uberi)

Uberi

Multi-backend Python speech-recognition library.

OSS · free

WeSpeaker

wenet-e2e

Production-style speaker embedding + verification toolkit.

OSS · free

WeSep

wenet-e2e

Open speech-separation toolkit aligned with WeNet ASR.

OSS · free

Flashlight

Meta AI

Meta's C++ ML library and the home of wav2letter.

OSS · free

Flashlight Sequence

Meta AI

Standalone CTC / sequence decoders from Flashlight.

OSS · free

wav2letter++

Meta AI

Meta's original fast convolutional ASR system.

OSS · free

conformer (sooftware)

sooftware

Reference PyTorch implementation of the Conformer architecture.

OSS · free

Speech-Transformer (sooftware)

sooftware

Reference Speech-Transformer in PyTorch.

OSS · free

Microsoft UniLM

Microsoft

Home of WavLM, HuBERT++, SpeechT5, BEATs, VALL-E.

OSS · free

Microsoft SpeechT5

Microsoft

Unified speech-text Transformer (ASR + TTS + VC).

OSS · free

Microsoft Recognizers-Text

Microsoft

Post-processing for ASR: numbers, dates, units in 20+ languages.

OSS · free

NVIDIA Riva Python Clients

NVIDIA

Open clients for Riva — NVIDIA's commercial ASR/TTS server.

OSS · free

openai-python

OpenAI

Reference SDK — covers the Whisper + Realtime audio endpoints.

OSS · free

LinTO Platform Stack

LINAGORA

Open conversational-AI stack with self-hosted ASR + NLP.

OSS · free

LinTO Transcription Service

LINAGORA

Production transcription microservice powering the LinTO stack.

OSS · free

fairseq2 (via seamless_communication)

Meta AI

Modular successor to fairseq used by Seamless models.

OSS · free

HuggingFace LightEval

Hugging Face

Eval harness — includes WER evaluations for ASR.

OSS · free

HuggingFace Audio Course

Hugging Face

Free open course on audio ML, including Whisper fine-tuning.

OSS · free

SpeechColab Leaderboard

SpeechColab

Open ASR leaderboard (LibriSpeech, GigaSpeech, AISHELL).

OSS · free

StreamSpeech

ICT-NLP

Simultaneous speech-to-speech translation with streaming ASR.

OSS · free

Parler-TTS

Hugging Face

Open TTS — relevant when pairing ASR with read-back TTS.

OSS · free

Quivr

QuivrHQ

OSS 'second brain' that ingests transcripts via Whisper.

OSS · free

UniAudio

yangdongchao / CUHK

Unified audio foundation model (Codec + LM) — handles ASR.

OSS · free

DeepSpeed

Microsoft

Distributed Whisper / Conformer training at scale.

OSS · free

DeepSpeed (deepspeedai mirror)

DeepSpeed AI

The deepspeedai-org home of DeepSpeed.

OSS · free

DeepSpeed-MII

Microsoft

Microsoft's inference-side companion to DeepSpeed.

OSS · free

Microsoft Olive

Microsoft

Model-optimization toolchain — Whisper ONNX/QNN/DirectML targets.

OSS · free

JAX (Google)

Google

Underlying framework for whisper-jax and TPU ASR research.

OSS · free

JAX (JAX-ML org)

JAX-ML

JAX's new home under the JAX-ML org.

OSS · free

Flax

Google

JAX neural-net library used by whisper-jax.

OSS · free

SentencePiece

Google

Subword tokenizer used by SeamlessM4T and Canary.

OSS · free

TensorFlow

Google

The framework underlying TensorFlowASR + many older recipes.

OSS · free

TensorFlow Text

Google

Text ops + tokenizers integrated with TF ASR pipelines.

OSS · free

TensorFlow Lingvo

Google

Google's research-grade TF framework — original Conformer code.

OSS · free

NVIDIA Megatron-LM

NVIDIA

Tensor-parallel training — used for Speech-LLM scaling.

OSS · free

NVIDIA Apex

NVIDIA

Mixed-precision / fused ops library used in NeMo training.

OSS · free

cuDNN Frontend

NVIDIA

C++ / Python API for cuDNN — speeds up custom ASR kernels.

OSS · free

CUTLASS

NVIDIA

High-performance CUDA matrix kernels used by Whisper engines.

OSS · free

ColossalAI

HPC-AI Tech

Open distributed framework — supports Whisper LoRA fine-tunes.

OSS · free

MLC-LLM

MLC AI

Compile + deploy LLMs (and Whisper) to phones / browsers / WebGPU.

OSS · free

MLflow

LF AI / Databricks

Track / serve Whisper experiments and model registry.

OSS · free

lm-evaluation-harness

EleutherAI

Eval harness now covering audio-LLM benchmarks.

OSS · free

Piper

Rhasspy

Fast neural TTS for Home Assistant — pairs with Whisper.

OSS · free

Mozilla TTS

Mozilla

Mozilla's archived TTS — historical reference.

OSS · free

NVIDIA Tacotron2

NVIDIA

Reference Tacotron2 + WaveGlow stack from NVIDIA.

OSS · free

NVIDIA WaveGlow

NVIDIA

Flow-based vocoder companion to Tacotron2.

OSS · free

NVIDIA Mellotron

NVIDIA

Multispeaker prosody TTS — historical NVIDIA release.

OSS · free

IMS Toucan

Universität Stuttgart IMS

Multilingual TTS toolkit from Stuttgart IMS.

OSS · free

IMS Toucan (lowercase mirror)

Universität Stuttgart IMS

Alternate-case mirror of IMS Toucan.

OSS · free

VITS

jaywalnut310

Reference E2E TTS — building block for voice-agent loops.

OSS · free

Glow-TTS

jaywalnut310

Flow-based parallel TTS reference.

OSS · free

Suno Bark

Suno AI

Transformer-based generative audio / TTS.

OSS · free

ChatTTS

2noise

Conversational TTS — voice agent companion to Whisper.

OSS · free

Fish Speech

fishaudio

Open zero-shot voice cloning + TTS.

OSS · free

Bert-VITS2

fishaudio

VITS2 + BERT prosody TTS — companion to Whisper.

OSS · free

StyleTTS2

yl4579

Style-conditioned TTS — pairs with Whisper for narration apps.

OSS · free

StyleTTS

yl4579

Original StyleTTS — predecessor of StyleTTS2.

OSS · free

F5-TTS

SWivid

Flow-matching TTS — open and fast.

OSS · free

MetaVoice 1B

MetaVoice

Open zero-shot voice cloning TTS.

OSS · free

GPT-SoVITS

RVC Boss

Few-shot voice cloning — companion to Whisper-cloned datasets.

OSS · free

RVC WebUI

RVC Project

Retrieval-based Voice Conversion interface; pairs with Whisper alignment.

OSS · free

AudioCraft

Meta AI

Meta's audio-generation stack (MusicGen, AudioGen, EnCodec).

OSS · free

EnCodec

Meta AI

Neural audio codec — used by SeamlessM4T + many speech-LMs.

OSS · free

EnCodec (capitalized mirror)

Meta AI

Mirror of facebookresearch/encodec.

OSS · free

textlesslib

Meta AI

Speech-without-text framework from Meta.

OSS · free

AudioMAE

Meta AI

Masked-Autoencoder pretrain for audio — feeds downstream ASR.

OSS · free

Descript Audio Codec

Descript

High-quality neural audio codec — alternative to EnCodec.

OSS · free

audiotools

Descript

Audio data tooling library that pairs with DAC.

OSS · free

HuggingFace LeRobot

Hugging Face

Open robotics — includes spoken-command ASR demos.

OSS · free

HuggingFace Diffusers

Hugging Face

Generative-audio diffusion — paired with Whisper for content pipelines.

OSS · free

SetFit

Hugging Face

Few-shot text classifier — useful for post-transcript tagging.

OSS · free

paper-qa

Future-House

RAG over PDFs / transcripts — downstream ASR consumer pattern.

OSS · free

GPT-NeoX

EleutherAI

Training framework for large speech-LMs.

OSS · free

OpenCLIP

mlfoundations

Open CLIP — companion vision encoder in multimodal ASR research.

OSS · free

Salesforce CodeT5

Salesforce

Code-generation T5 — used in voice-coding agents on top of Whisper.

OSS · free

Salesforce CTRL

Salesforce

Conditional-LM — historical companion to speech-text research.

OSS · free

NeMo Guardrails

NVIDIA

Safety layer often paired with Whisper voice agents.

OSS · free

Transformers4Rec

NVIDIA-Merlin

Sequence models — companion to spoken-search recommender pipelines.

OSS · free

Pai Megatron Patch

Alibaba

Alibaba's patched Megatron — used for Paraformer scale-up.

OSS · free

Google Snappy

Google

Compression library used by ASR data pipelines.

OSS · free

google-research monorepo

Google Research

Catch-all for Google ASR papers (USM, BigSSL, Conformer).

OSS · free

Google seq2seq

Google

Historical TF1 seq2seq — early Listen-Attend-Spell era.

OSS · free

llm.c

Andrej Karpathy

Andrej Karpathy's bare-metal C training code — reference for compact ASR.

OSS · free

GFPGAN

TencentARC

Face restoration — often paired with Whisper subtitle pipelines.

OSS · free

AnimateDiff

guoyww

Stable-Diffusion animation — used with Whisper subs in content pipelines.

OSS · free

Llama2-Code-Interpreter

SeungyounShin

Voice-coding agent example over Whisper.

OSS · free

torch-harmonics

NVIDIA

Spherical signal transforms — used in advanced ASR research.

OSS · free

whisper-ctranslate2 (SoftcatalA mirror)

Softcatalà

Capitalized-name mirror of whisper-ctranslate2.

OSS · free

faster-whisper (guillaumekln legacy)

Guillaume Klein

Pre-SYSTRAN home of faster-whisper.

OSS · free

llama.cpp (ggml-org)

ggml.ai

The ggml-org-hosted mirror of llama.cpp.

OSS · free

WeNet (capitalized mirror)

wenet-e2e

Capitalized-name mirror of wenet.

OSS · free

oTranscribe

Elliot Bentley

Free browser-based manual transcription tool — keyboard-shortcut transcript editor.

OSS · free

Talon Dictation Models

Talon Voice community

Open dictation engines used by the Talon Voice community.

OSS · free

Vibe

Thomas Beling

Open-source desktop transcription and dictation app built on Whisper.

OSS · free

AI4Bharat IndicConformer

AI4Bharat / IIT Madras

Open-source Indic ASR models from IIT Madras' AI4Bharat lab — 22 scheduled Indian languages.

OSS · free

Mozilla Common Voice

Mozilla Foundation

Mozilla Common Voice — public-domain multilingual speech corpus that powers many regional STT models.

OSS · free

Meta MMS

Meta AI

Meta Massively Multilingual Speech — open-source ASR for 1,100+ languages.

OSS · free

ASR-IL

Israeli AI consortium

Israeli national Hebrew ASR — research models from the Israeli AI consortium.

OSS · free

AI4D African Language Dataset

AI4D Africa

AI4D Africa — multilingual African speech datasets and ASR baselines.

OSS · free

Khipu Andean ASR

Khipu / Americas NLP

Khipu community — open-source Andean Spanish, Quechua, and Aymara speech research.

OSS · free

VinAI / VinBigData ASR

VinAI Research

VinAI Research — Vietnamese-language ASR and speech research from the Vingroup AI arm.

OSS · free

Khmer ASR (EKS Labs)

Cambodian research community

Khmer-language speech recognition research for the Cambodian market.

OSS · free

Typhoon ASR (Thai)

SCB 10X

Typhoon — Thai-language LLM and ASR initiative from SCB 10X.

OSS · free

Mesolitica Malay ASR

Mesolitica

Mesolitica — Bahasa Malaysia and Bahasa Indonesia speech research checkpoints.

OSS · free

Georgian ASR (TSU)

Tbilisi State University

Tbilisi State University Georgian speech recognition research.

OSS · free

Armenian ASR (Yerevann)

Yerevann

Yerevann research lab Armenian speech recognition checkpoints.

OSS · free

Turkish ASR (Boğaziçi / METU)

Turkish academic community

Open-source Turkish-language ASR checkpoints from Turkish university labs.

OSS · free

Kencorpus Swahili ASR

Kencorpus consortium

Kencorpus / Maseno — Kenyan Swahili and English code-switch speech dataset and baselines.

OSS · free

IIIT-Hyderabad Indic Speech

IIIT Hyderabad

IIIT-Hyderabad speech lab — academic Indian-language ASR datasets and checkpoints.

OSS · free

IIT Madras Speech Lab

IIT Madras

IIT Madras speech group — academic Indian-language ASR research and AI4Bharat home.

OSS · free

IIT Bombay Speech

IIT Bombay

IIT Bombay speech group — Indian-language ASR research and Bhashini contributions.

OSS · free

Akylai Kyrgyz ASR

Akylai community

Akylai project — Kyrgyz-language voice assistant and ASR research.

OSS · free

ISSAI Kazakh ASR

ISSAI / Nazarbayev University

Institute of Smart Systems and AI (Nazarbayev University) — Kazakh-language ASR research.

OSS · free

Telugu Speech Corpus

Indian academic community

Open Telugu-language speech corpora and models for SE-Indian transcription.

OSS · free

Tamil Open ASR

Tamil open-source community

Community-published Tamil-language ASR models and corpora.

OSS · free

BNLP Bangla ASR

Bengali NLP community

Bengali-language ASR datasets and models from the BNLP / Bengali NLP community.

OSS · free

L3Cube Marathi ASR

L3Cube / Pune

L3Cube Pune — Marathi-language NLP and speech research releases.

OSS · free

KB-Whisper Swedish

Kungliga Biblioteket

Kungliga Biblioteket (National Library of Sweden) Whisper fine-tunes for Swedish.

OSS · free

NB-Whisper Norwegian

Nasjonalbiblioteket

Norwegian National Library Whisper fine-tunes for Bokmål and Nynorsk.

OSS · free

CUHK Cantonese ASR

CUHK Speech Group

Chinese University of Hong Kong — Cantonese speech research and open checkpoints.

OSS · free

Pipecat

Daily.co

Open-source framework for voice and multimodal conversational AI agents.

OSS · free

LiveKit Agents

LiveKit

Open-source framework for building realtime AI voice agents on LiveKit's WebRTC stack.

OSS · free

Rasa Voice

Rasa

Open-source conversational AI framework with voice channel integration.

OSS · free

Botpress Voice

Botpress

Open-core conversational AI platform with voice channels.

OSS · free (see vendor pricing)

5ire Voice

5ire

Open-source desktop client routing voice to LLM voice agents.

OSS · free

Willow

Open-source privacy-respecting voice assistant for home automation.

OSS · free

TEN Framework

Agora

Open-source framework by Agora for building realtime multimodal voice AI agents.

OSS · free

Moshi

Kyutai

Kyutai's open speech-to-speech foundation model and demo voice agent.

OSS · free

Vocode (OSS)

Vocode

Open-source Python library for building real-time voice-LLM applications.

OSS · free

Coqui XTTS-v2

Coqui (community fork)

Open-weights multilingual voice cloning from 6 seconds of audio — 17 languages.

OSS · free

Tortoise-TTS-Fast

Community (152334H)

Performance fork of Tortoise — quality kept, latency 5-10x lower.

OSS · free

Open edX (Self-Hosted)

Axim Collaborative

Self-hosted open-source MOOC platform with caption-track support.

OSS · free

LibriSpeech

Vassil Panayotov / Daniel Povey / JHU CLSP

1000h read English audiobook corpus — the canonical ASR benchmark since 2015.

OSS · free

Libri-Light

Meta AI / Facebook AI Research

60k hours of unlabeled English audiobook audio for self-supervised pretraining.

OSS · free

Mozilla Common Voice

Mozilla Foundation

Crowd-sourced multilingual speech corpus — 30k+ hours across 130 languages.

OSS · free

TED-LIUM 3

LIUM (Le Mans University)

452h of TED talk audio + transcripts — the canonical lecture-style ASR benchmark.

OSS · free

VoxPopuli

Meta AI / Facebook AI Research

400k hours of European Parliament speeches in 23 EU languages.

OSS · free

Multilingual LibriSpeech (MLS)

Meta AI / Facebook AI Research

44.5k hours of read multilingual audiobook speech across 8 European languages.

OSS · free

MuST-C

FBK (Fondazione Bruno Kessler)

TED-based English→X speech translation corpus across 14 target languages.

OSS · free

CoVoST 2

Meta AI

Common Voice-based speech-translation corpus — 21 X→en + 15 en→X language pairs.

OSS · free

FLEURS

Google Research

Few-shot multilingual evaluation across 102 languages — n-way parallel speech.

OSS · free

ML-SUPERB

Academic consortium (CMU + NTU + JHU + others)

Multilingual SUPERB — 143 languages × multiple tasks for self-supervised speech models.

OSS · free

SUPERB

Academic consortium (NTU + CMU + JHU + Meta)

Speech processing Universal PERformance Benchmark — 10 English speech tasks.

OSS · free

GigaSpeech

SpeechColab (consortium)

10,000h English ASR corpus — audiobook + podcast + YouTube blend, multiple subsets.

OSS · free (research-only)

GigaSpeech 2

SpeechColab

30,000h multilingual evolution of GigaSpeech — Thai, Indonesian, Vietnamese launch.

OSS · free (research-only)

The People's Speech

MLCommons

30,000h CC-BY-licensed English ASR corpus — Internet-Archive sourced.

OSS · free

YODAS

CMU / WAVLab

500kh of YouTube speech across 100+ languages with CC-licensed subtitles.

OSS · free (research-only)

YODAS2

CMU / WAVLab

Refresh of YODAS with long-form audio + per-language sharding — 422k hours.

OSS · free (research-only)

SPGISpeech

Kensho Technologies (S&P Global)

5000h of professionally-transcribed earnings-call audio — financial-domain ASR.

OSS · free (research-only)

Earnings-22

Rev.com / Rev.ai

125h earnings-call ASR test set with 27-accent speaker coverage.

OSS · free

AMI Meeting Corpus

Idiap / Edinburgh / Brno

100h multi-microphone meeting recordings with diarization + speaker labels.

OSS · free

ICSI Meeting Corpus

ICSI Berkeley

72h research-meeting recordings — diarization and meeting-ASR alternative to AMI.

OSS · free

CHiME-6

CHiME Challenge organizers

Real-world dinner-party recordings — far-field ASR + diarization in noise.

OSS · free (research-only)

CHiME-7 / CHiME-8 DASR

CHiME Challenge organizers

Distant-mic ASR challenge — multi-channel meeting transcription frontier.

OSS · free (research-only)

VoxCeleb 1

Oxford VGG

100k utterances of celebrity speech from YouTube — speaker recognition benchmark.

OSS · free (research-only)

VoxCeleb 2

Oxford VGG

1M utterances of celebrity speech — scaled-up speaker recognition corpus.

OSS · free (research-only)

VoxConverse

Oxford VGG

50h audio-visual diarization corpus — wild YouTube speakers in conversation.

OSS · free (research-only)

DIHARD III

LDC / DIHARD organizers

Hard diarization-in-the-wild challenge: 11 domains, from courtrooms to map-task dialogues.

OSS · paid

Switchboard-1

LDC

260h of conversational US English telephone speech — historical ASR benchmark.

OSS · paid

Fisher English

LDC

2000h of telephone conversations — scaled-up successor to Switchboard.

OSS · paid

CallHome English

LDC

60h of unscripted home-telephone conversations — diarization + ASR benchmark.

OSS · paid

Wall Street Journal (WSJ)

LDC

80h of read newspaper sentences — foundational read-speech ASR corpus from 1992.

OSS · paid

TIMIT

LDC / NIST / Texas Instruments + MIT

Phonetically-balanced 5h read-speech corpus from 1986 — phoneme recognition benchmark.

OSS · paid

AISHELL-1

Beijing Shell Shell Technology

178h Mandarin read-speech corpus — open Chinese ASR baseline.

OSS · free

AISHELL-2

Beijing Shell Shell Technology

1000h Mandarin read-speech corpus — scaled-up successor.

OSS · free (research-only)

AISHELL-4

Beijing Shell Shell Technology

120h Mandarin meeting corpus — multi-speaker conference-room scenarios.

OSS · free

KsponSpeech

AI Hub Korea / ETRI

1000h Korean spontaneous-speech corpus — the open KR ASR baseline.

OSS · free (research-only)

ReazonSpeech

Reazon Holdings

Japanese ASR corpus — 35k hours of TV recordings with captions.

OSS · free (research-only)

JTubeSpeech

Saruwatari Lab (U. Tokyo)

Japanese-speech-from-YouTube corpus — open ASR scaling beyond Reazon.

OSS · free (research-only)

JVS Corpus

Saruwatari Lab (U. Tokyo)

30h Japanese versatile multi-speaker corpus — TTS + speaker-modeling baseline.

OSS · free (research-only)

VCTK

Edinburgh CSTR

44h multi-speaker English corpus — 109 speakers across global accents for TTS.

OSS · free

LJ Speech

Keith Ito

24h single-speaker English audiobook corpus — the canonical TTS baseline.

OSS · free

IEMOCAP

USC SAIL Lab

12h dyadic emotional speech corpus — the gold-standard SER benchmark.

OSS · free (research-only)

RAVDESS

Ryerson University

Audio-visual emotional speech + song corpus — open SER benchmark.

OSS · free

MELD

SenticNet group / NUS

Multimodal emotion corpus from Friends TV show — conversational emotion recognition.

OSS · free (research-only)

CREMA-D

CMU / Penn

7442 audio-visual emotional speech clips from 91 actors — open SER corpus.

OSS · free

MUSAN

JHU CLSP

109h corpus of music + speech + noise — augmentation backbone for ASR/SV.

OSS · free

RIRs and Noises (SLR28)

JHU CLSP

Room impulse responses + isotropic noises — reverberation augmentation set.

OSS · free

OpenSLR (catalog)

Daniel Povey (JHU CLSP)

Open Speech and Language Resources — the index of 130+ free speech corpora.

OSS · free

VoxLingua107

Tallinn University of Technology

6.6kh language-identification corpus — 107 languages from YouTube.

OSS · free

Fluent Speech Commands

Fluent.ai

30h spoken-language-understanding corpus — intent classification benchmark.

OSS · free (research-only)

Google Speech Commands

Google / TensorFlow

1s keyword-spotting corpus — 35 single-word commands, ~100k utterances.

OSS · free

Spoken Wikipedia Corpora

University of Bielefeld

Long-form Wikipedia audiobook recordings in English / German / Dutch — ~1000h.

OSS · free

MGB Challenge

BBC + academic consortium

BBC broadcast-media ASR + diarization challenge — multi-year evaluation series.

OSS · free (research-only)

This American Life Podcast Transcripts

Mao et al. (academic)

Long-form podcast ASR + speaker-role corpus.

OSS · free (research-only)

Spotify Podcast Dataset (100K)

Spotify Research

100k hours of English podcasts with metadata — TREC podcast evaluation corpus.

OSS · free (research-only)

PRESTO

Google Research

Multilingual conversational SLU dataset — 6 languages with disfluencies + code-switching.

OSS · free

VoxTube

ID R&D

5kh weakly-labeled multilingual speaker-recognition corpus from YouTube, 50 languages.

OSS · free (research-only)

Yesno (SLR-1)

OpenSLR

Toy 60-utterance Hebrew corpus — the Kaldi 'hello world' dataset.

OSS · free

AI4Bharat IndicVoices

AI4Bharat (IIT Madras)

16kh Indic-language ASR corpus across 22 Indian languages.

OSS · free

Kathbath

AI4Bharat (IIT Madras)

1684h read-speech ASR benchmark across 12 Indian languages.

OSS · free

IndicSUPERB

AI4Bharat (IIT Madras)

Indic-language version of SUPERB — 12 languages × 6 speech tasks.

OSS · free

Shrutilipi

AI4Bharat (IIT Madras)

6457h Indic-language ASR corpus from All India Radio news broadcasts.

OSS · free

Russian Open STT

Silero

20kh Russian ASR corpus — the largest open Russian-language speech dataset.

OSS · free

VoxForge

VoxForge community

Crowd-sourced multilingual read-speech corpus — the open-source pre-Common-Voice corpus.

OSS · free

VIVOS

AILAB VNU-HCM

15h Vietnamese read-speech ASR corpus — the open Vietnamese ASR baseline.

OSS · free

Thai THAI-SER

VISTEC / NECTEC

36h Thai emotional-speech corpus — the open Thai SER + ASR baseline.

OSS · free

Open ASR Leaderboard

HuggingFace

HuggingFace ASR leaderboard — public WER + RTFx across 8 English test sets.

OSS · free

Papers With Code · Speech Recognition

Papers With Code / Meta AI

Aggregated ASR leaderboards across 100+ benchmarks + papers + code.

OSS · free

AI Hub Korea

NIA (Korean National Information Society Agency)

Korean government open-data hub for speech + NLP corpora — 30+ speech datasets.

OSS · free research-only

NIST SRE Series

NIST

NIST Speaker Recognition Evaluation — the canonical SV/SD benchmark series.

OSS · free paid

NIST OpenSAT

NIST

Open Speech Analytic Technologies — noise-robust ASR + KWS + SAD challenge.

OSS · free paid

Europarl-ST

MLLP / UPV

Speech-translation corpus from European Parliament across 9 languages.

OSS · free

IARPA Babel

IARPA / LDC

Low-resource multilingual ASR + KWS corpora — 25+ languages from telephony.

OSS · free paid

JHU CLSP

Johns Hopkins University

Johns Hopkins Center for Language and Speech Processing — Kaldi + LibriSpeech + Sherpa origins.

OSS · free

Brno BUT Speech

Brno University of Technology

Brno University of Technology speech group — DIHARD + x-vector + WeSpeaker origins.

OSS · free

Edinburgh CSTR

University of Edinburgh

Centre for Speech Technology Research — VCTK + Merlin TTS + Festival origins.

OSS · free

CMU LTI

Carnegie Mellon University

Carnegie Mellon Language Technologies Institute — Sphinx + ESPnet + YODAS origins.

OSS · free

MIT SLS

MIT CSAIL

MIT Spoken Language Systems Group — TIMIT + Galaxy + Jupiter origins.

OSS · free

NTU Speech Processing Lab

National Taiwan University

National Taiwan University Speech Lab — S3PRL + SUPERB origins.

OSS · free

Meta FAIR Speech

Meta AI / FAIR

Meta AI speech research — wav2vec 2.0 + HuBERT + MMS + Seamless origins.

OSS · free

Google Speech Research

Google Research

Google Research Speech — USM + Chirp + AudioPaLM + FLEURS origins.

OSS · free

NVIDIA Speech AI

NVIDIA

NVIDIA Speech Research — NeMo + Canary + Parakeet + Riva origins.

OSS · free

AI4Bharat

IIT Madras

IIT Madras Indic AI lab — IndicVoices + Kathbath + IndicSUPERB + IndicWav2Vec.

OSS · free

Inria MULTISPEECH

Inria Nancy

Inria Nancy speech research team — diarization + speech enhancement leaders.

OSS · free

LIMSI / LISN / CNRS

CNRS / Paris-Saclay

French national speech-tech lab — TC-STAR + Quaero + ELRA-LDC origins.

OSS · free

RWTH i6

RWTH Aachen University

RWTH Aachen i6 group — RASR toolkit + IWSLT speech translation history.

OSS · free

ICSI Berkeley

ICSI / UC Berkeley

International Computer Science Institute — ICSI Meeting Corpus + Aurora origins.

OSS · free

MERL Speech

Mitsubishi Electric Research Labs

Mitsubishi Electric Research Labs Speech Group — CHiME + speech-enhancement leaders.

OSS · free

MLCommons Speech

MLCommons

MLCommons Speech working group — People's Speech + MLPerf speech benchmarks.

OSS · free

IWSLT Speech Translation

IWSLT organizers (academic consortium)

International Workshop on Spoken Language Translation — annual ST evaluation.

OSS · free

HuggingFace Datasets · Audio

HuggingFace

Hub of 5000+ audio + speech datasets — the modern catalog after OpenSLR.

OSS · free

Coqui XTTS

Coqui

Open-source multilingual TTS with zero-shot voice cloning.

OSS · free (CPML license, non-commercial without separate license)

Bark (Suno)

Suno

Open-source generative audio model from Suno — speech, music, and sound effects.

OSS · free (MIT)

Tortoise TTS

neonbjb

Open-source neural TTS with strong prosody and voice cloning.

OSS · free (Apache-2.0)

OpenVoice (MyShell)

MyShell

MyShell's open-source voice cloning with tone-color extraction.

OSS · free (MIT for V1, commercial-allowed for V2)

MeloTTS

MyShell

High-quality multi-lingual TTS from MyShell — fast and CPU-friendly.

OSS · free (MIT)

VITS

jaywalnut310 (research)

End-to-end TTS with adversarial training — the open-source workhorse.

OSS · free (MIT)

FastSpeech 2

Microsoft Research / community

Non-autoregressive TTS reference implementation — fast and parallelizable.

OSS · free (MIT)

ESPnet TTS

ESPnet

ESPnet's TTS recipes — multi-architecture, multi-language.

OSS · free (Apache-2.0)

Mimic 3 (Mycroft)

Mycroft (archived)

Mycroft's neural TTS — designed for Raspberry Pi voice assistants.

OSS · free (AGPL-3.0)

Larynx

Rhasspy

Rhasspy's predecessor to Piper — Tacotron-style models for offline assistants.

OSS · free (MIT)

Piper (Rhasspy)

Rhasspy

Fast, on-device neural TTS optimized for Raspberry Pi 4.

OSS · free (MIT)

Festival Speech Synthesis

University of Edinburgh / CMU

Classic Edinburgh / CMU concatenative TTS — academic reference.

OSS · free (university open-source license)

eSpeak NG

eSpeak NG community

Compact open-source TTS for 100+ languages — the embedded workhorse.

OSS · free (GPL-3.0)

MaryTTS

DFKI

Java-based open-source TTS platform — research and academic deployments.

OSS · free (LGPL)

MBROLA

University of Mons / open source

Diphone-based TTS engine — paired with eSpeak NG for more natural output.

OSS · free (AGPL since 2018)

Tacotron 2

Google / NVIDIA reference

Google's seminal end-to-end TTS architecture — the neural-TTS starting point.

OSS · free (BSD-3-Clause)

Grad-TTS

Huawei Noah's Ark Lab

Diffusion-probabilistic TTS reference implementation.

OSS · free (MIT)

FastPitch

NVIDIA

NVIDIA's parallel TTS architecture with explicit pitch control.

OSS · free (BSD-3-Clause)

Kokoro TTS

hexgrad (research)

Lightweight 82M-param open-source TTS — Apache-2.0, runs on a Raspberry Pi.

OSS · free (Apache-2.0)

Chatterbox TTS

Resemble AI

Resemble AI's open-source emotion-aware TTS — MIT-licensed.

OSS · free (MIT)

WaveNet (reference)

DeepMind / community

DeepMind's seminal 2016 neural-vocoder paper — historical reference only.

OSS · free (community reproductions, varied licenses)

HiFi-GAN (reference)

Jungil Kong (research)

GAN-based neural vocoder reference — fast and high-quality.

OSS · free (MIT)

MARS5

Camb.ai

Camb.ai's open-source MARS5 multilingual TTS reference.

OSS · free (AGPL-3.0)

Amphion

Open Multimedia AI Lab

Open-source toolkit for audio, music, and speech generation.

OSS · free (MIT)

IndexTTS

Bilibili

Bilibili's open-source TTS — Chinese + English bilingual.

OSS · free (Apache-2.0 code, custom weight license)

Mycroft AI

Mycroft AI · OpenVoiceOS community

Open-source voice assistant — community-forked after the original company wound down.

OSS · free

OpenVoiceOS

OpenVoiceOS community

Community continuation of Mycroft — modular open-source voice assistant for Linux + Pi.

OSS · free

Rhasspy

Rhasspy Voice / Nabu Casa

Fully offline voice assistant for Home Assistant — runs on a Raspberry Pi with no cloud.

OSS · free

Home Assistant Assist

Nabu Casa

Home Assistant's first-party voice surface — Rhasspy's successor, integrated into HA core.

OSS · free · Nabu Casa cloud $6.50/mo optional

Leon AI

Leon AI community

Open-source personal assistant — self-hostable, privacy-respecting, modular skills.

OSS · free

Whisper Glasses

Whisper Glasses community

Open-source DIY captioning glasses powered by Whisper — community hardware project.

OSS · free · ~$80 BOM

openWakeWord

openWakeWord contributors

Open-source wake-word engine — community alternative to Porcupine and Snips.

OSS · free

Snowboy

KITT.AI (defunct) · community

Legacy customizable wake-word engine — community-maintained after KITT.AI shutdown.

OSS · free