Transcription tools directory

Every audio-to-text tool we track (open-source engines, cloud APIs, meeting notetakers, podcast & video editors, dictation apps, dubbing AI, voice agents, datasets, and more), grouped by category. Curated by Whipscribe; updated 2026-05-15.

Updated 2026-05-15 · 1679 tools tracked
Open source · 357 tracked
Self-hostable transcription engines and desktop apps you can run yourself, with source you can read and modify.
OpenAI Whisper
OpenAI

The reference open-source multilingual ASR model from OpenAI.

OSS · MIT ★ 98.1k
whisper.cpp
Georgi Gerganov

C/C++ port of Whisper — runs on anything, from a Raspberry Pi to Apple Silicon.

OSS · MIT ★ 48.8k
faster-whisper
SYSTRAN

Up to 4× faster than reference Whisper at the same accuracy, using CTranslate2; the production sweet spot.

OSS · MIT ★ 22.3k
whisperX
Max Bain

Faster-whisper + forced alignment + speaker diarization in one pipeline.

OSS · BSD‑2‑Clause ★ 21.4k
insanely-fast-whisper
Vaibhav Srivastav

CLI that transcribes 150 minutes of audio in ~98 seconds on an A100.

OSS · Apache‑2.0 ★ 12.4k
stable-ts
jianfch

Whisper with stabilised timestamps — more accurate word-level timing.

OSS · MIT ★ 2.2k
WhisperKit
Argmax

Swift Whisper for Apple Silicon — CoreML, ANE, Metal. Now part of the Argmax Open-Source SDK (v1.0.0, May 2026) alongside SpeakerKit + TTSKit.

OSS · MIT ★ 6.0k
distil-whisper
Hugging Face

Distilled Whisper: 6× faster, 49% smaller, within 1% WER of the teacher.

OSS · MIT ★ 4.1k
SeamlessM4T
Meta AI

Meta's speech-to-text + speech-to-speech + text-to-speech model, 100 languages.

OSS · custom license ★ 11.8k
Vosk
Alpha Cephei

Lightweight offline speech recognition for 20+ languages, runs on a Raspberry Pi.

OSS · Apache‑2.0 ★ 14.6k
Buzz
Chidi Williams

Cross-platform desktop app for Whisper — open-source MacWhisper alternative.

OSS · MIT ★ 18.8k
Tortoise TTS
neonbjb

Open-source TTS model with strong prosody — slow on CPU.

OSS · free
Coqui TTS
Coqui

Open-source TTS toolkit with multi-language voice models.

OSS · free
Whisper JAX
Sanchit Gandhi (Hugging Face)

Up to 70× faster Whisper on TPUs via JAX + Flax + batching.

OSS · free
MLX Whisper
Apple ML Research

Whisper inference on Apple Silicon via Apple's MLX framework.

OSS · free
whisper-rs
tazz4843

Idiomatic Rust bindings for whisper.cpp.

OSS · free
Const-me Whisper
Const-me

Whisper running on Windows via DirectCompute / GPGPU.

OSS · free
Whisper Standalone (Purfview)
Purfview

Single-EXE Whisper for Windows + Linux, no dependencies.

OSS · free
whisper-ctranslate2
Softcatalà

Command-line Whisper using CTranslate2 — closest match to openai/whisper CLI.

OSS · free
pywhispercpp
abdeladim-s

Python bindings for whisper.cpp with a simple iterator API.

OSS · free
Whisper-WebUI
jhj0517

Gradio web UI bundling faster-whisper + diarization + translation.

OSS · free
WhisperLive
Collabora

Real-time Whisper transcription over WebSockets.

OSS · free
WhisperFusion
Collabora

Ultra-low-latency speech→LLM pipeline: WhisperLive + Mistral + TensorRT.

OSS · free
WhisperBot
Collabora

WhisperFusion's voice-chat reference app.

OSS · free
whisper_streaming
ÚFAL (Charles University)

Academic real-time Whisper streaming with LocalAgreement-2.

OSS · free
whisper-timestamped
LINTO / Linagora

Word-level timestamps for OpenAI Whisper without retraining.

OSS · free
whisper-diarization
Mahmoud Ashraf

Whisper + NeMo MSDD diarization pipeline.

OSS · free
faster-whisper-server
fedirz

OpenAI-compatible /v1/audio/transcriptions endpoint over faster-whisper.

OSS · free
whisper-asr-webservice
Ahmet Öner

Dockerized Whisper REST API with multiple backends.

OSS · free
WhisperS2T
shashikg

Optimized batched Whisper engine with VAD + dynamic batching.

OSS · free
whisper-playground
Sahar Mor

Mic-in-browser → real-time Whisper transcription demo.

OSS · free
LiveWhisper
Nikorasu

Always-listening hot-mic Whisper transcriber.

OSS · free
generate-subtitles
mayeaux

Single-page web UI to generate subtitles via Whisper.

OSS · free
WhisperSpeech
WhisperSpeech / Collabora

Whisper inverted into a TTS model; also used as an ASR-aware training-data tool.

OSS · free
Echogarden
Echogarden Project

Easy-to-use speech toolkit: TTS, STT, alignment, language detection.

OSS · free
Voice-Pro
abus-aikorea

One-click Whisper + diarization + voice cloning Gradio app.

OSS · free
CrisperWhisper
Nyra Health

Whisper fine-tuned for verbatim transcription (fillers, stutters, false starts) with accurate word-level timestamps.

OSS · free
NVIDIA NeMo
NVIDIA

Toolkit + model zoo behind Canary, Parakeet, Conformer, FastConformer.

OSS · free
Seamless (SeamlessM4T family)
Meta AI

Meta's multilingual speech-translation + transcription foundation suite.

OSS · free
Fairseq
Meta AI

Meta's seq-to-seq toolkit — home of wav2vec, HuBERT, XLS-R, MMS.

OSS · free
HuggingFace Transformers (Audio)
Hugging Face

One API for Whisper, Wav2Vec2, HuBERT, XLS-R, SeamlessM4T, Parakeet.

OSS · free
HuggingFace Optimum
Hugging Face

ONNX + TensorRT + OpenVINO acceleration for Transformers ASR models.

OSS · free
HuggingFace Accelerate
Hugging Face

Multi-GPU / mixed-precision launcher for any PyTorch ASR training script.

OSS · free
HuggingFace Datasets
Hugging Face

Streaming loader for Common Voice, LibriSpeech, GigaSpeech, FLEURS.

OSS · free
HuggingFace PEFT
Hugging Face

LoRA / adapters for parameter-efficient Whisper fine-tuning.

OSS · free
SpeechBrain
SpeechBrain consortium

PyTorch toolkit for ASR, speaker, diarization, enhancement.

OSS · free
ESPnet
ESPnet community

End-to-end speech toolkit: ASR, TTS, ST, speaker, separation.

OSS · free
Kaldi
Kaldi community

The classic C++ HMM/DNN speech recognition toolkit.

OSS · free
k2
k2-fsa

FSA/FST framework written from scratch in PyTorch/CUDA.

OSS · free
icefall
k2-fsa

ASR recipes (Conformer / Zipformer / Pruned Transducer) for k2 + sherpa.

OSS · free
Sherpa
k2-fsa

Production server for k2/icefall + Whisper models (PyTorch).

OSS · free
sherpa-onnx
k2-fsa

ONNX-runtime ASR: Whisper, Zipformer, Paraformer on every platform.

OSS · free
sherpa-ncnn
k2-fsa

ASR on NCNN — Android-friendly, CPU-only, no FP support needed.

OSS · free
Coqui STT
Coqui

Successor to Mozilla DeepSpeech, maintained by Coqui.

OSS · free
Mozilla DeepSpeech
Mozilla

Mozilla's original open-source engine (RNN + CTC, based on Baidu's Deep Speech); archived but historic.

OSS · free
WeNet
wenet-e2e

Production-first E2E ASR — U2++ Conformer, streaming + offline.

OSS · free
Athena STT
ATHENA team

End-to-end speech recognition toolkit by ATHENA-OPEN-SOURCE.

OSS · free
PocketSphinx
CMU Sphinx

Lightweight CMU Sphinx engine for embedded keyword spotting.

OSS · free
CMU Sphinx-4
CMU Sphinx

The classic Java speech engine from CMU.

OSS · free
PaddleSpeech
Baidu

Baidu's all-in-one speech toolkit on PaddlePaddle.

OSS · free
PaddlePaddle DeepSpeech
Baidu

The DeepSpeech-style recipes inside PaddleSpeech.

OSS · free
Julius
Julius project

Lightweight, Japanese-focused open LVCSR decoder with two-pass N-gram + HMM search.

OSS · free
Montreal Forced Aligner
MontrealCorpusTools

Word-level alignment via Kaldi for 100+ languages.

OSS · free
OpenSeq2Seq
NVIDIA

NVIDIA's TF1 framework — historical home of Jasper + QuartzNet.

OSS · free
RETURNN
RWTH Aachen

RWTH's flexible neural network training framework for ASR research.

OSS · free
FunASR
Alibaba DAMO Academy

Alibaba DAMO's Paraformer / SenseVoice / Whisper toolkit.

OSS · free
Reverb (Rev.com OSS)
Rev.com

Rev's open WFST-decoded ASR + diarization stack.

OSS · free
fstalign
Rev.com

Rev.com's WER + alignment scoring tool over WFSTs.

OSS · free
PaddlePaddle Parakeet (TTS)
Baidu

Baidu's TTS half — included for end-to-end voice pipelines.

OSS · free
pyannote.audio
pyannote / Hervé Bredin

The reference open diarization + speaker embedding toolkit.

OSS · free
pyannote-audio (legacy)
hbredin

Hervé Bredin's personal mirror of pyannote.audio.

OSS · free
Silero VAD
snakers4 / Silero

Tiny, accurate voice-activity-detection model — runs on CPU.

OSS · free
py-webrtcvad
wiseman

Python bindings for Google's WebRTC VAD.

OSS · free
diart
Juan Manuel Coria

Streaming speaker diarization on top of pyannote.

OSS · free
simple_diarizer
cvqluu

A minimal pyannote / SpeechBrain diarization wrapper.

OSS · free
Resemblyzer
Resemble AI

Speaker-verification embeddings from a small generalist encoder.

OSS · free
Moonshine
Useful Sensors

Tiny English ASR optimized for resource-constrained devices.

OSS · free
Moonshine (project mirror)
Moonshine AI

Mirror of Useful Sensors' Moonshine releases.

OSS · free
Transformers.js
Hugging Face

Run Whisper / wav2vec2 entirely in the browser via ONNX Runtime Web.

OSS · free
Transformers.js (xenova mirror)
Joshua Lochner

Original transformers.js repo by Joshua Lochner (pre-merge into HF).

OSS · free
Apple MLX
Apple

Apple's array framework — runs Whisper, Phi, Llama on Apple Silicon.

OSS · free
MLX Swift
Apple

Swift bindings for MLX — embed Whisper in iOS/macOS apps.

OSS · free
MLX Swift Examples
Apple

Reference Swift apps for MLX, including Whisper.

OSS · free
MLX Examples
Apple

Python MLX examples — Whisper, Llama, Stable Diffusion.

OSS · free
MLX Data
Apple

Audio + image data loaders for MLX training.

OSS · free
HuggingFace Candle
Hugging Face

Minimalist Rust ML framework with Whisper support.

OSS · free
ggml
ggml.ai

The tensor library underneath llama.cpp + whisper.cpp.

OSS · free
whisper.cpp (ggml-org)
ggml.ai

The new ggml-org home of whisper.cpp.

OSS · free
llama.cpp
ggml.ai

GGUF runtime — runs many ASR forks (whisper, parakeet, qwen-audio).

OSS · free
ONNX Runtime
Microsoft

Microsoft's cross-platform inference runtime for ONNX-exported Whisper.

OSS · free
ONNX
Linux Foundation

Open model-exchange format used by most ASR optimization toolchains.

OSS · free
OpenVINO
Intel

Intel's CPU/iGPU/NPU inference toolkit, with Whisper support.

OSS · free
OpenVINO Notebooks
Intel

Reference notebooks including Whisper + SeamlessM4T export.

OSS · free
Intel Extension for PyTorch (IPEX)
Intel

BF16/AMX speedups for Whisper PyTorch inference on Intel CPUs.

OSS · free
vLLM
vLLM Project

High-throughput inference engine — supports Whisper / Llava / Qwen-Audio.

OSS · free
SGLang
SGLang Project

Structured generation runtime — supports Qwen-Audio / Phi-Multimodal.

OSS · free
NVIDIA TensorRT-LLM
NVIDIA

NVIDIA's optimized inference for Whisper, Canary, Parakeet on Triton.

OSS · free
NVIDIA FasterTransformer
NVIDIA

Legacy NVIDIA inference engine — predecessor to TensorRT-LLM.

OSS · free
HF Text Generation Inference (TGI)
Hugging Face

Production inference server — runs audio-multimodal LLMs.

OSS · free
TensorFlowASR
TensorSpeech

TensorFlow 2 end-to-end ASR — Conformer, ContextNet, DeepSpeech2.

OSS · free
MASR (Mandarin Streaming ASR)
yeyupiaoling

Streaming Conformer + DeepSpeech2 in PyTorch for Mandarin.

OSS · free
AudioClassification-Pytorch
yeyupiaoling

Companion audio-classification training repo for MASR.

OSS · free
speech_recognition (Uberi)
Uberi

Multi-backend Python speech-recognition library.

OSS · free
WeSpeaker
wenet-e2e

Production-style speaker embedding + verification toolkit.

OSS · free
WeSep
wenet-e2e

Open speech-separation toolkit aligned with WeNet ASR.

OSS · free
Flashlight
Meta AI

Meta's C++ ML library; home of wav2letter.

OSS · free
Flashlight Sequence
Meta AI

Standalone CTC / sequence decoders from Flashlight.

OSS · free
wav2letter++
Meta AI

Meta's original fast convolutional ASR system.

OSS · free
conformer (sooftware)
sooftware

Reference PyTorch implementation of the Conformer architecture.

OSS · free
Speech-Transformer (sooftware)
sooftware

Reference Speech-Transformer in PyTorch.

OSS · free
Microsoft UniLM
Microsoft

Home of WavLM, HuBERT++, SpeechT5, BEATs, VALL-E.

OSS · free
Microsoft SpeechT5
Microsoft

Unified speech-text Transformer (ASR + TTS + VC).

OSS · free
Microsoft Recognizers-Text
Microsoft

Post-processing for ASR: numbers, dates, units in 20+ languages.

OSS · free
NVIDIA Riva Python Clients
NVIDIA

Open clients for Riva — NVIDIA's commercial ASR/TTS server.

OSS · free
openai-python
OpenAI

Reference SDK — covers the Whisper + Realtime audio endpoints.

OSS · free
LinTO Platform Stack
LINAGORA

Open conversational-AI stack with self-hosted ASR + NLP.

OSS · free
LinTO Transcription Service
LINAGORA

Production transcription microservice powering the LinTO stack.

OSS · free
fairseq2 (via seamless_communication)
Meta AI

Modular successor to fairseq used by Seamless models.

OSS · free
HuggingFace LightEval
Hugging Face

Eval harness — includes WER evaluations for ASR.

OSS · free
HuggingFace Audio Course
Hugging Face

Free open course on audio ML, including Whisper fine-tuning.

OSS · free
SpeechColab Leaderboard
SpeechColab

Open ASR leaderboard (LibriSpeech, GigaSpeech, AISHELL).

OSS · free
StreamSpeech
ICT-NLP

Simultaneous speech-to-speech translation with streaming ASR.

OSS · free
Parler-TTS
Hugging Face

Open TTS — relevant when pairing ASR with read-back TTS.

OSS · free
Quivr
QuivrHQ

OSS 'second brain' that ingests transcripts via Whisper.

OSS · free
UniAudio
yangdongchao / CUHK

Unified audio foundation model (Codec + LM) — handles ASR.

OSS · free
DeepSpeed
Microsoft

General-purpose distributed-training library; scales Whisper / Conformer training.

OSS · free
DeepSpeed (deepspeedai mirror)
DeepSpeed AI

The deepspeedai-org home of DeepSpeed.

OSS · free
DeepSpeed-MII
Microsoft

Microsoft's inference-side companion to DeepSpeed.

OSS · free
Microsoft Olive
Microsoft

Model-optimization toolchain — Whisper ONNX/QNN/DirectML targets.

OSS · free
JAX (Google)
Google

Underlying framework for whisper-jax and TPU ASR research.

OSS · free
JAX (JAX-ML org)
JAX-ML

JAX's new home under the JAX-ML org.

OSS · free
Flax
Google

JAX neural-net library used by whisper-jax.

OSS · free
SentencePiece
Google

Subword tokenizer used by SeamlessM4T, Canary, and many other speech models (Whisper uses byte-level BPE instead).

OSS · free
TensorFlow
Google

The framework underlying TensorFlowASR + many older recipes.

OSS · free
TensorFlow Text
Google

Text ops + tokenizers integrated with TF ASR pipelines.

OSS · free
TensorFlow Lingvo
Google

Google's research-grade TF framework — original Conformer code.

OSS · free
NVIDIA Megatron-LM
NVIDIA

Tensor-parallel training — used for Speech-LLM scaling.

OSS · free
NVIDIA Apex
NVIDIA

Mixed-precision / fused ops library used in NeMo training.

OSS · free
cuDNN Frontend
NVIDIA

C++ / Python API for cuDNN — speeds up custom ASR kernels.

OSS · free
CUTLASS
NVIDIA

High-performance CUDA matrix kernels used by Whisper engines.

OSS · free
ColossalAI
HPC-AI Tech

Open distributed framework — supports Whisper LoRA fine-tunes.

OSS · free
MLC-LLM
MLC AI

Compile + deploy LLMs (and Whisper) to phones / browsers / WebGPU.

OSS · free
MLflow
LF AI / Databricks

Track / serve Whisper experiments and model registry.

OSS · free
lm-evaluation-harness
EleutherAI

Eval harness now covering audio-LLM benchmarks.

OSS · free
Piper
Rhasspy

Fast neural TTS for Home Assistant — pairs with Whisper.

OSS · free
Mozilla TTS
Mozilla

Mozilla's archived TTS — historical reference.

OSS · free
NVIDIA Tacotron2
NVIDIA

Reference Tacotron2 + WaveGlow stack from NVIDIA.

OSS · free
NVIDIA WaveGlow
NVIDIA

Flow-based vocoder companion to Tacotron2.

OSS · free
NVIDIA Mellotron
NVIDIA

Multispeaker prosody TTS — historical NVIDIA release.

OSS · free
IMS Toucan
Universität Stuttgart IMS

Multilingual TTS toolkit from Stuttgart IMS.

OSS · free
IMS Toucan (lowercase mirror)
Universität Stuttgart IMS

Alternate-case mirror of IMS Toucan.

OSS · free
VITS
jaywalnut310

Reference E2E TTS — building block for voice-agent loops.

OSS · free
Glow-TTS
jaywalnut310

Flow-based parallel TTS reference.

OSS · free
Suno Bark
Suno AI

Transformer-based generative audio / TTS.

OSS · free
ChatTTS
2noise

Conversational TTS — voice agent companion to Whisper.

OSS · free
Fish Speech
fishaudio

Open zero-shot voice cloning + TTS.

OSS · free
Bert-VITS2
fishaudio

VITS2 + BERT prosody TTS — companion to Whisper.

OSS · free
StyleTTS2
yl4579

Style-conditioned TTS — pairs with Whisper for narration apps.

OSS · free
StyleTTS
yl4579

Original StyleTTS — predecessor of StyleTTS2.

OSS · free
F5-TTS
SWivid

Flow-matching TTS — open and fast.

OSS · free
MetaVoice 1B
MetaVoice

Open zero-shot voice cloning TTS.

OSS · free
GPT-SoVITS
RVC Boss

Few-shot voice cloning; its dataset-prep tools use ASR (including faster-whisper) to label training audio.

OSS · free
RVC WebUI
RVC Project

Retrieval-based Voice Conversion (RVC) web UI for training and running voice-conversion models.

OSS · free
AudioCraft
Meta AI

Meta's audio-generation stack (MusicGen, AudioGen, EnCodec).

OSS · free
EnCodec
Meta AI

Neural audio codec — used by SeamlessM4T + many speech-LMs.

OSS · free
EnCodec (capitalized mirror)
Meta AI

Mirror of facebookresearch/encodec.

OSS · free
textlesslib
Meta AI

Speech-without-text framework from Meta.

OSS · free
AudioMAE
Meta AI

Masked-autoencoder pretraining for audio spectrograms; its representations feed downstream audio tasks.

OSS · free
Descript Audio Codec
Descript

High-quality neural audio codec — alternative to EnCodec.

OSS · free
audiotools
Descript

Audio data tooling library that pairs with DAC.

OSS · free
HuggingFace LeRobot
Hugging Face

Open robotics — includes spoken-command ASR demos.

OSS · free
HuggingFace Diffusers
Hugging Face

Generative-audio diffusion — paired with Whisper for content pipelines.

OSS · free
SetFit
Hugging Face

Few-shot text classifier — useful for post-transcript tagging.

OSS · free
paper-qa
Future-House

RAG over PDFs / transcripts — downstream ASR consumer pattern.

OSS · free
GPT-NeoX
EleutherAI

Training framework for large autoregressive language models, applicable to speech-LMs.

OSS · free
OpenCLIP
mlfoundations

Open CLIP — companion vision encoder in multimodal ASR research.

OSS · free
Salesforce CodeT5
Salesforce

Code-generation T5 — used in voice-coding agents on top of Whisper.

OSS · free
Salesforce CTRL
Salesforce

Conditional-LM — historical companion to speech-text research.

OSS · free
NeMo Guardrails
NVIDIA

Safety layer often paired with Whisper voice agents.

OSS · free
Transformers4Rec
NVIDIA-Merlin

Sequence models — companion to spoken-search recommender pipelines.

OSS · free
Pai Megatron Patch
Alibaba

Alibaba's patched Megatron — used for Paraformer scale-up.

OSS · free
Google Snappy
Google

Compression library used by ASR data pipelines.

OSS · free
google-research monorepo
Google Research

Catch-all for Google ASR papers (USM, BigSSL, Conformer).

OSS · free
Google seq2seq
Google

Historical TF1 seq2seq — early Listen-Attend-Spell era.

OSS · free
llm.c
Andrej Karpathy

LLM training in raw C/CUDA; a minimal reference for compact model training.

OSS · free
GFPGAN
TencentARC

Face restoration — often paired with Whisper subtitle pipelines.

OSS · free
AnimateDiff
guoyww

Stable-Diffusion animation — used with Whisper subs in content pipelines.

OSS · free
Llama2-Code-Interpreter
SeungyounShin

Voice-coding agent example over Whisper.

OSS · free
torch-harmonics
NVIDIA

Spherical signal transforms — used in advanced ASR research.

OSS · free
whisper-ctranslate2 (Softcatala mirror)
Softcatalà

Capitalized-name mirror of whisper-ctranslate2.

OSS · free
faster-whisper (guillaumekln legacy)
Guillaume Klein

Pre-SYSTRAN home of faster-whisper.

OSS · free
llama.cpp (ggml-org)
ggml.ai

The ggml-org-hosted mirror of llama.cpp.

OSS · free
WeNet (capitalized mirror)
wenet-e2e

Capitalized-name mirror of wenet.

OSS · free
oTranscribe
Elliot Bentley

Free browser-based manual transcription tool — keyboard-shortcut transcript editor.

OSS · free
Talon Dictation Models
Talon Voice community

Open dictation engines used by the Talon Voice community.

OSS · free
Vibe
Thomas Beling

Open-source desktop transcription and dictation app built on Whisper.

OSS · free
AI4Bharat IndicConformer
AI4Bharat / IIT Madras

Open-source Indic ASR models from IIT Madras' AI4Bharat lab — 22 scheduled Indian languages.

OSS · free
Mozilla Common Voice
Mozilla Foundation

Mozilla Common Voice — public-domain multilingual speech corpus that powers many regional STT models.

OSS · free
Meta MMS
Meta AI

Meta Massively Multilingual Speech — open-source ASR for 1,100+ languages.

OSS · free
ASR-IL
Israeli AI consortium

Israeli national Hebrew ASR — research models from the Israeli AI consortium.

OSS · free
AI4D African Language Dataset
AI4D Africa

AI4D Africa — multilingual African speech datasets and ASR baselines.

OSS · free
Khipu Andean ASR
Khipu / Americas NLP

Khipu community — open-source Andean Spanish, Quechua, and Aymara speech research.

OSS · free
VinAI / VinBigData ASR
VinAI Research

VinAI Research — Vietnamese-language ASR and speech research from the Vingroup AI arm.

OSS · free
Khmer ASR (EKS Labs)
Cambodian research community

Khmer-language speech recognition research for the Cambodian market.

OSS · free
Typhoon ASR (Thai)
SCB 10X

Typhoon — Thai-language LLM and ASR initiative from SCB 10X.

OSS · free
Mesolitica Malay ASR
Mesolitica

Mesolitica — Bahasa Malaysia and Bahasa Indonesia speech research checkpoints.

OSS · free
Georgian ASR (TSU)
Tbilisi State University

Georgian speech recognition research from Tbilisi State University.

OSS · free
Armenian ASR (Yerevann)
Yerevann

Armenian speech recognition checkpoints from the Yerevann research lab.

OSS · free
Turkish ASR (Boğaziçi / METU)
Turkish academic community

Open-source Turkish-language ASR checkpoints from Turkish university labs.

OSS · free
Kencorpus Swahili ASR
Kencorpus consortium

Kencorpus / Maseno — Kenyan Swahili and English code-switch speech dataset and baselines.

OSS · free
IIIT-Hyderabad Indic Speech
IIIT Hyderabad

IIIT-Hyderabad speech lab — academic Indian-language ASR datasets and checkpoints.

OSS · free
IIT Madras Speech Lab
IIT Madras

IIT Madras speech group — academic Indian-language ASR research and AI4Bharat home.

OSS · free
IIT Bombay Speech
IIT Bombay

IIT Bombay speech group — Indian-language ASR research and Bhashini contributions.

OSS · free
Akylai Kyrgyz ASR
Akylai community

Akylai project — Kyrgyz-language voice assistant and ASR research.

OSS · free
ISSAI Kazakh ASR
ISSAI / Nazarbayev University

Institute of Smart Systems and AI (Nazarbayev University) — Kazakh-language ASR research.

OSS · free
Telugu Speech Corpus
Indian academic community

Open Telugu-language speech corpora and ASR models.

OSS · free
Tamil Open ASR
Tamil open-source community

Community-published Tamil-language ASR models and corpora.

OSS · free
BNLP Bangla ASR
Bengali NLP community

Bengali-language ASR datasets and models from the BNLP / Bengali NLP community.

OSS · free
L3Cube Marathi ASR
L3Cube / Pune

L3Cube Pune — Marathi-language NLP and speech research releases.

OSS · free
KB-Whisper Swedish
Kungliga Biblioteket

Kungliga Biblioteket (National Library of Sweden) Whisper fine-tunes for Swedish.

OSS · free
NB-Whisper Norwegian
Nasjonalbiblioteket

Norwegian National Library Whisper fine-tunes for Bokmål and Nynorsk.

OSS · free
CUHK Cantonese ASR
CUHK Speech Group

Chinese University of Hong Kong — Cantonese speech research and open checkpoints.

OSS · free
Pipecat
Daily.co

Open-source framework for voice and multimodal conversational AI agents.

OSS · free
LiveKit Agents
LiveKit

Open-source framework for building realtime AI voice agents on LiveKit's WebRTC stack.

OSS · free
Rasa Voice
Rasa

Open-source conversational AI framework with voice channel integration.

OSS · free
Botpress Voice
Botpress

Open-core conversational AI platform with voice channels.

OSS · free core; see vendor pricing
5ire Voice
5ire

Open-source desktop client routing voice to LLM voice agents.

OSS · free
Willow
Willow

Open-source privacy-respecting voice assistant for home automation.

OSS · free
TEN Framework
Agora

Open-source framework by Agora for building realtime multimodal voice AI agents.

OSS · free
Moshi
Kyutai

Kyutai's open speech-to-speech foundation model and demo voice agent.

OSS · free
Vocode (OSS)
Vocode

Open-source Python library for building real-time voice-LLM applications.

OSS · free
Coqui XTTS-v2
Coqui (community fork)

Open-weights multilingual voice cloning from 6 seconds of audio — 17 languages.

OSS · free
Tortoise-TTS-Fast
Community (152334H)

Performance fork of Tortoise — quality kept, latency 5-10x lower.

OSS · free
Open edX (Self-Hosted)
Axim Collaborative

Self-hosted open-source MOOC platform with caption-track support.

OSS · free
LibriSpeech
Vassil Panayotov / Daniel Povey / JHU CLSP

1000h read English audiobook corpus — the canonical ASR benchmark since 2015.

OSS · free
Libri-Light
Meta AI / Facebook AI Research

60k hours of unlabeled English audiobook audio for self-supervised pretraining.

OSS · free
Mozilla Common Voice
Mozilla Foundation

Crowd-sourced multilingual speech corpus — 30k+ hours across 130 languages.

OSS · free
TED-LIUM 3
LIUM (Le Mans University)

452h of TED talk audio + transcripts — the canonical lecture-style ASR benchmark.

OSS · free
VoxPopuli
Meta AI / Facebook AI Research

400k hours of European Parliament speeches in 23 EU languages.

OSS · free
Multilingual LibriSpeech (MLS)
Meta AI / Facebook AI Research

~50k hours of read audiobook speech across 8 European languages, 44.5k hours of it English.

OSS · free
MuST-C
FBK (Fondazione Bruno Kessler)

TED-based English→X speech translation corpus across 14 target languages.

OSS · free
CoVoST 2
Meta AI

Common Voice-based speech-translation corpus — 21 X→en + 15 en→X language pairs.

OSS · free
FLEURS
Google Research

Few-shot multilingual evaluation across 102 languages — n-way parallel speech.

OSS · free
ML-SUPERB
Academic consortium (CMU + NTU + JHU + others)

Multilingual SUPERB — 143 languages × multiple tasks for self-supervised speech models.

OSS · free
SUPERB
Academic consortium (NTU + CMU + JHU + Meta)

Speech processing Universal PERformance Benchmark — 10 English speech tasks.

OSS · free
GigaSpeech
SpeechColab (consortium)

10,000h English ASR corpus — audiobook + podcast + YouTube blend, multiple subsets.

OSS · free research-only
GigaSpeech 2
SpeechColab

30,000h multilingual evolution of GigaSpeech, launched with Thai, Indonesian, and Vietnamese.

OSS · free research-only
The People's Speech
MLCommons

30,000h CC-BY-licensed English ASR corpus — Internet-Archive sourced.

OSS · free
YODAS
CMU / WAVLab

500kh of YouTube speech across 100+ languages with CC-licensed subtitles.

OSS · free research-only
YODAS2
CMU / WAVLab

Refresh of YODAS with long-form audio + per-language sharding — 422k hours.

OSS · free research-only
SPGISpeech
Kensho Technologies (S&P Global)

5000h of professionally-transcribed earnings-call audio — financial-domain ASR.

OSS · free research-only
Earnings-22
Rev.com / Rev.ai

125h earnings-call ASR test set with 27-accent speaker coverage.

OSS · free
AMI Meeting Corpus
Idiap / Edinburgh / Brno

100h multi-microphone meeting recordings with diarization + speaker labels.

OSS · free
ICSI Meeting Corpus
ICSI Berkeley

72h research-meeting recordings — diarization and meeting-ASR alternative to AMI.

OSS · free
CHiME-6
CHiME Challenge organizers

Real-world dinner-party recordings — far-field ASR + diarization in noise.

OSS · free research-only
CHiME-7 / CHiME-8 DASR
CHiME Challenge organizers

Distant-mic ASR challenge — multi-channel meeting transcription frontier.

OSS · free research-only
VoxCeleb 1
Oxford VGG

100k utterances of celebrity speech from YouTube — speaker recognition benchmark.

OSS · free research-only
VoxCeleb 2
Oxford VGG

1M utterances of celebrity speech — scaled-up speaker recognition corpus.

OSS · free research-only
VoxConverse
Oxford VGG

50h audio-visual diarization corpus — wild YouTube speakers in conversation.

OSS · free research-only
DIHARD III
LDC / DIHARD organizers

Hard diarization-in-the-wild challenge: 11 domains, from courtroom audio to map-task dialogues.

OSS · paid
Switchboard-1
LDC

260h of conversational US English telephone speech — historical ASR benchmark.

OSS · paid
Fisher English
LDC

2000h of telephone conversations — scaled-up successor to Switchboard.

OSS · paid
CallHome English
LDC

60h of unscripted home-telephone conversations — diarization + ASR benchmark.

OSS · paid
Wall Street Journal (WSJ)
LDC

80h of read newspaper sentences — foundational read-speech ASR corpus from 1992.

OSS · paid
TIMIT
LDC / NIST / Texas Instruments + MIT

Phonetically-balanced 5h read-speech corpus from 1986 — phoneme recognition benchmark.

OSS · paid
AISHELL-1
Beijing Shell Shell Technology

178h Mandarin read-speech corpus — open Chinese ASR baseline.

OSS · free
AISHELL-2
Beijing Shell Shell Technology

1000h Mandarin read-speech corpus — scaled-up successor.

OSS · free research-only
AISHELL-4
Beijing Shell Shell Technology

120h Mandarin meeting corpus — multi-speaker conference-room scenarios.

OSS · free
KsponSpeech
AI Hub Korea / ETRI

1000h Korean spontaneous-speech corpus; the open Korean ASR baseline.

OSS · free research-only
ReazonSpeech
Reazon Holdings

Japanese ASR corpus — 35k hours of TV recordings with captions.

OSS · free research-only
JTubeSpeech
Saruwatari Lab (U. Tokyo)

Japanese-speech-from-YouTube corpus — open ASR scaling beyond Reazon.

OSS · free research-only
JVS Corpus
Saruwatari Lab (U. Tokyo)

30h Japanese versatile multi-speaker corpus — TTS + speaker-modeling baseline.

OSS · free research-only
VCTK
Edinburgh CSTR

44h multi-speaker English corpus — 109 speakers across global accents for TTS.

OSS · free
LJ Speech
Keith Ito

24h single-speaker English audiobook corpus — the canonical TTS baseline.

OSS · free
IEMOCAP
USC SAIL Lab

12h dyadic emotional speech corpus — the gold-standard SER benchmark.

OSS · free research-only
RAVDESS
Ryerson University

Audio-visual emotional speech + song corpus — open SER benchmark.

OSS · free
MELD
SenticNet group / NUS

Multimodal emotion corpus from Friends TV show — conversational emotion recognition.

OSS · free research-only
CREMA-D
CMU / Penn

7442 audio-visual emotional speech clips from 91 actors — open SER corpus.

OSS · free
MUSAN
JHU CLSP

109h corpus of music + speech + noise — augmentation backbone for ASR/SV.

OSS · free
RIRs and Noises (SLR28)
JHU CLSP

Room impulse responses + isotropic noises — reverberation augmentation set.

OSS · free
OpenSLR (catalog)
Daniel Povey (JHU CLSP)

Open Speech and Language Resources — the index of 130+ free speech corpora.

OSS · free
VoxLingua107
Tallinn University of Technology

6.6kh language-identification corpus — 107 languages from YouTube.

OSS · free
Fluent Speech Commands
Fluent.ai

30h spoken-language-understanding corpus — intent classification benchmark.

OSS · free research-only
Google Speech Commands
Google / TensorFlow

1s keyword-spotting corpus — 35 single-word commands, ~100k utterances.

OSS · free
Spoken Wikipedia Corpora
University of Bielefeld

Long-form Wikipedia audiobook recordings in English / German / Dutch — ~1000h.

OSS · free
MGB Challenge
BBC + academic consortium

BBC broadcast-media ASR + diarization challenge — multi-year evaluation series.

OSS · free research-only
This American Life Podcast Transcripts
Mao et al. (academic)

Long-form podcast ASR + speaker-role corpus.

OSS · free research-only
Spotify Podcast Dataset (100K)
Spotify Research

100k English podcast episodes (~50k hours) with metadata; the TREC Podcasts evaluation corpus.

OSS · free research-only
PRESTO
Google Research

Multilingual conversational SLU dataset — 6 languages with disfluencies + code-switching.

OSS · free
VoxTube
ID R&D

5kh weakly-labeled multilingual speaker-recognition corpus from YouTube, 50 languages.

OSS · free research-only
Yesno (SLR-1)
OpenSLR

Toy 60-utterance Hebrew corpus — the Kaldi 'hello world' dataset.

OSS · free
AI4Bharat IndicVoices
AI4Bharat (IIT Madras)

16kh Indic-language ASR corpus across 22 Indian languages.

OSS · free
Kathbath
AI4Bharat (IIT Madras)

1684h read-speech ASR benchmark across 12 Indian languages.

OSS · free
IndicSUPERB
AI4Bharat (IIT Madras)

Indic-language version of SUPERB — 12 languages × 6 speech tasks.

OSS · free
Shrutilipi
AI4Bharat (IIT Madras)

6457h Indic-language ASR corpus from All India Radio news broadcasts.

OSS · free
Russian Open STT
Silero

20kh Russian ASR corpus — the largest open Russian-language speech dataset.

OSS · free
VoxForge
VoxForge community

Crowd-sourced multilingual read-speech corpus — the open-source pre-Common-Voice corpus.

OSS · free
VIVOS
AILAB VNU-HCM

15h Vietnamese read-speech ASR corpus — the open Vietnamese ASR baseline.

OSS · free
Thai THAI-SER
VISTEC / NECTEC

36h Thai emotional-speech corpus — the open Thai SER + ASR baseline.

OSS · free
Open ASR Leaderboard
HuggingFace

HuggingFace ASR leaderboard — public WER + RTFx across 8 English test sets.

OSS · free
Papers With Code · Speech Recognition
Papers With Code / Meta AI

Aggregated ASR leaderboards across 100+ benchmarks + papers + code.

OSS · free
AI Hub Korea
NIA (Korean National Information Society Agency)

Korean government open-data hub for speech + NLP corpora — 30+ speech datasets.

OSS · free research-only
NIST SRE Series
NIST

NIST Speaker Recognition Evaluation — the canonical SV/SD benchmark series.

OSS · free paid
NIST OpenSAT
NIST

Open Speech Analytic Technologies — noise-robust ASR + KWS + SAD challenge.

OSS · free paid
Europarl-ST
MLLP / UPV

Speech-translation corpus from European Parliament across 9 languages.

OSS · free
IARPA Babel
IARPA / LDC

Low-resource multilingual ASR + KWS corpora — 25+ languages from telephony.

OSS · free paid
JHU CLSP
Johns Hopkins University

Johns Hopkins Center for Language and Speech Processing — Kaldi + LibriSpeech + Sherpa origins.

OSS · free
Brno BUT Speech
Brno University of Technology

Brno University of Technology speech group — DIHARD + x-vector + WeSpeaker origins.

OSS · free
Edinburgh CSTR
University of Edinburgh

Centre for Speech Technology Research — VCTK + Merlin TTS + Festival origins.

OSS · free
CMU LTI
Carnegie Mellon University

Carnegie Mellon Language Technologies Institute — Sphinx + ESPnet + YODAS origins.

OSS · free
MIT SLS
MIT CSAIL

MIT Spoken Language Systems Group — TIMIT + Galaxy + Jupiter origins.

OSS · free
NTU Speech Processing Lab
National Taiwan University

National Taiwan University Speech Lab — S3PRL + SUPERB origins.

OSS · free
Meta FAIR Speech
Meta AI / FAIR

Meta AI speech research — wav2vec 2.0 + HuBERT + MMS + Seamless origins.

OSS · free
Google Speech Research
Google Research

Google Research Speech — USM + Chirp + AudioPaLM + FLEURS origins.

OSS · free
NVIDIA Speech AI
NVIDIA

NVIDIA Speech Research — NeMo + Canary + Parakeet + Riva origins.

OSS · free
AI4Bharat
IIT Madras

IIT Madras Indic AI lab — IndicVoices + Kathbath + IndicSUPERB + IndicWav2Vec.

OSS · free
Inria MULTISPEECH
Inria Nancy

Inria Nancy speech research team — diarization + speech enhancement leaders.

OSS · free
LIMSI / LISN / CNRS
CNRS / Paris-Saclay

French national speech-tech lab — TC-STAR + Quaero + ELRA-LDC origins.

OSS · free
RWTH i6
RWTH Aachen University

RWTH Aachen i6 group — RASR toolkit + IWSLT speech translation history.

OSS · free
ICSI Berkeley
ICSI / UC Berkeley

International Computer Science Institute — ICSI Meeting Corpus + Aurora origins.

OSS · free
MERL Speech
Mitsubishi Electric Research Labs

Mitsubishi Electric Research Labs Speech Group — CHiME + speech-enhancement leaders.

OSS · free
MLCommons Speech
MLCommons

MLCommons Speech working group — People's Speech + MLPerf speech benchmarks.

OSS · free
IWSLT Speech Translation
IWSLT organizers (academic consortium)

International Workshop on Spoken Language Translation — annual ST evaluation.

OSS · free
HuggingFace Datasets · Audio
HuggingFace

Hub of 5000+ audio + speech datasets — the modern catalog after OpenSLR.

OSS · free
Coqui XTTS
Coqui

Open-source multilingual TTS with zero-shot voice cloning.

OSS · free (CPML license, non-commercial without separate license)
Bark (Suno)
Suno

Open-source generative audio model from Suno — speech, music, and sound effects.

OSS · free (MIT)
Tortoise TTS
neonbjb

Open-source neural TTS with strong prosody and voice cloning.

OSS · free (Apache-2.0)
OpenVoice (MyShell)
MyShell

MyShell's open-source voice cloning with tone-color extraction.

OSS · free (MIT for V1, commercial-allowed for V2)
MeloTTS
MyShell

High-quality multi-lingual TTS from MyShell — fast and CPU-friendly.

OSS · free (MIT)
VITS
jaywalnut310 (research)

End-to-end TTS with adversarial training — the open-source workhorse.

OSS · free (MIT)
FastSpeech 2
Microsoft Research / community

Non-autoregressive TTS reference implementation — fast and parallelizable.

OSS · free (MIT)
ESPnet TTS
ESPnet

ESPnet's TTS recipes — multi-architecture, multi-language.

OSS · free (Apache-2.0)
Mimic 3 (Mycroft)
Mycroft (archived)

Mycroft's neural TTS — designed for Raspberry Pi voice assistants.

OSS · free (AGPL-3.0)
Larynx
Rhasspy

Rhasspy's earlier offline neural TTS and the predecessor to Piper, built for voice assistants.

OSS · free (MIT)
Piper (Rhasspy)
Rhasspy

Fast, on-device neural TTS optimized for Raspberry Pi 4.

OSS · free (MIT)
Festival Speech Synthesis
University of Edinburgh / CMU

Classic Edinburgh / CMU concatenative TTS — academic reference.

OSS · free (university open-source license)
eSpeak NG
eSpeak NG community

Compact open-source TTS for 100+ languages — the embedded workhorse.

OSS · free (GPL-3.0)
MaryTTS
DFKI

Java-based open-source TTS platform — research and academic deployments.

OSS · free (LGPL)
MBROLA
Mons University / open source

Diphone-based TTS engine — paired with eSpeak NG for more natural output.

OSS · free (AGPL since 2018)
Tacotron 2
Google / NVIDIA reference

Google's seminal end-to-end TTS architecture — the neural-TTS starting point.

OSS · free (BSD-3-Clause)
Grad-TTS
Huawei Noah's Ark Lab

Diffusion-probabilistic TTS reference implementation.

OSS · free (MIT)
FastPitch
NVIDIA

NVIDIA's parallel TTS architecture with explicit pitch control.

OSS · free (BSD-3-Clause)
Kokoro TTS
hexgrad (research)

Lightweight 82M-param open-source TTS — Apache-2.0, runs on a Raspberry Pi.

OSS · free (Apache-2.0)
Chatterbox TTS
Resemble AI

Resemble AI's open-source, MIT-licensed TTS with emotion-intensity control.

OSS · free (MIT)
WaveNet (reference)
DeepMind / community

DeepMind's seminal 2016 neural-vocoder paper — historical reference only.

OSS · free (community reproductions, varied licenses)
HiFi-GAN (reference)
Jungil Kong (research)

GAN-based neural vocoder reference — fast and high-quality.

OSS · free (MIT)
MARS5
Camb.ai

Camb.ai's open-source MARS5 multilingual TTS reference.

OSS · free (AGPL-3.0)
Amphion
Open Multimedia AI Lab

Open-source toolkit for audio, music, and speech generation.

OSS · free (MIT)
IndexTTS
Bilibili

Bilibili's open-source TTS — Chinese + English bilingual.

OSS · free (Apache-2.0 code, custom weight license)
Mycroft AI
Mycroft AI · OpenVoiceOS community

Open-source voice assistant — community-forked after the original company wound down.

OSS · free
OpenVoiceOS
OpenVoiceOS community

Community continuation of Mycroft — modular open-source voice assistant for Linux + Pi.

OSS · free
Rhasspy
Rhasspy Voice / Nabu Casa

Fully offline voice assistant for Home Assistant — runs on a Raspberry Pi with no cloud.

OSS · free
Home Assistant Assist
Nabu Casa

Home Assistant's first-party voice surface — Rhasspy's successor, integrated into HA core.

OSS · free · Nabu Casa cloud $6.50/mo optional
Leon AI
Leon AI community

Open-source personal assistant — self-hostable, privacy-respecting, modular skills.

OSS · free
Whisper Glasses
Whisper Glasses community

Open-source DIY captioning glasses powered by Whisper — community hardware project.

OSS · free · ~$80 BOM
openWakeWord
openWakeWord contributors

Open-source wake-word engine — community alternative to Porcupine and Snips.

OSS · free
Snowboy
KITT.AI (defunct) · community

Legacy customizable wake-word engine — community-maintained after KITT.AI shutdown.

OSS · free
Transcription APIs · 111 tracked
Hosted transcription endpoints you call with an API key; no infrastructure to manage.
OpenAI Whisper API
OpenAI

Hosted Whisper (whisper-1, based on large-v2) from OpenAI: $0.006 per minute.

$0.006/min
AssemblyAI
AssemblyAI

Universal-2 model + diarization, PII redaction, topic detection, summarization.

from $0.37/hr
Deepgram
Deepgram

Nova-2 model, excellent streaming, strong at conversational audio.

from $0.0043/min
Rev.ai
Rev.ai

The API spin-off of Rev — strong English accuracy, topic detection, custom vocab.

from $0.02/min
Gladia
Gladia

Whisper-based API with diarization, 99-language coverage, pay-per-minute.

from $0.0102/min
Speechmatics
Speechmatics

Enterprise ASR with strong accents and on-prem deployment options.

contact sales
Whipscribe
Neugence

Hosted faster-whisper + whisperX with paste-a-URL, batch, and MCP access.

This is us
Amazon Transcribe
Amazon Web Services

AWS managed speech-to-text with batch + streaming, custom vocabulary, and medical/call-analytics variants.

from $0.0240/min (standard batch, tiered)
Amazon Transcribe Medical
Amazon Web Services

HIPAA-eligible medical-specialty ASR from AWS for clinical conversations and dictation.

from $0.075/min
Azure AI Speech (Speech-to-Text)
Microsoft Azure

Microsoft Azure's managed STT with batch, real-time, custom speech, and conversation transcription.

from $1/hr (standard) and $0.30/hr (batch transcription)
Google Cloud Speech-to-Text
Google Cloud

GCP Speech v2 with Chirp 2 foundation model, batch + streaming, 125+ language variants.

from $0.016/min (v2 standard) / Chirp tiered
Google Chirp / Chirp 2
Google Cloud

Google's universal speech foundation model exposed via Speech-to-Text v2.

per Speech-to-Text v2 pricing (region-tiered)
IBM Watson Speech to Text
IBM

IBM Cloud's managed ASR with on-prem option, custom acoustic + language models.

Lite (free, capped) + Plus tier (~$0.01/min, tiered)
Oracle Cloud AI Speech
Oracle Cloud Infrastructure

OCI managed speech-to-text with batch + real-time and Whisper-based models.

tiered per-minute (see OCI pricing page)
Alibaba Cloud Intelligent Speech Interaction
Alibaba Cloud

Alibaba's managed Chinese-first ASR with batch + real-time and customizable hotwords.

tiered RMB-per-second pricing
Tencent Cloud ASR
Tencent Cloud

Tencent's managed Chinese-first ASR with one-sentence, real-time, and recording-file modes.

tiered RMB-per-second pricing
Baidu Speech
Baidu AI Cloud

Baidu AI Cloud's Chinese-first speech recognition family.

free tier + tiered RMB-per-call
Yandex SpeechKit
Yandex Cloud

Yandex Cloud's managed Russian-first STT + TTS with batch and streaming.

tiered per-second (RUB-denominated)
Sber SaluteSpeech
Sber (Salute)

Sber's Russian-language speech recognition + synthesis platform.

tiered RUB-per-second (see SmartMarket pricing)
Huawei Cloud Speech Interaction Service
Huawei Cloud

Huawei Cloud's managed ASR + TTS with one-sentence, real-time, and long-audio modes.

tiered per-call (China + international regions)
iFlyTek Open Platform Speech
iFlyTek

iFlyTek's market-leading Mandarin ASR family for enterprise and education.

tiered per-day call quotas (RMB)
Volcengine Speech (ByteDance)
Volcengine (ByteDance)

ByteDance's Volcengine speech-to-text platform powering Douyin/CapCut workflows.

tiered per-second (RMB)
Naver Clova Speech
Naver Cloud Platform

Naver Cloud's Korean-first ASR with batch + real-time and speaker diarization.

tiered KRW-per-second
Kakao Speech (Kakao i)
Kakao Enterprise

Kakao Enterprise's Korean speech recognition + synthesis platform.

Contact sales
NTT Communications COTOHA Voice
NTT Communications

NTT Com's Japanese-first STT under the COTOHA AI platform.

tiered JPY-denominated
AISpeech (Sipeed/iflyOS-class)
AISpeech

Chinese embedded ASR specialist for IoT devices and on-device speech.

Contact sales
Soniox
Soniox

Real-time multilingual ASR API with low-latency streaming and code-switching support.

per-minute (see Soniox pricing page)
ElevenLabs Scribe
ElevenLabs

ElevenLabs' multilingual speech-to-text API with word-level timestamps, the counterpart to its TTS.

from $0.40/hr (see ElevenLabs pricing page)
Sieve
Sieve

Video-AI workflow platform with Whisper-based transcription endpoints.

per-second compute (see Sieve pricing)
Replicate (Whisper hosts)
Replicate

Replicate's catalog of community-hosted Whisper variants behind one API.

per-second GPU compute
Modal (ASR endpoints)
Modal Labs

Modal's serverless GPU platform commonly used to host Whisper / faster-whisper as an API.

per-second GPU compute (see Modal pricing)
RunPod (Whisper endpoints)
RunPod

RunPod's GPU cloud commonly used to deploy Whisper / faster-whisper as a serverless endpoint.

per-second GPU compute
fal.ai (Whisper / wizper endpoints)
fal.ai

fal.ai's hosted Whisper-family endpoints — low-latency, pay-per-second.

per-second compute (see fal.ai pricing)
Groq (Whisper endpoints)
Groq

Groq's LPU-based Whisper-large-v3 endpoint — exceptionally low-latency transcription.

from $0.111/hr (whisper-large-v3, batch)
OpenAI Realtime API (STT)
OpenAI

OpenAI's Realtime API streaming speech-in (whisper-1 / gpt-4o-transcribe family).

per-minute audio in (model-dependent)
Vatis Tech
Vatis Tech

Romanian-headquartered transcription API with strong CEE language coverage.

Free tier + paid hours (see Vatis Tech pricing)
Wit.ai (Meta)
Meta

Meta's free natural-language and speech understanding platform.

free
Vonage Voice API (ASR Connector)
Vonage

Vonage's CPaaS speech-to-text via the ASR connector (typically Deepgram-powered).

Vonage Voice price + ASR per-minute
Plivo Voice (Speech Recognition)
Plivo

Plivo's CPaaS speech recognition for IVR + call-recording workflows.

Plivo Voice + per-minute ASR
Bandwidth Voice (Transcription)
Bandwidth

Bandwidth's voice CPaaS with optional transcription on recordings and IVR.

per-minute (see Bandwidth pricing)
Play.HT (STT endpoints)
Play.HT

Play.HT's transcription endpoint as a counterpart to its TTS family.

per-minute (see Play.HT pricing)
Lemonfox.ai
Lemonfox.ai

Hosted Whisper API at low per-hour pricing for developers.

from $0.17/hr (Whisper)
Speakbot (Whisper API alt)
Speakbot

Hosted Whisper API with file-based and URL ingestion.

per-minute (see Speakbot pricing)
Voicegain
Voicegain

Deep-learning ASR you can deploy in your own cloud or use as managed SaaS.

per-minute SaaS + Edge license
Amazon Transcribe Streaming
Amazon Web Services

Real-time streaming variant of Amazon Transcribe over HTTP/2 + WebSocket.

from $0.024/min
Azure Fast Transcription
Microsoft Azure

Azure Speech's synchronous fast-transcription API for short-turnaround file transcription with predictable latency.

per-Azure-Speech pricing (Fast variant)
Google Cloud Speaker Diarization
Google Cloud

Diarization layer for Google Cloud Speech-to-Text v2.

per-Speech v2 pricing
Deepgram Nova-3
Deepgram

Deepgram's current-generation streaming + batch ASR model.

from $0.0043/min (Nova-3, batch)
AssemblyAI Realtime / Streaming
AssemblyAI

AssemblyAI's WebSocket streaming endpoint for live captions and agents.

from $0.15/hr (Streaming)
Gladia Realtime
Gladia

Gladia's real-time streaming ASR API with multilingual code-switching.

per-hour streaming (see Gladia pricing)
OpenAI /audio/transcriptions (whisper-1, gpt-4o-transcribe)
OpenAI

OpenAI's hosted Whisper + gpt-4o-transcribe models, batch endpoint.

from $0.006/min (whisper-1)
OpenAI /audio/translations
OpenAI

OpenAI's translate-to-English audio endpoint.

from $0.006/min (whisper-1)
SambaNova (Whisper endpoints)
SambaNova

SambaNova's hosted Whisper-large-v3 endpoint on its RDU accelerator.

see SambaNova pricing
Together AI (Whisper)
Together AI

Together AI's hosted Whisper models among its open-model catalog.

per-Together-AI pricing
DeepInfra (Whisper)
DeepInfra

DeepInfra's hosted Whisper endpoint with per-second GPU pricing.

per-DeepInfra pricing
OVHcloud AI Speech-to-Text
OVHcloud

OVHcloud's managed speech-to-text inside its sovereign EU cloud.

per-OVHcloud pricing
Scaleway AI Inference
Scaleway

Scaleway's GPU inference platform commonly used for hosted Whisper.

per-Scaleway pricing (compute-based)
Alibaba Tongyi Audio (Qwen-Audio)
Alibaba Cloud

Alibaba's Tongyi multimodal model exposed for transcription + audio understanding.

tiered RMB-per-token / per-second
Baidu ERNIE Speech
Baidu

Baidu's ERNIE-aligned speech models inside ERNIE Bot Cloud.

tiered RMB-per-call
Huawei Pangu Speech
Huawei Cloud

Huawei's Pangu foundation models extended to speech for enterprise scenarios.

tiered RMB-per-call
Tencent Hunyuan (audio modality)
Tencent Cloud

Tencent's Hunyuan multimodal model with audio understanding endpoints.

tiered RMB-per-token
Naver HyperCLOVA X (audio)
Naver Cloud Platform

Naver's HyperCLOVA X foundation model with audio understanding.

Naver Cloud HyperCLOVA pricing
Kakao Kanana Speech
Kakao Enterprise

Kakao's Kanana foundation-model family with audio understanding.

Contact sales
Rev.ai Streaming
Rev

Rev.ai's WebSocket streaming endpoint for live transcripts.

from $0.035/min (Streaming)
Speechmatics (batch / language packs)
Speechmatics

Speechmatics batch ASR with broad language pack catalog.

from $1.04/hr (Standard)
Hume EVI
Hume AI Inc.

Empathic voice interface with emotional-tone awareness.

pay-as-you-go
Otter.ai API
Otter.ai Inc.

Developer API access to Otter.ai's transcription engine.

contact sales
Rasa Pro
Rasa Technologies GmbH

Open-source-anchored conversational AI for enterprise.

free / contact sales
Google Cloud Dialogflow
Google Cloud

Google's conversational-AI platform for voice and chat agents.

usage-based
Amazon Lex
Amazon Web Services

AWS conversational-AI platform for voice and text bots.

usage-based
Microsoft Bot Framework
Microsoft Corporation

Microsoft's open-source SDK and platform for conversational bots.

free SDK + Azure costs
Rev VoiceHub
Rev

Rev's enterprise transcription and recording API platform.

paid
Trint API
Trint

Trint's transcription and translation API for newsrooms and media teams.

paid
iFlyTek Open Platform
iFlyTek

China's largest speech AI vendor — Mandarin, dialects, and 60+ languages via developer APIs.

tiered · free quota + pay-as-you-go in CNY
Tencent Cloud ASR
Tencent Cloud

Tencent's cloud speech-to-text with one-sentence, sentence, and real-time APIs.

tiered · per-second pricing in CNY
Alibaba DAMO ASR
Alibaba Cloud

Alibaba Cloud / DAMO Academy speech recognition with Paraformer non-autoregressive models.

tiered · per-hour pricing in CNY/USD
Volcengine Speech
ByteDance Volcano Engine

ByteDance's Volcano Engine speech-to-text — short, long, and streaming Mandarin ASR.

tiered · pay-as-you-go in CNY
Mobvoi Speech
Mobvoi

Mobvoi (Chumen Wenwen) speech APIs — Mandarin recognition behind TicWatch and Volkswagen voice.

enterprise · contact sales
NetEase Youdao ASR
NetEase Youdao

Youdao Cloud speech-to-text — Mandarin recognition behind Youdao Translator and dictionary pen.

tiered · per-character pricing in CNY
Sogou ASR
Sogou (Tencent)

Sogou (Tencent-owned) speech-to-text — input-method-grade Mandarin recognition.

enterprise · contact sales
Reverie Language Tech
Reverie Language Technologies

Reverie's Indic speech recognition for 11 Indian languages; Reverie is part of the Reliance Jio group.

enterprise · contact sales
Bhashini
Government of India (Digital India Bhashini Division)

Government of India's national language platform — public ASR APIs for 22 official languages.

free for non-commercial; commercial tiers TBD
Sarvam AI
Sarvam AI

Sarvam AI — full-stack Indian foundation models including Saaras / Saaransh speech APIs.

tiered · per-minute pricing with free tier
Tinkoff VoiceKit
Tinkoff (T-Bank)

Tinkoff VoiceKit — Russian-language ASR + TTS used inside Tinkoff Bank's contact centre.

tiered · per-second pricing in RUB
SoundHound
SoundHound AI

SoundHound Houndify — multilingual voice AI platform with embedded and cloud ASR.

tiered · contact sales for enterprise pricing
Lelapa AI
Lelapa AI

Lelapa AI — South African startup building Vulavula speech and language tools for African languages.

tiered · pay-as-you-go with free tier
Intella
Intella

Intella — Arabic speech-to-text API focused on MSA and major Arabic dialects.

tiered · per-hour pricing in USD with free tier
Alvenir Danish ASR
Alvenir

Alvenir — Danish-language speech-to-text product from a Copenhagen startup.

tiered · pay-as-you-go in EUR/DKK
AI-Loop
AI-Loop

AI-Loop — multilingual African-language speech and NLP infrastructure.

tiered · pay-as-you-go in USD
Hume EVI
Hume AI

Empathic Voice Interface — voice AI that reads and responds to emotion in speech.

see vendor pricing
Dialogflow CX
Google Cloud

Google Cloud's enterprise conversational AI platform with voice and chat channels.

see vendor pricing
Microsoft Bot Framework (Voice)
Microsoft

Microsoft's bot orchestration SDK with voice channels via Direct Line Speech.

see vendor pricing
IBM watsonx Assistant (Voice)
IBM

IBM's enterprise conversational AI platform with voice and contact-center integrations.

see vendor pricing
Vertex AI Conversation
Google Cloud

Google Cloud's LLM-native conversational AI builder with voice support.

see vendor pricing
Twilio Voice Intelligence + Agents
Twilio

Twilio's ASR, voice intelligence, and ConversationRelay primitives for voice agents.

see vendor pricing
Plivo AI
Plivo

Voice AI agent capability layered on Plivo's CPaaS voice network.

see vendor pricing
Bandwidth Voice AI
Bandwidth

AI voice tooling layered on Bandwidth's tier-1 U.S. carrier network.

see vendor pricing
Telnyx Voice AI
Telnyx

AI inference and voice agents on Telnyx's own carrier and GPU stack.

see vendor pricing
Voximplant
Voximplant

CPaaS with serverless VoxEngine scenarios and AI voice integrations.

see vendor pricing
Daily.co Voice
Daily.co

WebRTC infrastructure for realtime voice and video AI agents.

see vendor pricing
Deepgram Voice Agent API
Deepgram

Single API for low-latency voice agents bundling Deepgram ASR + LLM + TTS.

see vendor pricing
AssemblyAI LeMUR Voice
AssemblyAI

AssemblyAI's LLM framework over its ASR for voice intelligence and agents.

see vendor pricing
ElevenLabs Conversational AI
ElevenLabs

ElevenLabs' end-to-end voice agent API with ASR, LLM, and premium TTS.

see vendor pricing
Azure AI Speech Voice Agent
Microsoft

Microsoft Azure's bundle of Speech SDK + Bot Framework for voice agents.

see vendor pricing
Ultravox Agents
Fixie.ai

Speech-native LLM and hosted agent runtime by Fixie.ai.

see vendor pricing
OpenAI Realtime Agents SDK
OpenAI

OpenAI's Agents SDK pattern over the Realtime API for voice-native assistants.

see vendor pricing
Anthropic Voice Agent Patterns
Anthropic

Reference patterns for building voice agents with Anthropic Claude models.

see vendor pricing
Cartesia Voice Agent stack
Cartesia

Cartesia's Sonic TTS plus partner ASR/LLM for low-latency voice agents.

see vendor pricing
Pollyo
Pollyo

AI-dubbing API for video platforms — backend OEM rather than a creator-facing app.

paid
Camb.ai TTS
Camb.ai

Camb.ai's standalone text-to-speech surface — same MARS model that powers their dubbing.

paid
iSpeech
iSpeech

TTS + STT API with consumer text-reader apps.

freemium — API per-request, consumer apps free
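The API prices above are quoted in mixed units (per minute for some vendors, per hour for others). A minimal Python sketch for normalizing a few of the listed "from" prices to USD per audio hour, assuming the figures exactly as quoted in this directory; the vendor selection, the `PRICES` table, and the helper names are illustrative, and real bills depend on each vendor's tiers and rounding rules.

```python
# Normalize a handful of "from" prices listed in this directory to USD per audio hour.
# Figures are copied from the entries above and are only as current as the directory;
# check each vendor's pricing page before relying on them.
PRICES: dict[str, tuple[float, str]] = {
    "OpenAI Whisper API": (0.006, "min"),
    "Deepgram Nova-3 (batch)": (0.0043, "min"),
    "Amazon Transcribe": (0.024, "min"),
    "AssemblyAI": (0.37, "hr"),
    "Groq whisper-large-v3": (0.111, "hr"),
}


def per_hour(amount: float, unit: str) -> float:
    """Convert a per-minute or per-hour price to USD per hour."""
    if unit == "min":
        return amount * 60
    if unit == "hr":
        return amount
    raise ValueError(f"unknown unit: {unit}")


def ranked(prices: dict[str, tuple[float, str]]) -> list[tuple[str, float]]:
    """Return (name, USD/hr) pairs sorted from cheapest to most expensive."""
    rows = [(name, per_hour(amount, unit)) for name, (amount, unit) in prices.items()]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    for name, usd_hr in ranked(PRICES):
        print(f"{name:28s} ${usd_hr:.3f}/hr")
```

Converting everything to one unit makes it easier to see, for example, that $0.006/min works out to $0.36/hr, just under AssemblyAI's listed $0.37/hr.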
Desktop apps · 146 tracked
Native desktop applications for macOS, Windows, and Linux that transcribe files locally.
MacWhisper
Jordi Bruin

Polished Mac app for Whisper — the default pick if you're on macOS.

freemium
SuperWhisper
Neil Chudleigh

Always-on system-wide dictation for macOS and iOS, powered by local Whisper.

freemium
Aiko
Sindre Sorhus

Free Mac App Store Whisper app — drag, drop, done.

free
Hindenburg PRO
Hindenburg Systems

Broadcast-style DAW built for journalists, podcasters, and audiobook producers.

paid
Filmora AI
Wondershare

Wondershare's consumer video editor with AI captioning and short-form features.

paid
Premiere Pro Auto-Transcribe
Adobe

Adobe Premiere Pro's built-in speech-to-text and caption track.

paid
Final Cut Pro Live Captions
Apple

Apple FCP's caption track with macOS dictation-based transcription assist.

paid
DaVinci Resolve Auto-Caption
Blackmagic Design

Resolve Studio's built-in speech-to-text caption track.

freemium
Recut
Recut

Native macOS app that removes silences from videos before you import to your editor.

paid
iZotope RX
iZotope

Industry-standard audio repair suite — dialogue isolation, declip, dehum, denoise.

paid
Adobe Audition AI
Adobe

Adobe Audition with built-in Enhance Speech, multitrack, and spectral repair.

paid
Audacity
Muse Group

Free open-source audio editor with community plugins, including Intel's OpenVINO AI plugins for Whisper transcription and noise suppression.

free
Ocenaudio
Ocenaudio Team

Fast, simple cross-platform audio editor for clean-up work.

free
Hindenburg Smart Audio
Hindenburg Systems

Hindenburg's automatic leveling + Voice Profiler tuned for spoken word.

paid
Adobe Speech Enhance (Premiere)
Adobe

Adobe's Enhance Speech model exposed inside Premiere Pro.

paid
Subtitle Edit
Nikse

Free open-source subtitle editor with broad format support.

free
Aegisub
Aegisub Project

Open-source advanced subtitle editor favored by anime fansubbers.

free
Subtitle Workshop
URUSoft

Veteran Windows subtitle editor with batch conversion.

free
Stenograph CATalyst
Stenograph LLC

Court reporter CAT (computer-aided transcription) software for stenographers.

license + maintenance
Eclipse CAT (Advantage Software)
Advantage Software

Stenographic court-reporter CAT software — main competitor to Stenograph CATalyst.

license + maintenance
Case CATalyst
Stenograph LLC

Court reporter CAT software historically marketed as Case CATalyst by Stenograph.

license + maintenance
Sonocent Audio Notetaker (legacy)
Glean Education

Predecessor to Glean — audio note-taking software for students.

discontinued — see Glean
Express Scribe
NCH Software

Long-standing transcription playback software with foot pedal support.

free / Pro one-time license
f4transkript
audiotranskription.de

Academic transcription playback software popular in qualitative research.

license + subscription
MAXQDA Transcription
VERBI Software

Built-in transcription playback inside the MAXQDA qualitative analysis software.

license · academic discount
InqScribe
Inquirium LLC

Veteran transcription playback and subtitle authoring tool.

license $99
Transana
Wisconsin Center for Education Research

Qualitative analysis software for audio and video transcripts.

license · academic pricing
Wispr Flow
Wispr AI

AI-augmented dictation app — types into any text field on macOS and Windows.

free / from $12/mo
Nuance Dragon NaturallySpeaking
Microsoft / Nuance

Classic Windows desktop dictation for general users.

license from $200
Nuance Dragon Home
Microsoft / Nuance

Consumer-tier Dragon for personal Windows users.

license $200
Talon Voice
Talon Voice

Programmable voice control for power users — Linux, macOS, Windows.

free / Patreon paid
Braina
Brainasoft

Windows AI voice assistant and dictation product.

free / Pro license
e-Speaking
e-Speaking

Long-running Windows voice command and dictation utility.

license
SpeechPulse
SpeechPulse

Windows dictation utility built around Whisper.

license
Voice Finger
Voice Finger

Windows accessibility utility for mouse and keyboard control by voice.

free
Aqua Voice
Aqua Voice

AI dictation app — type into any field on macOS and Windows.

free / from $10/mo
Spokenly
Spokenly

macOS dictation app that uses Whisper locally.

$24.99 one-time
Voice Control for Mac (easy)
Various publishers

Front-end utilities making Apple Voice Control easier to configure.

free / paid plans
Telestream MacCaption
Telestream

Mac authoring tool for broadcast captions and post-production workflows.

perpetual · contact sales
Telestream CaptionMaker
Telestream

Windows authoring tool for closed captions and subtitles in broadcast and OTT.

perpetual · contact sales
EZTitles
ELF Software

Professional subtitle authoring suite used across European broadcast and OTT.

perpetual · contact sales
Spot Subtitle Editor
Spot Software

Classic Windows subtitle authoring tool with deep cinema and broadcast support.

perpetual · contact sales
FAB Subtitler
FAB Subtitling

Subtitle preparation and broadcast playout suite used across European TV.

perpetual · contact sales
Newfor (Screen Subtitling)
Screen Subtitling Systems

UK teletext-era subtitle origination still used in legacy broadcast workflows.

perpetual · contact sales
Screen Poliscript
Screen Subtitling Systems

Subtitle preparation tool from Screen Subtitling for broadcast and OTT delivery.

perpetual · contact sales
Cheetah CaptionMaker (legacy)
Cheetah International (legacy)

Original Cheetah Systems caption authoring tool — name now used by Telestream.

service · contact sales
Logosys
Logosys

Subtitle preparation suite from Italy's Logosys, used in European broadcast.

perpetual · contact sales
Caplav
Caplav

Subtitle preparation and live re-speaking tool aimed at European broadcasters.

perpetual · contact sales
FoCal Subtitler
FoCal

Niche subtitle authoring tool used by smaller European subtitle houses.

perpetual · contact sales
Stenograph CATalyst (Case CATalyst)
Stenograph

Industry-standard CAT software for stenographers and CART captioners.

perpetual · contact sales
StenoCAT
StenoCAT

Niche but durable CAT software for stenographers, with realtime captioning support.

perpetual · contact sales
Aiseesoft AI Translator
Aiseesoft

Desktop AI video translation suite for Windows + macOS — batch dub multiple files offline.

paid
Voicemod
Voicemod

Real-time voice changer + AI voice cloning — desktop, Windows-first, streamer audience.

freemium
Balabolka
Ilya Morozov

Free Windows TTS reader — uses installed SAPI voices, scriptable, no telemetry.

free
Pocketalk
Pocketalk Corp.

Pocket-sized dedicated live-translation hardware — 84 languages, two-way voice.

paid
Vasco Translator
Vasco Electronics

Handheld translator with lifetime free internet — 70+ languages, M3 + V4 models.

paid
Travis Touch
Travis

Crowdfunded handheld translator — 105 languages, Kickstarter origin.

paid
Boeleo W1
Boeleo

Pen-shaped scanning translator — point at printed text, get spoken translation.

paid
Lingmo Translate One
Lingmo International

Wearable / handheld live translation hardware from Lingmo International.

paid
Trados Studio (Multilingual)
RWS (Trados)

Enterprise CAT tool with subtitle + voice-over file-format support.

paid
memoQ (Multilingual)
memoQ

Enterprise CAT tool with subtitle file + AV reference support.

paid
Articulate Storyline 360
Articulate

Authoring tool for SCORM e-learning with closed-caption support per slide.

subscription
Adobe Captivate
Adobe

Adobe's e-learning authoring tool with closed-caption support for SCORM courses.

subscription
iSpring Suite
iSpring Solutions

PowerPoint-based e-learning authoring with closed-caption and narration tools.

subscription
Camtasia (Auto-Captions)
TechSmith

TechSmith Camtasia desktop video editor with auto-caption generation.

subscription
NVivo
Lumivero (formerly QSR International)

Qualitative research software with auto-transcription for interviews and focus groups.

subscription
ATLAS.ti
ATLAS.ti GmbH

Qualitative analysis platform with AI auto-coding and transcription.

subscription
MAXQDA
VERBI Software

Qualitative + mixed-methods research with built-in transcription tooling.

subscription
QDA Miner
Provalis Research

Qualitative + mixed-methods coding tool from Provalis Research.

subscription
Quirkos
Quirkos

Visual qualitative coding for thematic analysis, with transcript import.

subscription
Pro Tools Speech-to-Text
Avid

Avid Pro Tools' built-in dialogue transcription clip-effect for post and ADR sessions.

subscription $34.99/mo
Logic Pro Auto Mix
Apple

Apple Logic Pro's AI mixing assistant — leans on Stem Splitter + transcript-aware vocal balancing.

$199.99 one-time (macOS) · subscription $4.99/mo (iPad)
Ableton Live (AI plugin ecosystem)
Ableton

Ableton Live as a host for third-party AI vocal/speech plugins — Sonible, Acon, Synchro Arts, etc.

Standard $449 one-time · Suite $749 one-time
FL Studio (AI plugin ecosystem)
Image-Line

FL Studio as a VST host for AI speech-enhancement plugins (smart:EQ, Clarity Vx, DeNoise).

Producer $199 one-time · Signature $299 one-time · All Plugins Bundle $499 one-time · lifetime free updates
Cubase Pro VocalChain + speech-align
Steinberg

Steinberg Cubase Pro with VocalChain, AudioWarp, and built-in hooks for VocAlign / Revoice Pro.

Cubase Pro 14 $579 one-time · Artist $329 one-time · Elements $99 one-time
Steinberg Nuendo
Steinberg

Steinberg's post-production DAW with AI-assisted DialogueDetective and ADR workflow.

Nuendo 14 $1,799 one-time · NEK $279 one-time
Reaper (AI plugin host)
Cockos

Cockos Reaper: inexpensive, scriptable, and able to host AI dialogue plugins via VST3/CLAP/JSFX.

Discounted $60 one-time · Commercial $225 one-time
PreSonus Studio One
PreSonus

Studio One with Stem Separation, Lyrics Display, and Vocal/Speech-aware tooling.

Pro+ subscription $14.95/mo · Pro perpetual $399 one-time
Synchro Arts Revoice Pro 5
Synchro Arts

Standalone + ARA dialogue/vocal alignment with AI Process Assist.

Revoice Pro 5 $599 one-time · subscription $19.99/mo
Synchro Arts VocAlign Project 6
Synchro Arts

Lighter VocAlign aimed at music producers for unison/double-track tightening.

$99 one-time
Synchro Arts VocAlign Pro
Synchro Arts

VocAlign Pro — timing + pitch alignment, used for ADR and lip-sync dubbing.

$259 one-time
Sound Radix Auto-Align Post 2
Sound Radix

Phase + timing alignment across multi-mic dialogue recordings.

$349 one-time
iZotope Dialogue Match
iZotope

Match production-dialogue tone to ADR recordings — EQ + reverb + ambience capture.

$499 one-time (often bundled in RX Post Production Suite)
iZotope RX Dialogue Isolate (standalone)
iZotope

Source-separation module for stripping music + ambience away from dialogue.

bundled with RX Standard / Advanced / Post Production Suite
Acon Digital Extract:Dialogue
Acon Digital

AI-based dialogue extraction plugin for film, broadcast, and podcast post.

$199 one-time
Acon Digital DeNoise
Acon Digital

Spectral noise reduction with adaptive AI noise-print learning.

$99 one-time
Waves Clarity Vx
Waves Audio

Neural-network dialogue noise reduction — single-knob, AAX/VST3/AU.

$99 one-time (Clarity Vx) · $299 one-time (Clarity Vx Pro)
Waves Clarity Vx DeReverb
Waves Audio

AI de-reverb companion to Clarity Vx — removes room reflections from spoken dialogue.

$99 one-time
CEDAR Studio
CEDAR Audio

Broadcast-grade dialogue restoration suite — DNS, DeClick, DeHiss, DeBuzz.

per-module licensing · contact CEDAR for quote
CEDAR DNS 2 / DNS 8D / DNS One
CEDAR Audio

CEDAR's Dialogue Noise Suppression hardware + plugin line — production-set staple.

per-module quote · standalone hardware also available
NUGEN Audio Halo Upmix / DialogueChecker
NUGEN Audio

NUGEN's upmix + loudness suite with speech-aware dialogue level checking.

Halo Upmix $499 one-time · LM-Correct / VisLM $399+ one-time
Klevgrand Brusfri
Klevgrand

Single-fingerprint noise reducer tuned for dialogue and vocals.

$59.99 one-time (desktop) · $19.99 (iOS)
Sonible smart:EQ 4
Sonible

AI-driven adaptive EQ with a 'speech' profile for podcast and dialogue tracks.

$129 one-time · subscription $9.90/mo (Sonible Studio)
Sonible smart:dynamics
Sonible

Compressor with AI gain-profile detection — speech / vocals / drums presets.

$129 one-time
Sonible pure:comp / pure:limit / pure:verb
Sonible

AI-presetting compressor / limiter / reverb trio aimed at podcasters.

$39 one-time per plugin
Accentize DeBleeder / DialogueEnhance / VoiceGate
Accentize

ML-based dialogue plugins for spill removal, enhancement, and noise-aware gating.

DialogueEnhance $179 one-time · DeBleeder $179 one-time · VoiceGate $109 one-time
CrumplePop AudioDenoise AI / EchoRemover
CrumplePop (Boris FX)

FCP / Premiere / DaVinci-native AI noise + echo removal for video editors.

AudioDenoise AI $99 one-time · EchoRemover $99 one-time · Bundles available
Zynaptiq UNVEIL
Zynaptiq

Real-time de-reverb plugin built on Zynaptiq's MAP (Mixed-signal Audio Processing) tech.

$499 one-time
Zynaptiq UNFILTER
Zynaptiq

Real-time linear-filter compensator — fixes muffled mics, comb filtering, telephone EQ.

$369 one-time
Soundtoys Little AlterBoy
Soundtoys

Vocal pitch + formant shifter — used as a quick speech-disguise / character voicer.

$99 one-time · often bundled in Soundtoys 5
Audionamix XTRAX STEMS / Speech Volume
Audionamix

Source-separation suite for music + speech-volume normalization for podcasts.

varies by product · contact Audionamix for quote
Soundwhale
Soundwhale

Remote-collaboration DAW client for film/TV post — built-in AI dialogue tooling.

subscription $19.99/mo · Pro $39.99/mo
Avid Media Composer Phonetic Index
Avid

Media Composer's phonetic search + transcript-aware logging for NLE editors.

subscription $23.99/mo · annual $269.88/yr
Vegas Pro AI Audio Tools
MAGIX

MAGIX Vegas Pro with AI transcription, noise reduction, and smart-mask audio routing.

Edit subscription $19.99/mo · Pro perpetual $399 one-time
Lightworks AI Tools
LWKS

Lightworks NLE with AI-assisted transcription + caption workflow.

free tier · Create subscription $9.99/mo · Pro $23.99/mo
OBS Studio with AI Caption plugins
OBS Project + community plugins

OBS hosts third-party AI live-caption plugins (e.g. obs-localvocal) for streamers.

free
Streamlabs Desktop AI Tools
Streamlabs (Logitech)

Streamlabs (OBS fork) with AI Highlighter and stream-clip detection.

free · Ultra subscription $19/mo or $149/yr
NDI Tools + AI Captions
Vizrt (NDI)

NDI's free toolset with built-in caption + speech routing for production switchers.

free
Resolume Arena / Avenue (AI audio reactives)
Resolume

VJ software where AI speech-/audio-reactive plugins drive visuals from voice input.

Avenue 7 €299 one-time · Arena 7 €799 one-time
Apple GarageBand
Apple

Apple's free DAW — useful entry-point for podcasters before upgrading to Logic Pro.

free
Reason Studios (plugin host)
Reason Studios

Reason DAW as a VST3 / Rack Extension host for AI speech-cleanup plugins.

Reason 13 subscription $19.99/mo · perpetual $499 one-time
Bitwig Studio (AI plugin host)
Bitwig

Bitwig Studio with VST3/CLAP-hosted AI dialogue plugins — Sonible, Acon, Waves.

Bitwig Studio $399 one-time · Producer $199 one-time · Essentials $99 one-time
Ardour (open-source DAW)
Ardour community

Cross-platform open-source DAW that hosts AI VST3/LV2 dialogue plugins.

free (build from source) · $1+/mo or $45+ one-time for binary downloads
Harrison Mixbus 32C
Harrison Audio

Console-style DAW that models Harrison's 32C analog console channel strips, plus a dialogue workflow.

Mixbus 32C $239 one-time
MAGIX Samplitude Pro X
MAGIX

MAGIX's pro DAW with object-oriented editing + speech-aware mastering chain.

$399 one-time
Steinberg WaveLab Pro
Steinberg

Mastering DAW with podcast-oriented loudness + speech-aware analysis tools.

WaveLab Pro 12 $579 one-time · Elements $99 one-time
TwistedWave
TwistedWave

macOS / iOS / online audio editor used widely in podcast post — clean speech-edit UX.

macOS $79.90 one-time · iOS $9.99 one-time · Online subscription tiers
HairerSoft Amadeus Pro
HairerSoft

macOS-native multi-track editor with spectral repair and AU plugin hosting.

$59.99 one-time
iZotope RX Elements
iZotope

RX Elements — the entry-tier of the RX family, often bundled with audio interfaces.

$129 one-time (often included with audio interface bundles)
Audio Design Desk
Audio Design Desk

AI-assisted post-production editor that auto-syncs SFX + dialogue to picture.

Creator $4.99/mo · Pro $19.99/mo · Studio $39.99/mo
Klang:fabrik / Klang:apps
Klang:technologies

Immersive object-based mixing platform — speech routing in 360 audio.

enterprise · contact Klang for quote
Supertone Clear
Supertone

AI noise/reverb removal plugin from Supertone — broadcast-grade real-time dialogue cleanup.

subscription $9.99/mo · perpetual $179 one-time
Neutone FX / Morpho
Neutone (Qosmo)

Neural-audio research plugin platform — host AI speech / instrument models in any DAW.

Neutone FX free · Morpho subscription tiers
Vienna MIR Pro / MIR x
Vienna Symphonic Library

Convolution / room-simulation suite used for dialogue placement in immersive mixes.

MIR Pro 24 €395 one-time · MIR x bundled with selected libraries
FabFilter Pro-Q 4 (with AI Spectrum Match)
FabFilter

FabFilter's flagship EQ — Pro-Q 4 adds an AI Spectrum Match for dialogue / vocal matching.

$169 one-time (Pro-Q 4)
ElevenLabs Voice Changer (plugin path)
ElevenLabs

ElevenLabs' speech-to-speech voice conversion — wrapped as a desktop / plugin workflow.

free tier · Creator $22/mo · Pro $99/mo
Voicemod (real-time voice changer)
Voicemod

Real-time AI voice-changer routed into Discord, OBS, Zoom — system-level audio plugin.

free · Pro subscription $5+/mo (intro) up to $15/mo
Topaz Audio AI (planned / beta)
Topaz Labs

Topaz Labs' speech-enhancement engine — currently in invite beta.

beta · pricing tbd by Topaz Labs
iZotope Ozone 11 (with Master Assistant)
iZotope

iZotope Ozone — AI Master Assistant with explicit speech / podcast targets.

Standard $249 one-time · Advanced $499 one-time · subscription tiers
iZotope Nectar 4 (vocal-aware chain)
iZotope

iZotope's vocal channel strip with Vocal Assistant — speech and vocal modes.

Standard $249 one-time · Advanced $399 one-time
Voice.ai
Voice.ai

Real-time AI voice changer with universal app compatibility.

free + Pro paid tier
MagicMic
iMyFone

iMyFone's real-time voice changer for Windows/macOS with 600+ effects.

Monthly $9.95 · Yearly $19.95 · Lifetime $39.95
Clownfish Voice Changer
Clownfish

Free Windows voice changer with system-wide audio interception.

free
AV Voice Changer Software
Audio4fun

Audio4fun's commercial voice changer suite for Windows.

Basic $29.95 one-time · Gold $59.95 one-time · Diamond $99.95 one-time
MorphVOX
Screaming Bee

Screaming Bee's veteran voice changer for gaming and online play.

Junior free · Pro $39.99 one-time
Panopreter
Panopreter Software

Windows TTS reader with batch file conversion.

Basic free · Plus $29.95 one-time
Apple Siri
Apple

Apple's voice assistant across iPhone, iPad, Mac, Apple Watch, and HomePod.

Free with Apple device
Apple Intelligence Siri (Siri 2.0)
Apple

The LLM-augmented Siri introduced with Apple Intelligence on iOS 18 / macOS Sequoia.

Free with supported Apple device
Microsoft Cortana
Microsoft

Microsoft's voice assistant — consumer surfaces deprecated; remnants live on in Teams.

Deprecated · use Copilot instead
Apple Vision Pro Live Captions
Apple

Real-time captions in visionOS — overlay spoken speech as floating text in your field of view.

Free with Vision Pro ($3,499 headset)
Android Live Caption
Google

System-wide on-device captioning of any audio on Android — calls, video, podcasts.

Free with supported Android device
AirPods Pro Live Listen
Apple

Use AirPods Pro as remote mic + real-time amplifier — hearing-assist mode.

Free with AirPods Pro ($249)
AirPods Pro Hearing Aid
Apple

FDA-authorized over-the-counter hearing-aid feature in AirPods Pro 2 for mild-to-moderate hearing loss.

Free with AirPods Pro 2 ($249)
Apple HomePod Voice (Siri on HomePod)
Apple

Siri on HomePod / HomePod mini — voice-first smart-home control.

$99 mini · $299 full HomePod
Apple TV Siri Remote Voice Search
Apple

Hold the Siri button on the Apple TV remote to search content by voice.

Included with Apple TV ($129+)
Apple Vision Pro Dictation
Apple

On-device dictation in visionOS for any text-input UI element.

Free with Vision Pro ($3,499)
Products · 981 tracked
End-user transcription products: meeting bots, editors, and turnkey workflows.
Otter.ai
Otter.ai

Meeting-bot transcription product for Zoom/Meet/Teams.

free / from $10/mo
Rev
Rev

Human + AI transcription; the human tier is its highest-accuracy option.

AI from $0.25/min · human from $1.50/min
Descript
Descript

Audio/video editor that treats the transcript as the timeline; an editing tool rather than a pure transcription service.

free / from $12/mo
Trint
Trint

Enterprise-focused transcription + collaborative editor for newsrooms.

from $60/mo
Fireflies.ai
Fireflies.ai

Meeting-bot transcription + CRM integrations, competitor to Otter.

free / from $10/mo
Amazon Transcribe Call Analytics
Amazon Web Services

Post-call and real-time call analytics on AWS with sentiment, talk-time, and issue detection.

from $0.03/min (post-call) + add-ons
Symbl.ai
Symbl.ai

Conversation intelligence API with transcripts, action items, and live agent assist.

per-minute (free tier + paid tiers)
Voci Technologies (V-Spark / V-Blaze)
Medallia (Voci)

Enterprise call-analytics ASR engine, on-prem-friendly, acquired by Medallia.

Contact sales
VoiceBase (LivePerson)
LivePerson (VoiceBase)

Conversation analytics API acquired by LivePerson, originally a transcription-first vendor.

Contact sales
Behavox
Behavox

Compliance-focused voice + comms surveillance for regulated finance.

Contact sales
Cogito
Cogito

Real-time agent assist analyzing voice paralinguistics for emotion + behavior coaching.

Contact sales
Hume EVI (Empathic Voice + Prosody)
Hume AI

Hume's voice-AI platform with prosody/expression analysis bundled with transcription.

per-minute (see Hume pricing)
Picovoice Cheetah
Picovoice

On-device streaming speech-to-text optimized for embedded and edge.

Free tier + per-user/Enterprise plans
Picovoice Leopard
Picovoice

On-device offline speech-to-text from Picovoice — file-based, no cloud.

Free tier + per-user/Enterprise plans
Cobalt Speech (Cubic / Diatheke)
Cobalt Speech & Language

On-prem ASR + dialog stack popular in defense, regulated finance, and accessibility.

Contact sales
Krisp API
Krisp

Krisp's noise-cancellation + transcription + AI meeting assistant API for embedding.

Contact sales
Twilio Voice Intelligence
Twilio

Twilio's call-analytics product on top of Twilio Voice — transcripts + language operators.

from $0.05/min (transcription) + per-operator add-ons
Nylas Notetaker
Nylas

API to drop a notetaker bot into Zoom/Meet/Teams meetings and return transcripts.

Contact sales
Recall.ai
Recall.ai

Universal meeting bot API for Zoom/Meet/Teams/Webex with transcription + raw audio.

per-bot-minute + transcription minute (see Recall.ai pricing)
RingCentral RingSense
RingCentral

RingCentral's call-recording and conversation-intelligence layer.

per-seat add-on to RingCentral
Dialpad Ai (DialpadGPT)
Dialpad

Dialpad's in-house conversation intelligence on top of its UCaaS + contact-center.

Bundled in Dialpad tiers
Zoom AI Companion
Zoom

Zoom's bundled meeting AI: transcripts, summaries, smart compose.

Included with paid Zoom Workplace plans
Webex AI Assistant
Cisco Webex

Cisco Webex's meeting AI: transcripts, summaries, action items.

Bundled in Webex Suite tiers
Microsoft Teams Premium (Intelligent Recap)
Microsoft

Teams Premium's AI meeting features: intelligent recap, live captions, translations.

$10/user/month (Teams Premium add-on, list)
Google Meet AI (Gemini in Meet)
Google

Gemini-powered take-notes-for-me, captions, and translation inside Google Meet.

Included in Gemini for Workspace tiers
Fireflies.ai API
Fireflies.ai

Fireflies' notetaker exposed as an API for meetings + custom audio uploads.

Bundled in Fireflies plans (Business / Enterprise)
Grain
Grain

Sales-focused meeting recorder with transcripts, clips, and coaching.

Free + paid plans (Starter / Business)
Fathom
Fathom

Free AI meeting notetaker with transcripts, summaries, and CRM sync.

Free for individuals; paid Team tiers
tl;dv
tl;dv

AI meeting notetaker focused on Zoom + Meet with multilingual transcripts.

Free + paid Pro / Business plans
Supernormal
Supernormal

AI notetaker for Google Meet, Zoom, and Teams with customizable templates.

Free + paid Pro / Business / Enterprise
Avoma
Avoma

Meeting lifecycle + conversation intelligence platform for revenue teams.

Free + paid Starter / Plus / Business / Enterprise
Fellow
Fellow

Meeting management + AI notetaker with agendas, action items, and 1:1 templates.

Free + paid Pro / Business / Enterprise
Circleback
Circleback

AI notetaker focused on premium meeting summaries and CRM workflows.

paid tiers (see Circleback pricing)
Spinach.ai
Spinach.ai

AI scrum master + meeting notetaker for engineering teams.

Free + paid plans
Otter Business / Enterprise
Otter.ai

Otter.ai's team and enterprise tiers with OtterPilot for live meetings.

$20/user/mo (Business) and Enterprise (contact sales)
Rev Max / Enterprise
Rev

Rev's enterprise plans bundling ASR, human transcripts, and managed services.

from $34.99/user/mo (Max) and Enterprise (contact sales)
Verbit
Verbit

Enterprise transcription + captioning + ASR for legal, media, education, government.

Contact sales
Scribie
Scribie

Hybrid AI + human transcription with file-based ordering.

from $0.10/min (machine) to $0.80/min (manual)
GoTranscript
GoTranscript

Human transcription service with 119+ language pairs and multiple turnaround tiers.

from $0.84/min (human, standard tier)
TranscribeMe
TranscribeMe

Hybrid AI + human transcription with strong privacy and security positioning.

from $0.79/min (human verbatim) and lower for AI
SpeakWrite
SpeakWrite

US-based human transcription with same-day turnaround for legal and government.

per-word pricing (see speakwrite.com)
Happy Scribe
Happy Scribe

Dublin-based AI + human transcription, subtitles, and dubbing platform.

from $17/user/mo (Basic) and per-hour overage
Sonix
Sonix

AI transcription + collaborative editor with multilingual support.

from $10/hr (Standard) + per-hour or subscription
Temi
Rev (Temi)

Rev's machine-only transcription product at a flat per-minute rate.

$0.25/min (machine-only)
Amberscript
Amberscript

Dutch AI + human transcription and subtitling platform with EU data residency.

from EUR 8/hr (machine) up to manual + certified tiers
Rev.com Podcast Transcription
Rev

Rev.com's per-minute podcast transcription with English + Spanish coverage.

$1.99/min (human) and $0.25/min (AI)
Descript Storyboard
Descript

Descript's video + audio editor with ASR-driven transcript editing.

from $24/user/mo (Hobbyist) up to Pro / Business
Trint Stories
Trint

Trint's newsroom-focused transcript editor with multi-language workflows.

Bundled in Trint subscription tiers
Observe.AI
Observe.AI

Contact-center AI for agent assist, automated QA, and conversational intelligence.

Contact sales
CallMiner Eureka
CallMiner

Long-running conversation-analytics suite for contact-center operations and compliance.

Contact sales
Verint Speech Analytics
Verint

Verint's voice-of-customer + workforce-engagement analytics on calls.

Contact sales
NICE Enlighten AI
NICE

NICE's AI layer for CXone with proprietary contact-center speech models.

Contact sales
Level AI
Level AI

Contact-center AI for auto-QA, agent assist and analytics on conversational signal.

Contact sales
Talkdesk Copilot
Talkdesk

Talkdesk's AI layer for agent assist, automated summaries, and QA inside its CCaaS.

Add-on to Talkdesk CCaaS plans
Genesys Cloud AI Experience
Genesys

Genesys Cloud's bundled AI tier with voice transcripts, summaries, and agent copilot.

Bundled in Genesys Cloud AI Experience tiers
Five9 AI (Aceyus + Inference)
Five9

Five9's AI capabilities for contact-center voice and digital channels.

Add-on to Five9 CCaaS plans
Convoso
Convoso

Outbound contact-center platform with conversation analytics for sales calls.

Contact sales
Aircall AI
Aircall

Aircall's AI layer for transcription, summaries, and call insights.

Add-on to Aircall plans
Cresta
Cresta

Real-time agent assist and contact-center AI built on proprietary LLMs.

Contact sales
Balto
Balto

Real-time playbook + coaching engine for contact-center agents.

Contact sales
Tethr
Tethr

Conversation analytics platform focused on customer-effort and CX measurement.

Contact sales
Invoca
Invoca

Revenue-execution platform analyzing inbound calls for marketing attribution + conversion.

Contact sales
Marchex
Marchex

Call analytics + conversation AI for marketing attribution.

Contact sales
CallRail Conversation Intelligence
CallRail

CallRail's transcription + conversation analytics layer for inbound calls.

from $50/mo (Conversation Intelligence add-on, plus base CallRail plan)
Gridspace
Gridspace

Voice AI infrastructure with proprietary ASR + voice bots for enterprise contact centers.

Contact sales
Phonic
Phonic

Voice and video research platform using ASR + NLP for qualitative analysis.

Contact sales
Userbird Voice
Userbird

Voice + video feedback widget with auto-transcription for product teams.

Free + paid tiers
Phonely
Phonely

AI voice agent platform for inbound and outbound phone calls.

per-minute (see Phonely pricing)
Bland.ai
Bland.ai

AI voice-agent platform for outbound + inbound calls at scale.

from $0.09/min
Vapi
Vapi

Voice-AI infrastructure: turnkey assistants composed of STT + LLM + TTS providers.

$0.05/min (Vapi orchestration) + provider pass-through
Retell AI
Retell AI

Real-time voice-agent platform with low-latency conversational AI.

per-minute (see Retell pricing)
Synthflow
Synthflow

No-code voice-agent builder for SMB phone automation.

from $29/mo (Starter)
Voiceflow
Voiceflow

Conversation-design platform for voice and chat agents.

Free + paid plans (Pro / Teams / Enterprise)
Regal AI
Regal

AI phone agents and contact-center for outbound sales.

Contact sales
Speechly (Roblox)
Roblox (Speechly)

Real-time speech-to-text + intent platform; acquired by Roblox in 2023.

Status: acquired — verify availability
Speak Ai
Speak Ai

Qualitative-research transcription + NLP for media monitoring and research.

from $19/mo (Starter)
Notta
Notta

Multilingual transcription app with strong Asia-Pacific footprint.

Free + paid Pro / Business plans
Yandex Cloud SpeechSense
Yandex Cloud

Yandex Cloud's call-center analytics product layered on SpeechKit.

tiered RUB-per-second + analytics fee
interScriber
interScriber

Swiss interview transcription tool for journalists and researchers.

per-minute (see interScriber pricing)
Noota
Noota

Meeting recorder for recruitment and sales with structured-output templates.

Free + paid Starter / Pro / Business
Vatic AI
Vatic AI

Voice intelligence platform for transcripts, summaries and topic analysis.

Contact sales
Spitch
Spitch

Swiss voice-AI platform with strong Swiss-German + multilingual coverage.

Contact sales
Intelligent Voice
Intelligent Voice

Enterprise voice analytics + eDiscovery on call recordings.

Contact sales
SmartAction
SmartAction

Voice-AI IVR replacement for contact centers.

Contact sales
Interactions IVA
Interactions

Enterprise virtual assistant for contact centers with human-in-the-loop fallback.

Contact sales
Yactraq
Yactraq

Cloud-based ASR + speech-analytics service for contact centers.

Contact sales
VoiceIntelligence.AI
VoiceIntelligence.AI

Voice-analytics platform spanning recording, transcription, scoring and BI.

Contact sales
AudioShake
AudioShake

AI stem separation + transcription for media, music, and podcast post-production.

Contact sales
Wordcab
Wordcab

Conversation-AI API focused on long-form transcription + summarization.

Contact sales
Azure Conversation Transcription
Microsoft Azure

Azure Speech's multi-speaker meeting transcription with channel and speaker ID.

per-Azure-Speech pricing
Speechmatics Flow
Speechmatics

Speechmatics' voice agent API combining ASR, LLM, and TTS.

Contact sales
IBM watsonx Assistant (speech)
IBM

Speech-to-text input for chat and voice agents built on IBM watsonx Assistant.

per IBM watsonx pricing
Twilio Media Streams + Partner ASR
Twilio

Twilio's Media Streams primitive, which forwards live call audio over a WebSocket to a partner ASR vendor.

Twilio Voice + chosen ASR vendor
Deepgram Nova-3 Medical
Deepgram

Deepgram's medical-tuned variant of Nova-3 for healthcare ASR.

per Deepgram pricing (medical SKU)
Speechmatics (medical / regulated)
Speechmatics

Speechmatics' enterprise ASR with on-prem option for healthcare.

Contact sales
Abridge
Abridge

Clinical documentation: ambient transcription + structured medical notes.

Contact sales (enterprise)
Nuance DAX (Microsoft Dragon Copilot)
Microsoft (Nuance)

Ambient clinical documentation, now part of Microsoft Dragon Copilot.

Contact Microsoft Healthcare sales
Suki AI
Suki AI

Voice-enabled AI assistant for clinical documentation.

Contact sales
Augmedix
Augmedix

Hybrid human + AI medical scribe with real-time documentation.

Contact sales
DeepScribe
DeepScribe

AI medical scribe converting clinical conversations into structured notes.

Contact sales
Ambience Healthcare
Ambience Healthcare

AI documentation platform for clinicians with multi-specialty coverage.

Contact sales
Heidi Health
Heidi Health

AI scribe popular outside the US for solo and small-practice clinicians.

Free + paid Pro plans
Freed
Freed

Lightweight AI scribe targeting independent US clinicians.

from $99/clinician/mo
Talkdesk QM Assist
Talkdesk

Automated quality-management scoring + coaching from Talkdesk.

Add-on to Talkdesk CCaaS plans
Five9 Agent Assist
Five9

Five9's real-time agent-assist module for sentiment, next-best-action, and summary.

Add-on to Five9 CCaaS plans
Genesys Cloud Voice Bot Flows
Genesys

Genesys Cloud's built-in voice-bot builder for IVR replacement and self-service.

Bundled in Genesys Cloud tiers
Vonage AI Studio
Vonage

Vonage's no-code IVA builder for voice and chat agents.

Vonage AI Studio pricing
TranscribeCx (regional)
Various regional vendors

Smaller regional transcription vendors aggregated under a directory-level placeholder.

Varies by vendor
Tomedes Transcription
Tomedes

Tomedes translation agency's human transcription division.

per-minute (see Tomedes quote)
Lexikeet (regional human transcription)
Lexikeet

Human-transcription service focused on niche language pairs.

per-minute, quote-based
Podsqueeze
Podsqueeze

Podcast workflow tool with transcription, show notes, and clips.

from $9/mo (Starter)
Swell AI
Swell AI

Podcast content engine with transcription + summaries + repurposing.

from $29/mo
Deciphr
Deciphr

AI-powered podcast and video producer for transcripts + clips + content.

from $20/mo
Captions
Captions

AI video editor with auto-captions and dubbing for short-form creators.

Free + paid Pro
Submagic
Submagic

AI captions and short-clip editor for short-form creators.

from $20/mo
Opus Clip
Opus Clip

AI clipper that turns long videos into short-form clips with captions.

from $19/mo (Starter)
Vizard
Vizard

Online AI video editor with auto-captions and clip generation.

Free + paid plans
Kapwing AI Subtitles
Kapwing

Kapwing's online editor with AI auto-subtitles for video creators.

Free + paid Pro
VEED.IO
VEED.IO

Online video editor with AI subtitles, translation, and dubbing.

Free + paid Basic / Pro / Business
Rev Captions
Rev

Rev's captioning SKU for video accessibility and broadcast.

from $1.50/min (human captions) and $0.25/min (AI)
Capshion
Capshion

AI captioning service for short-form video creators.

from $10/mo
SubtitleBee
SubtitleBee

Web tool generating styled subtitles for video creators.

from $11.99/mo
YouTube Studio Auto-Captions
YouTube

YouTube's built-in automatic captions for uploaded videos.

free
TikTok Auto-Captions
TikTok

TikTok's in-app automatic captions for uploaded short-form videos.

free
Instagram Auto-Captions
Meta

Instagram's in-app auto-captions for Reels and Stories.

free
Facebook Auto-Captions
Meta

Meta's Page-level auto-caption tool for uploaded video.

free
LinkedIn Auto-Captions
Microsoft (LinkedIn)

LinkedIn's native auto-captions for uploaded videos.

free
StageTen Captions
StageTen

Live-streaming production tool with built-in auto-captions.

Bundled in StageTen plans
AI-Media (LEXI)
AI-Media

Broadcast-grade live captioning for TV, government, and education.

Contact sales
Epiphan LiveScrypt
Epiphan Video

Live captioning appliance for events and broadcast.

Hardware purchase + service
Interprefy
Interprefy

Remote simultaneous interpretation platform with AI captions and human interpreters.

Contact sales
KUDO
KUDO

Multilingual virtual meeting platform with human + AI interpretation.

Contact sales
Wordly
Wordly

AI live translation and captioning for meetings and events.

Contact sales
Tactiq
Tactiq Pty Ltd.

Live in-browser transcription overlay for Google Meet, Zoom, and Teams.

free / from $12/user/mo
Granola
Granola Inc.

AI notepad that augments your own notes with meeting-context transcripts.

free trial / from $18/mo
Read.ai
Read AI Inc.

Meeting copilot with engagement analytics and AI-generated recaps.

free / from $19.75/mo
Sembly AI
Sembly AI Inc.

Multilingual AI meeting assistant with searchable team knowledge base.

free / from $15/mo
Spinach
Spinach.io Inc.

AI scrum-master that runs standups, retros, and engineering rituals.

free / from $9.99/user/mo
MeetGeek
MeetGeek.ai

AI meeting assistant with library, analytics, and conversation insights.

free / from $19/user/mo
Krisp Notes
Krisp Technologies Inc.

AI meeting notes added on top of Krisp's noise-cancellation app.

free / from $8/mo
Limitless
Limitless (formerly Rewind)

Wearable AI pendant that captures and transcribes in-person conversations.

hardware $99 + from $19/mo
Plaud Note
Plaud AI

AI voice recorder card that pairs with ChatGPT for transcription and summarization.

hardware $159 + free or $79/yr
Rewind
Rewind AI Inc.

macOS app that records and indexes everything seen and heard on a Mac.

from $19/mo
Briefly
Briefly Inc.

Quick AI-generated meeting summaries for Zoom, Meet, and Teams.

free / from $15/mo
Loopin
Loopin AI

AI notetaker focused on calendar-based summaries and team standups.

free / from $12/user/mo
Magical Notes
Magical

AI templates and meeting recaps inside the Magical productivity extension.

free / from $12/user/mo
Equal Time
Equal Time Inc.

AI meeting copilot that measures speaking time and inclusion metrics.

free / from $20/user/mo
Cogram
Cogram Ltd.

Enterprise AI notetaker with on-premise deployment and strong compliance posture.

contact sales
Notiv
Notiv Pty Ltd.

Sales-call analysis and notetaker with a focus on APAC teams.

from $12/user/mo
Gong
Gong.io Inc.

Revenue intelligence platform that records, transcribes, and analyzes sales calls.

contact sales
Chorus.ai
ZoomInfo Technologies

Conversation intelligence platform inside ZoomInfo's revenue OS.

contact sales
Clari Copilot
Clari Inc.

Conversation intelligence inside the Clari revenue platform (formerly Wingman).

contact sales
Salesloft Conversations
Salesloft Inc.

Call recording and intelligence inside the Salesloft revenue workflow platform.

contact sales
Outreach Kaia
Outreach Corporation

Real-time sales assistant and conversation intelligence in Outreach.

contact sales
Apollo Conversations
Apollo.io Inc.

Call recording and AI insights inside Apollo's go-to-market platform.

from $99/user/mo
Trellus
Trellus AI

Real-time conversation coach for cold-call dialers.

free / from $39.99/user/mo
Rilla
Rilla Voice Inc.

Virtual ride-along sales coach for in-home and field sales reps.

contact sales
Refract
Refract.ai (Allego)

Conversation analytics and coaching for sales and customer-success teams.

contact sales
ExecVision
Mediafly Inc.

Conversation intelligence and coaching platform inside Mediafly.

contact sales
Allego
Allego Inc.

Sales enablement platform with conversation intelligence and onboarding video.

contact sales
Mindtickle
Mindtickle Inc.

Sales readiness platform with call AI and rep skill analytics.

contact sales
SetSail
SetSail Technologies Inc.

Revenue data platform that captures sales signals and scores rep activity.

contact sales
Modjo
Modjo SAS

European conversation intelligence platform with deep CRM integration.

contact sales
Salesken
Salesken Inc.

Real-time AI sales coaching during live calls.

contact sales
Revenue Grid
Revenue Grid Inc.

Revenue intelligence with conversational AI and a Salesforce inbox sidebar.

contact sales
Dialpad Ai Voice
Dialpad Inc.

Cloud phone system with real-time transcription and AI call summaries.

from $15/user/mo
HubSpot Conversation Intelligence
HubSpot Inc.

Native call recording and AI insights inside HubSpot Sales Hub.

bundled from $90/user/mo
Einstein Conversation Insights
Salesforce Inc.

Salesforce's native AI for sales-call summarization and analysis.

contact sales
Pipedrive AI
Pipedrive OÜ

AI sales assistant and CRM summarization inside Pipedrive.

from $59/user/mo
Zoho Zia
Zoho Corporation

Zoho's AI assistant with voice transcription across CRM, mail, and meetings.

from $45/user/mo
ClickUp Brain
ClickUp Inc.

AI summarization and assistant inside the ClickUp project-management suite.

$7/user/mo add-on
Notion AI Meeting Notes
Notion Labs Inc.

Notion's built-in AI summaries for transcribed meetings and notes.

$10/user/mo add-on
Mem
Mem Labs Inc.

Self-organizing AI notebook with meeting transcription and recall.

free / from $14.99/mo
Reflect Notes
Reflect Notes Inc.

Encrypted personal notes app with built-in OpenAI-powered AI features.

$10/mo or $100/yr
Tana
Tana Inc.

Outliner with AI nodes and meeting-capture workflows.

free / from $14/mo
Coda AI
Coda Project Inc.

AI assistant inside the Coda doc-and-database platform.

from $12/mo
Bardeen
Bardeen AI Inc.

Browser-based AI automation that includes meeting-bot summary playbooks.

free / from $20/user/mo
Zoom AI Companion
Zoom Communications Inc.

Native AI assistant across Zoom meetings, chat, and phone.

bundled with paid Zoom
Slack AI
Salesforce / Slack Technologies

Channel and thread summarization plus AI search across Slack workspaces.

$10/user/mo add-on
Loom AI
Atlassian / Loom Inc.

AI titles, summaries, and chapters for async video messages.

from $15/user/mo
Microsoft 365 Copilot in Teams
Microsoft Corporation

Microsoft Copilot inside Teams for live meeting summaries and recaps.

$30/user/mo
Gemini for Google Meet
Google LLC

Gemini in Google Meet for live captions, summaries, and the "Take notes for me" feature.

bundled with Workspace
Slite AI
Slite SAS

Slite's AI assistant for asynchronous docs and meeting notes.

from $8/user/mo
Zoom Events AI Recap
Zoom Communications Inc.

AI recap and on-demand replay for Zoom Events and Webinars.

from $79/license/mo
ON24 Intelligence
ON24 Inc.

AI-driven webinar engagement and content intelligence.

contact sales
BigMarker
BigMarker.com LLC

Webinar and virtual-event platform with AI session highlights.

from $79/mo
Goldcast
Goldcast Inc.

B2B event platform with Content Lab for AI clip generation.

contact sales
Welcome
Welcome Inc.

Webinar and event platform with AI-generated recaps and clip library.

contact sales
StreamYard
StreamYard (Bending Spoons)

Browser-based live-streaming studio, now owned by Bending Spoons.

free / from $20/mo
Restream
Restream Inc.

Multi-destination live-streaming platform with AI recording features.

free / from $16/mo
Demio
Banzai International Inc.

Webinar platform with built-in transcription and AI recap.

from $59/mo
Wistia AI
Wistia Inc.

AI captions, chapters, and SEO for marketing-video hosting.

from $24/mo
Lattice AI
Lattice Inc.

AI summaries and feedback inside the Lattice performance-management platform.

contact sales
15Five
15Five Inc.

Performance management with AI-generated coaching insights.

contact sales
Reclaim.ai
Reclaim Inc.

AI calendar that protects time for habits, meetings, and focus blocks.

free / from $10/mo
Colibri.ai
Colibri.ai

Real-time AI sales coach with conversation intelligence and CRM sync.

free / from $16/user/mo
Scribbl
Scribbl

Chrome-extension AI notetaker for Google Meet, with a free tier.

free / from $15/user/mo
Airgram
Airgram Inc.

Meeting notetaker with timeline scribbling and AI summaries.

free / from $18/user/mo
Noty.ai
Noty.ai

AI meeting notes for Google Meet and Zoom with live captioning.

free / from $14.99/user/mo
Vowel
Vowel Inc.

Video-meeting platform with a built-in notetaker.

free / from $20/user/mo
Grain
Grain Inc.

Conversation intelligence and clip-sharing for revenue teams.

free / from $19/user/mo
Jamy
Jamy AI

Multilingual AI meeting notetaker with action-item automation.

free / from $19/user/mo
Laxis
Laxis Inc.

Meeting notetaker with research-interview templates and CRM workflows.

free / from $15.99/user/mo
Clearword
Clearword

Real-time AI meeting assistant with a searchable highlights library.

free / from $25/user/mo
Fellow
Fellow.app Inc.

Meeting management platform with agendas, notes, and AI summaries.

free / from $11/user/mo
Hypercontext
Hypercontext Inc.

1:1 and team-meeting platform with AI-suggested talking points.

free / from $7/user/mo
Hive Notes
Hive Technology Inc.

Meeting notes inside Hive's project-management platform.

free / from $1/user/mo
Docket
Docket Inc.

AI meeting prep, agenda, and recap for sales and customer-success teams.

contact sales
Attention
Attention Inc.

Real-time AI sales coach and CRM autopilot.

contact sales
tldv Notes (formerly Lavender Notes)
Lavender.ai

AI meeting recap layer plus email outbound assistant.

from $29/user/mo
Sybill
Sybill Inc.

AI sales coach that scores reps on emotional and behavioral cues.

from $59/user/mo
Clari Groove
Clari Inc.

Sales engagement layer (formerly Groove) inside Clari's revenue platform.

contact sales
NICE CXone Analytics
NICE Ltd.

Enterprise contact-center suite with conversation analytics.

contact sales
Five9 Genius
Five9 Inc.

AI and conversation analytics inside the Five9 contact-center platform.

contact sales
Dialpad Ai Contact Center
Dialpad Inc.

Cloud contact center with built-in real-time transcription and AI scorecards.

from $80/agent/mo
JustCall IQ
JustCall.io (SaaS Labs)

Conversation intelligence inside the JustCall cloud-phone platform.

from $19/user/mo
Zoom Revenue Accelerator
Zoom Communications Inc.

Conversation intelligence inside Zoom for revenue teams.

contact sales add-on
Clari Listen
Clari Inc.

Clari's conversation-listening layer for revenue teams.

contact sales
Revenue.io
Revenue.io Inc.

Sales engagement and conversation intelligence platform.

contact sales
Groove.co GenAI
Groove.co (Clari)

Groove sales engagement platform's generative-AI feature set.

contact sales
Equilar BoardEdge
Equilar Inc.

Executive-meeting intelligence for IR and board-engagement teams.

contact sales
Diligent Boards AI
Diligent Corporation

Board-meeting AI summaries and governance intelligence.

contact sales
Nasdaq Boardvantage
Nasdaq Inc.

Board portal with AI-assisted minutes and meeting recap.

contact sales
Synthflow AI
Synthflow AI

No-code voice-AI platform for inbound and outbound phone calls.

from $29/mo
Hiver Harvey AI
Hiver Technologies

Email-and-chat customer-service AI inside the Hiver shared inbox.

from $34/user/mo
Front AI
Front App Inc.

AI inside Front's customer-communication platform.

from $19/user/mo
Intercom Fin
Intercom Inc.

Generative-AI agent that resolves customer-support conversations.

$0.99/resolution
Zendesk AI
Zendesk Inc.

Generative AI inside the Zendesk customer-service platform.

$50/agent/mo add-on
Service Cloud Einstein
Salesforce Inc.

Salesforce's AI for customer-service and contact centers.

contact sales
Alex Anywhere
Alex Anywhere

macOS AI agent that captures and summarizes any audio on the system.

from $20/mo
Voicenotes
Voicenotes

iOS / Android voice-first note-taking app with AI summaries.

from $10/mo
AudioPen
AudioPen

Voice-to-clean-text app for capturing rambling thoughts.

free / $99/yr
Good Tape
Zetland Media (Good Tape)

Online transcription service from the Danish news outlet Zetland.

free / from $15/mo
Transkriptor
Transkriptor

Multilingual file transcription with summarization and a meeting bot.

free / from $4.99/mo
Happy Scribe
Happy Scribe

Automatic and human-corrected transcription and subtitling.

from €0.20/min
Sonix
Sonix Inc.

Automatic transcription, translation, and subtitling.

from $10/hr
Steno
Steno Agency Inc.

Court-reporting and legal deposition platform with AI transcripts.

contact sales
Dragon Medical One
Microsoft (Nuance)

Cloud-based speech recognition for clinical documentation.

contact sales
Heidi Health
Heidi Health Pty Ltd.

AI medical scribe for clinicians in 30+ countries.

free / from $99/mo
Scribeberry
Scribeberry Inc.

Canadian AI medical scribe for primary care physicians.

contact sales
Hubilo
Hubilo Softech Inc.

Virtual and hybrid event platform with AI recap.

contact sales
vFairs
vFairs Inc.

Virtual event platform with AI session summaries.

contact sales
Airmeet
Airmeet Inc.

Virtual and hybrid event platform with engagement and AI features.

from $4,000/event
Kaltura Events
Kaltura Inc.

Enterprise video platform with event broadcasting and AI captions.

contact sales
ZoomInfo Engage
ZoomInfo Technologies

Sales engagement (cadence + dialer) inside ZoomInfo.

contact sales
Scratchpad
Scratchpad Inc.

Lightning-fast revenue workspace for Salesforce updates and notes.

free / from $19/user/mo
BrightHire
BrightHire Inc.

Interview intelligence for talent-acquisition teams.

contact sales
Metaview
Metaview Ltd.

AI interviewer-notes platform for recruiting teams.

contact sales
Hireflix
Hireflix

One-way video interview platform with AI transcription.

from $150/mo
hireEZ Vue
hireEZ Inc.

Outbound recruiting platform with AI sourcing and engagement.

contact sales
Paradox Olivia
Paradox Inc.

Conversational-AI recruiter for high-volume hiring.

contact sales
Kaltura Classroom
Kaltura Inc.

Lecture-capture and transcription for higher education.

contact sales
Panopto
Panopto Inc.

Lecture-capture, video CMS, and AI transcription for education.

contact sales
Echo360
Echo360 Inc.

Active-learning and lecture-capture platform for higher education.

contact sales
Yoodli
Yoodli Inc.

AI public-speaking coach with a private practice mode.

free / from $9/mo
Praktika
Praktika.ai

AI English-language tutor with voice conversations.

from $11.99/mo
Speak
Speakeasy Labs Inc.

AI English-speaking practice app backed by OpenAI.

from $19.99/mo
Riverside
Riverside.fm Inc.

Studio-quality remote recording for podcasts and video.

free / from $19/mo
SquadCast
SquadCast.fm (Descript)

Lossless remote-recording platform for podcasters.

from $20/mo
Zencastr
Zencastr Inc.

Browser-based podcast and video recording with AI features.

free / from $20/mo
Supercreator AI
Supercreator AI Ltd.

AI app for shorts/reels creators with auto-captioning.

free / from $19.99/mo
Rev Voice Recorder
Rev.com Inc.

Rev.com's iOS / Android voice recorder with one-tap transcription.

free / pay-per-transcript
Just Press Record
Open Planet Software

iOS / Apple Watch voice recorder with on-device transcription.

$4.99 one-time
HubSpot AI Content Assistant
HubSpot Inc.

Generative-AI features across HubSpot marketing and sales tools.

bundled with HubSpot
Zoho Zia for Meetings
Zoho Corporation

Zoho's AI meeting summaries inside Zoho Meeting.

bundled from $1/host/mo
Microsoft Teams Premium
Microsoft Corporation

Premium Teams tier with intelligent recap and AI features.

$10/user/mo add-on
Workspace Duet AI
Google LLC

Original branding for Google's Workspace AI before the Gemini rebrand.

bundled with Workspace
Krisp Meetings
Krisp Technologies Inc.

Krisp's combined audio-cleanup-plus-meeting-notes product.

bundled with Krisp
Nyota AI
Nyota AI

AI meeting assistant with team-knowledge questions across past meetings.

from $19/mo
Krisp Federal
Krisp Technologies Inc.

FedRAMP-aligned tier of Krisp for US government use.

contact sales
Wudpecker
Wudpecker Oy

Privacy-first AI meeting notetaker.

free / from $15.99/user/mo
Speak Ai
Speak Ai Inc.

Qualitative-research AI for transcribing and analyzing interviews.

from $19/mo
Dovetail
Dovetail Research

Research repository with AI analysis of qualitative data.

free / from $30/user/mo
Notably
Notably Inc.

Visual-canvas research-analysis tool with AI transcription.

free / from $25/mo
Condens
Condens GmbH

European qualitative-research analysis platform with AI features.

contact sales
User Interviews
User Interviews Inc.

Participant-recruitment platform with AI session features.

contact sales
Respondent
Respondent.io Inc.

B2B participant-recruitment platform for user research.

pay-per-participant
UserTesting
UserTesting Inc.

User-testing platform with AI insights and transcription.

contact sales
Lookback
Lookback Inc.

Live and async user-testing platform for product teams.

from $25/mo
PlaybookUX
PlaybookUX Inc.

Remote-user-research platform with AI session analysis.

pay-per-session
Orum
Orum Inc.

Parallel dialer for outbound sales with conversation analytics.

contact sales
Nooks
Nooks Inc.

AI-powered parallel dialer and sales platform.

contact sales
Kixie PowerCall
Kixie Inc.

Cloud-phone and SMS platform with AI features for sales teams.

from $35/user/mo
Dialpad Meetings
Dialpad Inc.

Video-conferencing product from Dialpad with AI recap.

free / from $15/user/mo
Uniphore
Uniphore Technologies Inc.

Conversational AI for customer experience across voice and video.

contact sales
Verint Customer Engagement
Verint Systems Inc.

Customer-engagement and contact-center AI platform.

contact sales
Calabrio ONE
Calabrio Inc.

Workforce management and contact-center AI suite.

contact sales
CallRail
CallRail Inc.

Call tracking and AI conversation intelligence for SMBs.

from $50/mo
Ema
Ema Unlimited Inc.

Universal AI employee for enterprise workflows.

contact sales
Decagon
Decagon AI Inc.

Generative-AI customer-support agents.

contact sales
Sierra
Sierra.ai Inc.

Conversational-AI platform for consumer brands.

contact sales
Instaminutes
Instaminutes

AI meeting summarizer for short, accurate recaps.

free / from $9.99/mo
MeetingPulse
MeetingPulse Inc.

Audience-engagement and Q&A platform with AI summaries.

free / from $39/mo
Slido
Cisco Webex (Slido)

Audience-interaction platform with AI summary of meeting Q&A.

free / from $15/mo
Airtame Rooms
Airtame ApS

Hardware-based meeting-room solution with AI features.

hardware + subscription
Logitech Rally Bar AI
Logitech International

Logitech meeting-room camera with AI features for hybrid meetings.

hardware purchase
Neat
Neat AS

All-in-one video bar and meeting-room hardware.

hardware purchase
Castmagic
Castmagic

Turn long-form audio into show notes, clips, tweets, and newsletters in one upload.

paid
Podcastle
Podcastle AI

Browser DAW for podcasts with AI voice clones and one-click cleanup.

freemium
Cleanvoice.ai
Cleanvoice

Automatically removes ums, mouth sounds, dead air, and stutters from podcast tracks.

paid
Auphonic
Auphonic

Automatic audio leveling, loudness normalization, and noise reduction for podcast post.

freemium
Adobe Podcast Enhance
Adobe

Web tool that makes voice recordings sound as if they were tracked in a studio.

freemium
Krisp
Krisp Technologies

Real-time noise, voice, and echo cancellation on any call or recording.

freemium
Alitu
The Podcast Host

Hands-off podcast maker — drag in raw tracks, get a leveled, intro-stitched episode.

paid
Resonate Recordings
Resonate Recordings

Human-edited podcast post-production with optional AI assist.

paid
Buzzsprout
Buzzsprout

Podcast host with built-in AI transcripts, magic mastering, and episode chapters.

freemium
Spotify for Podcasters
Spotify

Spotify's free podcast host with AI Voice Translation and auto-transcripts.

free
Headliner
SpareMin

Turn podcast audio into shareable audiograms and waveform videos.

freemium
Wavve
Wavve

Audiogram generator with templated designs for podcast promo clips.

paid
Castos
Castos

Podcast host with WordPress integration and private feed support.

paid
Transistor.fm
Transistor

Modern podcast host built for networks — unlimited shows on one account.

paid
Captivate.fm
Captivate

Growth-focused podcast host with marketing tools built in.

paid
Podbean
Podbean

Podcast host with live streaming, monetization, and AI episode notes.

freemium
Spreaker
iHeartMedia

Podcast host and live audio platform owned by iHeart with a programmatic ad network.

freemium
Acast
Acast

Enterprise podcast host with monetization and global ad sales.

paid
Megaphone
Spotify

Spotify's enterprise podcast publishing and ad-insertion platform.

paid
Simplecast
SiriusXM

Podcast host owned by SiriusXM with strong embeddable players.

paid
Libsyn
Libsyn

One of the earliest podcast hosts, still running shows that started in the 2000s.

paid
Pinecast
Pinecast

Indie-favorite podcast host with technical features and fair pricing.

freemium
RedCircle
RedCircle

Free podcast host with cross-promotion and host-read ad network.

free
Veed.io
Veed

Browser video editor with auto-subtitles in 100+ languages.

freemium
Pictory
Pictory

Turn blog posts and long videos into branded short clips with auto-captions.

paid
Vidcap
Vidcap

Bulk subtitle and caption generator for video creators.

paid
Captions
Captions

Mobile-first captioning app for creators recording on a phone.

paid
Munch
Munch

Long-form video → short-form clips with social-trend awareness.

paid
Klap
Klap

AI shorts generator — paste a YouTube URL, get vertical clips with captions.

freemium
Vizard
Vizard

AI clipping tool with face-tracking auto-reframe.

freemium
ChopCast
ChopCast

Repurpose long webinars and podcasts into clips, blog posts, and quote graphics.

paid
Zubtitle
Zubtitle

Add captions and headlines and resize video for social, all from one upload.

paid
Kapwing
Kapwing

Browser video editor with strong meme, subtitle, and team-collaboration tooling.

freemium
ScriptMe
ScriptMe

AI transcription + caption studio with translation in 90+ languages.

paid
Reduct.video
Reduct

Transcript-based video editing for research, user interviews, and journalism.

paid
Vimeo Create
Vimeo

AI video creation and captioning inside the Vimeo platform.

freemium
Animoto AI
Animoto

Slide-based video creation with AI scripts and captioning.

freemium
Lumen5
Lumen5

Convert blog posts to social videos with auto-captions and stock B-roll.

freemium
InVideo AI
InVideo

Prompt-to-video generator with stock library and AI voiceover.

freemium
Steve.ai
Steve.ai

AI text-to-animation and live-action video generator.

freemium
HeyGen
HeyGen

AI avatar video generator with talking-head cloning and 100+ language dubs.

freemium
Highlight by Marvin
Marvin

Find and clip the most highlight-worthy moments from podcast and interview footage.

paid
Repurpose.io
Repurpose.io

Automated pipeline to publish one video to every social platform.

paid
Tweet Hunter Repurpose
Lempire

AI-driven Twitter / X scheduling with video-clip-to-thread repurposing.

paid
Crayo
Crayo

AI clip-generator targeting faceless TikTok and Reels.

freemium
Rev Captions
Rev

Rev's human-grade captioning service with SRT/VTT/SCC delivery.

paid
3Play Media
3Play Media

Enterprise captioning, transcription, and audio-description service.

paid
Amara
Participatory Culture Foundation

Open subtitling platform run by the Participatory Culture Foundation.

freemium
Subly
Subly

Subtitle, translate, and edit captions for social video in 70+ languages.

freemium
Maestra AI
Maestra

Transcription, subtitling, voiceover, and dubbing in 125+ languages.

paid
Notta Translate
Notta

Real-time meeting and video translation across 50+ languages.

freemium
Cielo24
Cielo24

Enterprise captioning and video data services for media and education.

paid
StreamText
StreamText

Live captioning delivery network for events and webinars.

paid
SubtitleBee
SubtitleBee

Browser auto-subtitle and translation tool for social video.

freemium
Type Studio
Type Studio

Transcript-based video editor with multi-language subtitle output.

freemium
Auphonic Multitrack
Auphonic

Multi-track leveling and crosstalk reduction for podcast editors.

freemium
Castmagic Clips
Castmagic

Castmagic's short-form clip generator built on its podcast transcript output.

paid
Castos Repurpose
Castos

Castos's AI clip and show-notes generator for episodes hosted on Castos.

paid
Tweet Hunter Repurpose Mode
Lempire

Drop a video URL → get pre-drafted X threads in your voice.

paid
Castmagic Newsletter
Castmagic

Auto-drafted podcast-driven email newsletters.

paid
Limecraft
Limecraft

Production management with AI transcription for broadcast workflows.

paid
TranscribeMe
TranscribeMe

Human-grade transcription service with HIPAA + enterprise options.

paid
Scriptix
Scriptix

Speech-to-text platform aimed at broadcast and media in EU markets.

paid
Noteit.ai
Noteit

Meeting bot that produces focused summaries you can share with anyone via a link.

freemium
Happy Scribe Subtitle Editor
Happy Scribe

Frame-accurate subtitle editor inside the Happy Scribe transcription product.

freemium
EZsub
EZsub

Cloud subtitle workflow for educators and small media teams.

paid
Checksub
Checksub

AI subtitle and dubbing platform with team workflows.

paid
Splento AI
Splento

Event video service with AI editing and captioning.

paid
Vodalus
Vodalus

AI workflow for broadcast captioners and accessibility teams.

paid
Rev Translated Captions
Rev

Rev's translated subtitle service across 16+ language pairs.

paid
DubDub.ai
DubDub

AI dubbing platform with lip-sync for video translation.

paid
Rask AI
Rask AI

AI video translation and dubbing platform.

paid
ElevenLabs Dubbing
ElevenLabs

ElevenLabs' video dubbing layer on top of its voice model.

paid
Blakify
Blakify

AI voiceover platform with multi-language output for podcasts and videos.

paid
Play.ht
PlayHT

AI voice generator favored for podcast and audiobook workflows.

paid
WellSaid Labs
WellSaid Labs

Enterprise AI voiceover platform with consented voice avatars.

paid
Murf AI
Murf AI

Studio-grade AI voice generation with a timeline editor.

freemium
Speechgen.io
Speechgen

Cheap AI voice generator with broad language coverage.

paid
ElevenLabs
ElevenLabs

Studio-grade AI voice generation and cloning.

freemium
Vidnoz AI
Vidnoz

AI avatar video generator with talking-head templates.

freemium
Synthesia
Synthesia

Enterprise AI avatar video platform with translation.

paid
DeepBrain AI Studios
DeepBrain AI

AI avatar video platform popular in APAC enterprises.

paid
Elai.io
Elai.io

AI avatar video tool with text-to-video and templates.

paid
Colossyan
Colossyan

Multi-avatar dialogue video for corporate L&D.

paid
Veed Auto-Subtitles
Veed

Veed.io's standalone auto-subtitle workflow.

freemium
Kapwing Subtitler
Kapwing

Kapwing's standalone subtitle workflow.

freemium
Transkriptor
Transkriptor

Affordable AI transcription with strong Turkish and European-language coverage.

paid
SpeechTexter
SpeechTexter

Free browser-based dictation tool that uses the browser's built-in speech recognition.

free
Podscribe
Podscribe

Podcast measurement and attribution platform — transcripts power the analytics.

paid
Magellan AI
Magellan AI

Podcast advertising intelligence built on transcript analysis.

paid
Podchaser Insights
Acast

Podcast database with transcript-powered creator and listener intelligence.

freemium
Audiogram by Headliner
SpareMin

Headliner's automated audiogram workflow for podcast snippets.

freemium
Veed Live Captions
Veed

Veed's live caption layer for streaming and recording.

freemium
Loom AI Captions
Loom (Atlassian)

Loom's auto-caption layer for screen recordings.

freemium
Vidyo.ai
Vidyo.ai

AI clipping and short-form video repurposing for creators.

freemium
YouTube Studio Auto-Captions
Google

YouTube Studio's built-in auto-caption track.

free
Vimeo AI Captions
Vimeo

Vimeo's auto-caption layer for hosted videos.

freemium
Panopto ASR
Panopto

Panopto's auto-captioning for higher-education and enterprise video.

paid
Kaltura REACH
Kaltura

Kaltura's enterprise auto-captioning for video platforms.

paid
Rev Live Captions
Rev

Rev's live captioning service for Zoom and Webex.

paid
LEXI Live
AI-Media

AI-Media's live broadcast captioning engine.

paid
Spinach.io
Spinach

AI standup notetaker for engineering teams.

freemium
Krisp AI Meeting Notes
Krisp Technologies

Krisp's AI meeting-notes layer paired with its noise cancellation.

freemium
Rev AI Notetaker
Rev

Rev's meeting-bot notetaker built on Rev AI transcription.

freemium
Krisp Live Captions
Krisp Technologies

Krisp's on-device live caption feature for calls.

freemium
Cogi
Cogi

Mobile voice notetaker with retroactive recording.

freemium
Otter Meeting GenAI
Otter.ai

Otter's question-and-answer layer over recorded meetings.

freemium
Krisp Voice Privacy
Krisp Technologies

Krisp's voice privacy filter for call agents.

paid
Fireflies Soundbites
Fireflies.ai

Fireflies' clip-extraction layer for meeting highlights.

freemium
Magic Mastering by Buzzsprout
Buzzsprout

Buzzsprout's one-click mastering for hosted episodes.

paid
Blubrry
RawVoice

Long-running podcast host behind the PowerPress WordPress plugin.

paid
Fusebox
Fusebox

Embeddable podcast player and player-network host.

paid
RSS.com
RSS.com

Affordable podcast host with monetization and AI features.

freemium
Transistor Private Podcasts
Transistor

Transistor's private podcast feature for B2B and internal podcasts.

paid
Supercast
Supercast

Premium podcast subscription platform with private RSS.

paid
Anchor (legacy)
Spotify

Spotify's former podcast brand, since rebranded as Spotify for Podcasters.

free
Soundtrap
Spotify

Spotify's browser DAW with collaborative podcast tools.

freemium
Soundwise
Soundwise

Audio learning and private podcast platform.

paid
Fable Studio
Fable Studio

AI tools for emotive voice and character dialogue.

paid
Boombox Studio
Boombox.io

AI sound design and audio post platform.

freemium
Soundverse
Soundverse

AI music and sound bed generator for video creators.

paid
Musicfy
Musicfy

AI music + voice generation aimed at creator workflows.

paid
Suno
Suno

AI song generation with vocals and instruments from a prompt.

freemium
Udio
Udio

AI song generation with prompt-driven vocals and arrangement.

freemium
Soundraw
Soundraw

Royalty-free AI music for creators and video producers.

paid
Videoleap
Lightricks

Lightricks' mobile video editor with AI features.

freemium
CapCut
ByteDance

ByteDance's free cross-platform editor with auto-captions and AI tools.

freemium
Lightcut
Ulike Tech

Mobile AI editor for travel and creator video.

freemium
Microsoft Clipchamp
Microsoft

Microsoft's browser video editor with auto-captions and Speaker Coach.

freemium
Wisecut
Wisecut

AI video editor that cuts silences and auto-captions long-form video.

freemium
Speakflow
Speakflow

Teleprompter app for creators recording on a phone or laptop.

freemium
Nuance Dragon Medical One
Microsoft / Nuance

Cloud-based medical speech recognition for clinicians, owned by Microsoft.

enterprise · contact sales
Microsoft DAX Copilot
Microsoft / Nuance

Ambient AI clinical documentation — listens to the visit, writes the note.

enterprise · contact sales
Iris Medical
Iris Medical

Vendor-specific AI scribe — verify availability and BAA status before relying on it.

unknown · contact vendor
Nabla Copilot
Nabla Technologies

Ambient AI assistant for clinicians, founded in Paris and used in the EU and US.

free trial / paid plans
Tali AI
Tali AI

Canadian AI scribe and medical voice assistant for primary care.

paid plans · contact sales
Robin Healthcare
Robin Healthcare

AI medical scribe with virtual scribe overlay for orthopedics and specialty care.

enterprise · contact sales
Sayvant
Sayvant

Generative AI scribe targeted at emergency medicine and urgent care.

contact for pricing
Doximity Scribe
Doximity

Free AI medical scribe inside the Doximity clinician network app.

free for verified clinicians
Mutuo Health
Mutuo Health Solutions

AI clinical documentation for Canadian and Australian healthcare.

contact for pricing
Sully AI
Sully AI

AI clinical assistant that handles documentation, coding, and tasks for clinicians.

paid plans · contact sales
Lindy Health
Lindy

Healthcare-vertical instance of the Lindy AI agent platform.

paid plans
Noted AI
Noted AI

AI scribe and note-generation product for clinicians.

contact for pricing
Saykara (legacy)
Nuance (Microsoft)

Early ambient AI scribe — acquired by Nuance in 2021.

discontinued — see DAX Copilot
Modernizing Medicine ScribeAI
Modernizing Medicine

AI scribing built into the ModMed EMA EHR for specialty practices.

EMA add-on · contact sales
eClinicalWorks Scribe (Sunoh.ai)
eClinicalWorks

AI medical scribe bundled with the eClinicalWorks EHR.

EHR add-on · contact sales
athenahealth AI Notes
athenahealth

AI documentation features inside the athenahealth EHR.

EHR add-on · contact sales
Epic Stage / In Basket AI
Epic Systems

AI documentation features built into the Epic EHR.

EHR add-on · contact sales
AVA Health (formerly AVA Speech)
AVA Inc.

Healthcare instance of AVA's live-captioning platform.

contact for pricing
Veritone Legal (Illuminate)
Veritone Inc.

AI-powered legal transcription and evidence-search platform.

enterprise · contact sales
vTestify
vTestify

Remote deposition and video testimony platform with built-in transcription.

per-deposition pricing
Otter for Education
Otter.ai

Education tier of Otter.ai — classroom and lecture transcription.

edu discount · from $4.99/mo student
Glean (Sonocent successor)
Glean Education

Note-taking app for students that records the lecture and structures the notes.

from ~$10/mo · institutional plans
Notability Audio Recording
Ginger Labs

Audio recording inside the Notability note-taking app on Apple devices.

from $14.99/yr
AudioNote
Luminant Software

Linked-notes-and-audio app for iOS, Android, Windows, and Mac.

from $9.99 one-time / subscription
Ava for Education
AVA Inc.

AVA's classroom and group-discussion live-captioning product.

institutional pricing
Microsoft Translator (Education)
Microsoft

Free Microsoft live-translation app with classroom mode for teachers.

free
Google Live Transcribe (Education)
Google

Free Android live-transcription app used in education accessibility settings.

free
VoicePad Pro
Mobile-Magic

Education and accessibility dictation product.

paid one-time
NVivo Transcription
Lumivero

AI transcription add-on for the NVivo qualitative analysis software.

credits · ~$0.90/min
ATLAS.ti AI Transcription
ATLAS.ti Scientific Software GmbH

AI transcription inside the ATLAS.ti qualitative research software.

license · AI credits
Trint Education
Trint

Education tier of the Trint AI transcription platform.

edu pricing · contact sales
Sonix for Education
Sonix, Inc.

Education tier of Sonix's AI transcription platform.

edu pricing · per-hour credits
Ava
AVA Inc.

Live captioning app for deaf and hard-of-hearing users — group conversations.

free / paid plans
Google Live Caption
Google

System-wide live captions on Android, Chrome, and Pixel devices.

free · built into OS
Microsoft Live Captions
Microsoft

Windows 11 system-wide live captioning for any audio on the device.

free · built into Windows 11
Apple Live Captions
Apple

System-wide live captions on iOS, iPadOS, and macOS.

free · built into OS
RogerVoice
RogerVoice

Live captioning and relay calling app for deaf and hard-of-hearing users.

free / paid plans
MMR Translate
MMR Translate

Live captioning and translation product targeted at conferences and events.

per-event pricing
SpeakSee
SpeakSee

Multi-microphone live captioning hardware + app for the deaf community.

hardware + subscription
InnoCaption
InnoCaption

Free real-time captioned telephone service for hard-of-hearing US users.

free · FCC-funded
Olelo
Olelo

Multilingual live captioning and translation product (verify availability).

contact vendor
Hello Eye / Hello AI
Hello Eye

Smart-glasses live captioning product (verify availability).

contact vendor
XREAL / Nreal AR Captions
XREAL

AR-glasses live captioning experiences from XREAL (formerly Nreal).

hardware $399+
Sorenson Buzz Cards / Apps
Sorenson Communications

Apps and tools for the deaf community from Sorenson Communications.

free for verified deaf users
EEG Enterprises iCap / Falcon
EEG Enterprises (Ai-Media)

Broadcast captioning hardware, encoders, and AI captioning software.

enterprise · contact sales
Caption.Ed
Caption.Ed

Lecture-capture and live-captioning tool for higher education and workplaces.

institutional pricing
Temi
Rev

Rev's AI-only $0.25/min transcription product.

$0.25/min
Transcribe by Wreally
Wreally

Manual transcription playback tool for journalists and researchers.

from $20/yr
Whisper Memos
Whisper Memos

Apple-platform voice memo app that uses OpenAI Whisper for transcription.

from $2.99/mo
Audio2Text (Speech-to-Text apps)
Various publishers

Generic name for several mobile audio-to-text apps.

free / paid plans
iA Writer Dictation
iA Inc.

Voice dictation inside the iA Writer minimalist writing app.

from $9.99 one-time
Dragon Anywhere
Microsoft / Nuance

Mobile dictation app from Nuance for iOS and Android.

from $14.99/mo
Speechnotes
Speechnotes

Free browser-based voice typing notepad.

free
Dictation.io
Digital Inspiration

Free browser dictation tool with simple UI.

free
Apple Voice Control
Apple

Apple's accessibility voice control on macOS, iOS, and iPadOS.

free · built into OS
Apple Dictation
Apple

Built-in OS dictation across iOS, iPadOS, and macOS.

free · built into OS
Windows Speech Recognition
Microsoft

Built-in Windows dictation and OS voice control.

free · built into Windows
Windows Voice Access
Microsoft

Modern AI-driven dictation and voice control of the OS, built into Windows 11.

free · built into Windows 11
Google Voice Typing (Docs)
Google

Voice typing inside Google Docs and Google Slides.

free · built into Google Docs
Gboard Voice Typing
Google

Voice typing in the Gboard keyboard on Android and iOS.

free · built into Gboard
ListNote
Khimaera

Free Android voice-typing notepad.

free
Voice Notebook
Voice Notebook

Web and Android voice typing tool with translation.

free / paid plans
EzDictation
EzDictation

Free browser dictation utility.

free
AudioPen
AudioPen

Speak-to-edit app that turns rambling speech into clean, structured text.

free / from $6.50/mo
Otter Bot Voice
Otter.ai

Otter.ai's mobile voice-recording surface.

free / paid plans
SpeakNotes
Various publishers

Mobile voice-to-text note app.

free / paid plans
VoiceIn Voice Typing
Dictanote

Chrome extension that adds voice typing to any text field on any website.

free / Pro $14.99/mo
Dictanote
Dictanote

Web-based dictation notepad with cloud sync.

free / Pro $14.99/mo
TypeTalk
Various publishers

Mobile dictation utility (verify publisher).

free / paid plans
Notta Dictate
Notta

Notta's dictation feature for voice typing on mobile.

from $13.99/mo
Readwise Voice Memos
Readwise

Voice-memo capture in the Readwise reading and knowledge product.

from $7.99/mo
Microsoft Dictate (Office 365)
Microsoft

Voice dictation inside Word, Outlook, PowerPoint, and OneNote.

Microsoft 365 subscription
Trint for Newsrooms
Trint

Trint's newsroom-targeted product tier.

enterprise · contact sales
Reduct Newsroom
Reduct.Video

Reduct.Video's newsroom tier for journalism teams.

contact for pricing
ScribeUp
ScribeUp

Web transcription service for journalists.

per-hour rates
Verbz
Verbz

Mobile-first AI transcription app for journalists and creators.

free / Pro plans
Sonix for Newsrooms
Sonix, Inc.

Sonix's newsroom tier targeted at broadcast and digital news.

per-hour credits
Scribee
Scribee

Mobile transcription product for journalists.

free + IAP
Descript for Journalism
Descript

Descript's journalism use case; see the main Descript entry.

see Descript entry
Recordr / Recordly
Various publishers

Mobile recording-and-transcription apps used by journalists.

free / paid plans
Simbo AI
Simbo AI

AI medical voice assistant — appointment booking and clinical voice tasks.

enterprise · contact sales
NoteTracker
NoteTracker

Medical transcription compliance and audit tool.

contact for pricing
VoiceTag
Various publishers

Medical voice macros and dictation utility (verify publisher).

varies
e-MDs Dictation
CompuGroup Medical

Dictation built into the e-MDs / CompuGroup ambulatory EHR.

EHR add-on
Greenway Intergy Dictation
Greenway Health

Dictation features inside the Greenway Health EHRs.

EHR add-on
Tebra (Kareo) Voice Notes
Tebra

Voice notes inside the Tebra (Kareo + PatientPop) EHR.

EHR add-on
DrChrono AI Scribe Marketplace
DrChrono (EverHealth)

AI scribe partnerships inside the DrChrono EHR marketplace.

EHR add-on
Praxis EMR (text-templates)
Infor-Med

Praxis EMR's concept-based note generation — adjacent to voice dictation.

EHR license
Ambient Clinical Intelligence (M*Modal)
Solventum (3M)

Solventum (3M) ambient clinical intelligence built on the M*Modal engine.

enterprise · contact sales
3M Fluency Direct
Solventum (3M)

Clinical front-end speech recognition product.

enterprise · contact sales
InScribe (Higher Ed)
InScribe

Higher-ed support platform with transcription features (verify).

institutional pricing
Kaltura REACH
Kaltura

Transcription and captioning add-on for Kaltura's education video platform.

enterprise · contact sales
Panopto Captions
Panopto

Captioning and transcription built into the Panopto video platform for education.

institutional pricing
YuJa Captioning
YuJa

AI captioning inside the YuJa video platform for higher education.

institutional pricing
Echo360 Captioning
Echo360

Captioning inside the Echo360 lecture-capture platform.

institutional pricing
Zoom Live Transcription
Zoom Communications

Built-in live transcription and captioning in Zoom meetings.

free for paid Zoom plans
Microsoft Teams Transcription
Microsoft

Built-in live transcription and meeting recap in Microsoft Teams.

Microsoft 365 subscription
Google Meet Captions
Google

Built-in live captions and Gemini-powered notes in Google Meet.

Workspace subscription
Webex Real-time Captions
Cisco

Live captions and post-meeting transcript in Cisco Webex.

Webex subscription
Riverside.fm Transcripts
Riverside

Automatic transcripts inside the Riverside.fm podcast and video studio.

from $15/mo
Podcastle Transcription
Podcastle

Built-in transcription in the Podcastle podcast creation platform.

from $11.99/mo
Philips SpeechLive
Philips Speech

Cloud dictation and transcription workflow for legal and corporate users.

from ~$15/mo per seat
Olympus Dictation Management System
Olympus / OM System

Olympus's professional dictation workflow software for legal and medical.

license + maintenance
Yitu Speech
Yitu Technology

Yitu Tech speech recognition — Mandarin, dialects, and far-field microphone arrays.

enterprise · contact sales
AmiVoice
Advanced Media

Advanced Media's AmiVoice — Japan's longest-running enterprise speech recognition family.

tiered · cloud API pay-as-you-go + on-premise license tiers
Fujitsu LiveTalk
Fujitsu

Fujitsu LiveTalk — Japanese real-time captioning for meetings and classrooms.

enterprise · contact sales
Selvas Selvy STT
Selvas AI

Selvas AI's Selvy speech recognition — Korean and English ASR for media, finance, and government.

enterprise · contact sales
Skit.ai
Skit.ai

Skit.ai (formerly Vernacular.ai) — voice AI for collections and Indian-language call automation.

enterprise · contact sales
Slang Labs
Slang Labs

Slang Labs — in-app multilingual voice assistant SDK with Indic-language ASR.

tiered · contact sales for current plans
MTS AI VoiceTech
MTS AI

MTS AI VoiceTech — Russian-language ASR and voice biometrics from telecom operator MTS.

enterprise · contact sales
Speech Technology Center (STC)
Speech Technology Center

Speech Technology Center (STC), a long-running Saint Petersburg-based speech and biometrics vendor, formerly STC-innovations.

enterprise · contact sales
VoiceInteraction AUDIMUS
VoiceInteraction

VoiceInteraction's AUDIMUS, a Portuguese-strong multilingual broadcast transcription engine.

enterprise · contact sales
Vocapia VoxSigma
Vocapia Research

Vocapia Research's VoxSigma, a multilingual broadcast and call-centre transcription engine rooted in LIMSI research.

enterprise · contact sales
Verbio
Verbio Technologies

Verbio Technologies — Spanish-strong multilingual ASR and voice biometrics from Barcelona.

enterprise · contact sales
Phonexia
Phonexia

Phonexia — Czech speech and voice-biometrics vendor focused on government and forensic use.

enterprise · contact sales
Tilde ASR
Tilde

Tilde — Baltic-language NLP vendor with Latvian, Lithuanian, and Estonian speech recognition.

enterprise · contact sales
Lingsoft
Lingsoft

Lingsoft — Finnish language-services group offering Nordic-language ASR and dictation.

enterprise · contact sales
Speechmore
Speechmore

Speechmore — Italian-language transcription product for journalists and professionals.

tiered · published per-minute and subscription plans on site
Vocally
Vocally

Vocally — French-first transcription tool for journalists and researchers.

tiered · per-minute pricing in EUR
VocTroLabs
VocTroLabs

VocTroLabs — speech and subtitling research lab from Universitat Politècnica de Catalunya.

enterprise · contact sales
Cedat 85
Cedat 85

Cedat 85 — Italian speech-to-text and stenography vendor for parliaments and media.

enterprise · contact sales
Rumi Arabic Speech
Rumi

Rumi — Arabic-first transcription product tuned for Egyptian, Levantine, and Gulf dialects.

tiered · per-minute pricing in USD
Mawdoo3 / Mowjaz
Mawdoo3

Mawdoo3, a Jordan-based Arabic AI lab behind the Salma voice assistant and Arabic ASR research.

enterprise · contact sales
Kalam.ai
Kalam.ai

Kalam — Arabic speech-to-text product for journalists and researchers.

tiered · per-minute Arabic transcription pricing
VoxLab
VoxLab

VoxLab — Brazilian Portuguese speech recognition for journalists, agencies, and businesses.

tiered · per-minute pricing in BRL
TranscribeMe LATAM
TranscribeMe

TranscribeMe Latin-America operations — Spanish-and-Portuguese human + automatic transcription.

tiered · per-minute pricing in USD
Voiceitt
Voiceitt

Voiceitt — non-standard-speech recognition for people with speech impairments.

tiered · subscription pricing on site
VoxQube
VoxQube

VoxQube — Urdu and South-Asian language transcription research and services.

enterprise · contact sales
Navana Tech
Navana Tech

Navana Tech — Bengali-language conversational AI and ASR from Bangladesh.

enterprise · contact sales
ASR Yandex on-prem
Yandex Cloud

Yandex SpeechKit on-premise build — Russian-language ASR for isolated networks.

enterprise · contact sales
Tarteel AI
Tarteel AI

Tarteel — Qur'anic Arabic recitation recognition for memorisation and tajweed feedback.

tiered · freemium with premium subscription on app stores
Armada AI
Armada AI

Armada — Russian-language conversational and contact-centre AI with embedded ASR.

enterprise · contact sales
Speakable PT
Speakable

Speakable — European-Portuguese-tuned transcription for Iberian newsrooms.

tiered · published per-minute pricing in EUR
Bertin IT
Bertin IT

Bertin IT MediaSpeech — French defence and intelligence multilingual transcription suite.

enterprise · contact sales
Syllable Health (multilingual)
Syllable

Syllable — patient-facing healthcare voicebots with Spanish, English, and Mandarin support.

enterprise · contact sales
Convai LATAM
LATAM voice-AI vendors

Convai-style Spanish voice automation tuned for Mexican and Central-American customer service.

enterprise · contact sales
Phonexia LATAM
Phonexia

Phonexia's Latin-American operations — Spanish-language voice biometrics and STT.

enterprise · contact sales
Vivoka
Vivoka

Vivoka — French embedded voice-AI vendor with offline multilingual ASR.

enterprise · contact sales
Speak AI Arabic
Speak AI

Speak AI — Arabic-and-multilingual research-grade transcription with NLP insights.

tiered · subscription plans in USD
Rakuten AIris
Rakuten

Rakuten AIris — Japanese-language speech recognition from Rakuten Institute of Technology.

enterprise · contact sales
Onsei
Onsei

Onsei — Japanese-language web transcription for individual professionals and researchers.

tiered · per-minute pricing in JPY
Air.ai
Air AI

Conversational AI that its maker claims can hold 10- to 40-minute human-like phone calls.

see vendor pricing
PolyAI
PolyAI

Enterprise voice assistants for contact centers in hospitality, banking, and retail.

see vendor pricing
Stack AI
Stack AI

No-code platform for building AI workflows including voice agents and assistants.

see vendor pricing
Replicant
Replicant

Contact-center voice AI that autonomously resolves routine customer calls.

see vendor pricing
Hyro
Hyro

Plug-and-play conversational AI for healthcare and enterprise voice + chat.

see vendor pricing
Kore.ai Voice
Kore.ai

Enterprise conversational AI platform with first-class voice and IVR support.

see vendor pricing
Cognigy Voice
Cognigy

Enterprise contact-center voice AI built on the Cognigy.AI conversational platform.

see vendor pricing
Mindsay
Mindsay

Conversational AI for travel and customer service automation.

see vendor pricing
Yellow.ai Voice
Yellow.ai

Voice AI from Yellow.ai's dynamic automation platform for enterprise CX.

see vendor pricing
Avaamo
Avaamo

Conversational AI platform for healthcare, banking, and enterprise voice + chat.

see vendor pricing
Druid Voice
Druid AI

Voice agent capability layered on the Druid AI conversational automation platform.

see vendor pricing
Inya.ai
Gnani.ai

Generative voice AI agents from Gnani.ai for enterprise CX.

see vendor pricing
SmartAction
SmartAction

AI-powered voice and chat agents for enterprise contact centers.

see vendor pricing
Aimee.ai
Aimee AI

Voice AI assistants for inbound and outbound business calls.

see vendor pricing
Toma
Toma

AI voice agents for auto dealerships and service-based businesses.

see vendor pricing
Helium AI
Helium

Voice AI agents for customer service and sales workflows.

see vendor pricing
MagicAI
MagicAI

AI agents including voice, chat, and content automation in one platform.

see vendor pricing
Charlie
11x.ai

AI sales development representative making outbound voice calls.

see vendor pricing
Sayhi
Sayhi

AI voice agents for customer support and lead conversion.

see vendor pricing
VoiceGenAI
VoiceGenAI

Generative voice AI platform for outbound and inbound business calls.

see vendor pricing
Pragmatic Voice
Pragmatic Voice

Voice AI agents and contact-center automation.

see vendor pricing
Echo AI
Echo AI

Conversation intelligence and voice-AI insights for customer-experience teams.

see vendor pricing
Curious Thing
Curious Thing

Voice AI for inbound and outbound customer engagement.

see vendor pricing
Spitch Voice
Spitch

Speech analytics, biometrics, and voice bots for European enterprises.

see vendor pricing
AnswerForce Voice AI
AnswerForce

Hybrid AI plus human virtual receptionists for small businesses.

see vendor pricing
Goodcall
Goodcall

AI phone agents for service-based small businesses.

see vendor pricing
Daily Bots
Daily.co

Hosted RTVI bots — Daily.co's managed runtime for Pipecat voice agents.

see vendor pricing
JustCall AI
JustCall

Sales-focused cloud phone with AI transcription, coaching, and agent assist.

see vendor pricing
CloudTalk AI
CloudTalk

AI features for CloudTalk's cloud-based call center software.

see vendor pricing
RingCentral RingSense (Voice)
RingCentral

AI conversation intelligence across RingCentral's UCaaS and CCaaS platform.

see vendor pricing
Rasa Pro Voice
Rasa

Commercial Rasa offering with CALM dialog and enterprise voice connectors.

see vendor pricing
Five9 Intelligent Virtual Agent
Five9

Five9's contact-center voice and chat virtual agent product.

see vendor pricing
NICE CXone Mpower
NICE

NICE's AI-orchestrated CX platform with voice virtual agents and Enlighten AI.

see vendor pricing
Genesys Cloud Voice AI
Genesys

Genesys Cloud's voicebot, agent assist, and AI experience orchestration.

see vendor pricing
Amazon Connect + Amazon Q in Connect
Amazon Web Services

Amazon Connect cloud contact center with Amazon Q-powered agent assist and bots.

see vendor pricing
Interactions LLC
Interactions LLC

Managed conversational AI for enterprise voice and chat.

see vendor pricing
Uniphore Voice AI
Uniphore

Conversational AI and automation across contact-center voice and chat.

see vendor pricing
Openstream EVA
Openstream.ai

Multimodal conversational AI platform for enterprise voice and chat.

see vendor pricing
Tovie AI
Tovie AI

Conversational AI platform with voice and chat for enterprises.

see vendor pricing
VoiceOwl.ai
VoiceOwl

Generative AI voice agents for B2B sales and marketing.

see vendor pricing
Rosie
Rosie.ai

AI answering service and virtual receptionist for small businesses.

see vendor pricing
Deepconverse
Deepconverse

AI customer-service agents across voice, chat, and self-service.

see vendor pricing
Ada Voice
Ada

Ada's brand interaction platform extended to voice channels.

see vendor pricing
Forethought Voice
Forethought

Generative AI for customer support with voice and chat channels.

see vendor pricing
Convore
Convore

AI conversation agents for B2B revenue teams.

see vendor pricing
Voxalize
Voxalize

Voice AI agents tuned for non-English markets.

see vendor pricing
Swyx Voice (Cognigy AI Copilot)
Cognigy

Generative agent-assist layer for human contact-center agents.

see vendor pricing
Talkdesk Autopilot
Talkdesk

Talkdesk's generative AI voice-and-chat virtual agent for contact centers.

see vendor pricing
Thoughtful AI
Thoughtful Automation

Healthcare-focused AI agents for revenue-cycle and patient communications.

see vendor pricing
Infinitus Systems
Infinitus

Voice AI agents that handle benefits-verification and prior-auth calls in healthcare.

see vendor pricing
Kasisto KAI
Kasisto

Conversational AI platform purpose-built for banking and wealth management.

see vendor pricing
Rulai (Conversica family)
Conversica

Enterprise voice and chat AI agents from Conversica.

see vendor pricing
Salesforce Einstein Service Agent
Salesforce

Salesforce's autonomous AI agent for service, with voice and chat channels.

see vendor pricing
HubSpot Breeze AI (Voice surface)
HubSpot

HubSpot Breeze customer agents extending into voice channels.

see vendor pricing
AI-Media LEXI
AI-Media

Automatic live captioning for broadcast, news and live events — flagship of AI-Media's LEXI family.

enterprise · contact sales
AI-Media LEXI Local
AI-Media

On-prem variant of LEXI for stations that cannot send audio to the cloud.

enterprise · contact sales
AI-Media LEXI Tool
AI-Media

Web-based agent for live event captioning operators, paired with the LEXI engine.

enterprise · contact sales
AI-Media iCap Translate
AI-Media

Real-time caption translation overlay for the iCap broadcast caption network.

enterprise · contact sales
AI-Media iCap
AI-Media

Cloud caption delivery network for live broadcast, the transport layer behind LEXI.

enterprise · contact sales
AI-Media Alta
AI-Media

IP caption encoder for SMPTE 2110, NDI and SRT broadcast facilities.

enterprise · contact sales
EEG Falcon
AI-Media (EEG Enterprises)

Server-based automatic captioning appliance — the predecessor product line that became LEXI.

enterprise · contact sales
EEG enCaption
AI-Media (EEG Enterprises)

Self-contained automatic caption appliance for live linear TV — first-gen broadcast ASR.

enterprise · contact sales
EEG HD492 iCap Encoder
AI-Media (EEG Enterprises)

Classic SDI caption encoder used across thousands of US TV master controls.

enterprise · contact sales
EEG Lexi DR (Direct Recall)
AI-Media

Glossary-aware automatic captioning trained on a station's own proper-noun list.

enterprise · contact sales
ENCO enCaption
ENCO Systems

Automatic caption appliance for radio and TV — a different enCaption from EEG's.

enterprise · contact sales
Telestream Vantage Timed Text
Telestream

Server-side caption automation inside the Vantage media-processing platform.

enterprise · contact sales
1CapApp
1CapApp

Caption-delivery platform used by independent live captioners to bill, deliver and embed captions.

subscription · contact sales
Ava
Ava

Live captioning app for in-person meetings and small events, built for Deaf and hard-of-hearing users.

freemium · paid plans contact sales
Captionsmart
Captionsmart

AI live captioning for SaaS meetings and webinars with a focus on enterprise accessibility.

subscription · contact sales
Cielo24 Compliance (ECC)
Cielo24

Accessibility-compliance tier of Cielo24, aimed at ADA / Section 508 / WCAG 2.1 deliverables.

service · contact sales
OOONA
OOONA

Cloud subtitle authoring and project-management suite used by media-localisation vendors.

subscription · contact sales
Limecraft Flow
Limecraft

Cloud media-logistics platform with built-in transcription and subtitle authoring.

subscription · contact sales
MediaSilo Captions
Shift Media (MediaSilo)

Caption-review and proof functionality bolted onto the MediaSilo review-and-approval platform.

subscription · contact sales
ZOO Subs
ZOO Digital

Cloud subtitle authoring suite from ZOO Digital, used across Hollywood OTT delivery.

enterprise · contact sales
Screen Polistream
Screen Subtitling Systems

Subtitle playout server for live and file-based broadcast distribution.

enterprise · contact sales
ENCO Comprompter
ENCO Systems

Newsroom prompter/script system that frequently feeds the captioning pipeline.

enterprise · contact sales
Inscriber CG (Ross Inscriber)
Ross Video

Broadcast character generator with caption overlay support — legacy of Inscriber Technology.

enterprise · contact sales
Switchboard Live Captions
Switchboard Live

Live captioning add-on for the Switchboard Live multistreaming platform.

subscription · contact sales
BoxCast Live Captions
BoxCast

Automatic captions for the BoxCast live-streaming platform, popular with churches and schools.

subscription · contact sales
Vbrick AI Captions
Vbrick

Enterprise video platform with built-in AI captioning, popular for internal communications.

enterprise · contact sales
Kaltura Live Captions
Kaltura

Caption support inside Kaltura's video platform for higher-ed and enterprise.

enterprise · contact sales
Brightcove Auto Captions
Brightcove

Automatic captioning service inside Brightcove's Video Cloud platform.

enterprise · contact sales
JW Player Captions
JW Player

Caption pipeline inside JW Player's video platform with optional ASR.

subscription · contact sales
Cincopa AI Captions
Cincopa

Auto-caption feature inside the Cincopa video-hosting platform.

subscription · contact sales
IBM Watson Captioning
IBM

Legacy IBM-branded captioning offering on top of Watson Speech to Text.

enterprise · contact sales
US Courts Audio/Video Conferencing System
Administrative Office of the US Courts

Federal courts' AV system that increasingly bolts on AI captions for hearings.

government internal
JAVS
JAVS

Court audio/video recording platform increasingly paired with AI transcription.

enterprise · contact sales
For The Record (FTR)
For The Record

Court recording platform widely deployed across US, UK and Australian court systems.

enterprise · contact sales
Sorenson NXG (Caption Relay)
Sorenson

Sorenson's next-generation captioned-call and relay-services platform.

free for qualified US users (FCC TRS Fund)
accessiBe (audio/video angle)
accessiBe

WCAG overlay platform whose video module wraps third-party caption ASR.

subscription · contact sales
AudioEye (captioning angle)
AudioEye

Accessibility platform that integrates captioning and transcript services.

subscription · contact sales
UserWay (captioning angle)
UserWay

Accessibility widget vendor with optional caption and transcript services.

subscription · contact sales
AI-Media Smart Lexi
AI-Media

LEXI variant tuned for enterprise events that prioritises high precision over speed.

enterprise · contact sales
AI-Media LEXI Toolkit
AI-Media

Glossary, profile and workflow management portal for LEXI customers.

enterprise · contact sales
Daily Live Captions
Daily

Built-in live captioning on the Daily video API platform.

subscription · contact sales
Microsoft Translator Presenter mode
Microsoft

Browser-based live caption + translation overlay for in-person presentations.

free · enterprise via Microsoft 365
Microsoft Teams Live Captions
Microsoft

Built-in live captioning inside Microsoft Teams meetings and live events.

bundled · Microsoft 365 plans
Google Meet Live Captions
Google

Built-in live captions inside Google Meet, used widely for accessibility in education.

bundled · Google Workspace plans
Webex Live Captions
Cisco

Live caption and translation features inside Cisco Webex Meetings.

bundled · Webex plans
Zoom Live Transcription
Zoom

Built-in Zoom live captioning across meetings and webinars.

bundled · Zoom plans
YouTube Live Auto Captions
YouTube

Auto-generated live captions inside YouTube Live streams.

free with YouTube Live
Facebook Live Captions
Meta

Auto-captioning support for live broadcasts on Facebook.

free with Facebook Live
LinkedIn Live Captions
LinkedIn

Auto-captioning for LinkedIn Live and LinkedIn Events.

free with LinkedIn Live
StreamYard Captions
StreamYard (Hopin / Bending Spoons)

Live captioning add-on inside the StreamYard browser studio.

subscription · contact sales
Restream Captions
Restream

Live captioning feature inside the Restream multistream and studio product.

subscription · contact sales
Vimeo Live Captions
Vimeo

Auto and manual captioning inside Vimeo's Live and on-demand video products.

subscription · contact sales
Wowza Captions
Wowza

Caption support inside Wowza's streaming-server ecosystem.

subscription · contact sales
Haivision Captions
Haivision

Captioning passthrough on Haivision's broadcast and enterprise video products.

enterprise · contact sales
Interact-Streamer
Interact-AS

Live caption viewer used by interpreters and CART writers in European events.

subscription · contact sales
Text-on-Top
Text-on-Top

Open-source live caption viewer used by Deaf communities and EU accessibility groups.

free / donation
HeyGen Video Translate
HeyGen

Translate a talking-head video into 175+ languages with lip-sync rebuild from the original speaker.

freemium
Camb.ai (MARS)
Camb.ai

Speech-to-speech translation in 140+ languages with voice cloning — pitched at live sports + media.

paid
Speak.AI Dubbing
Speak Ai Inc.

Multilingual dubbing layered on top of Speak.AI's transcription + qualitative-analysis workflow.

paid
Wavel
Wavel AI

Subtitle, dubbing, and voice-over in 70+ languages — pitched at marketing teams localizing video ads.

freemium
Synthesys Translate
Synthesys

AI dubbing and multilingual voice-over from Synthesys, built on top of its avatar + TTS stack.

paid
Papercup
Papercup

Human-in-the-loop AI dubbing pitched at premium media — Sky News, BBC Studios, Bloomberg.

enterprise
Speechify Video Translate
Speechify

Speechify's dubbing add-on — translate video into 20+ languages keeping the original voice.

freemium
BlipCut
iMyFone (BlipCut)

Cloud video translation with lip sync, voice cloning, and subtitle export.

freemium
InVideo Translate
InVideo

InVideo's video-translation surface on top of its template-based video editor.

freemium
Vidqu
Vidqu

Video translator and AI dubbing tool with a freemium tier.

freemium
Translate.video
Translate.video

Translate, dub, transcribe, and subtitle videos in 75+ languages — single web workflow.

freemium
Words.io (Translated)
Translated

Human + AI translation marketplace from Translated — voice-over and dubbing add-ons.

paid
Dubverse
Dubverse

India-built AI dubbing platform with strong Indic-language support.

paid
AI-Media Translate
AI-Media

Captioning + translation services for broadcast and enterprise — Lexi family extension.

enterprise
3Play Translate
3Play Media

Subtitle translation as a managed service from 3Play Media.

paid
Languify
Languify

Speech feedback + multilingual coaching app — pitched at interview / pitch prep.

freemium
Play.ai (Voice Agents)
Play (Play.ai)

Conversational voice agents built on PlayHT's voice stack — real-time + multilingual.

paid
Resemble AI
Resemble AI

AI voice cloning, multilingual TTS, and real-time voice conversion for media + games.

paid
Listnr
Listnr AI

AI text-to-speech and podcast distribution — 142 languages, voice-cloning API.

freemium
Lovo (Genny)
Lovo AI

AI voice generator + video editor pitched at marketing teams — 100+ languages, 500+ voices.

freemium
Voxxos
Voxxos

Multilingual AI voice and dubbing platform — smaller-shop alternative to ElevenLabs.

paid
NaturalReader
Natural Reader Inc.

TTS reader for documents, ebooks, and PDFs — long-running with classroom + accessibility deployments.

freemium
FakeYou
FakeYou

Community-trained character voices for parody and fan-fiction — Tortoise + custom models.

freemium
Microsoft Translator (Live)
Microsoft

Cross-device live conversation translation — share a room code, every device speaks its own language.

free
Google Translate (Conversation)
Google

Two-language conversation mode in Google Translate — phone-on-the-table interpreter.

free
iTranslate Voice
iTranslate (Sonico)

Phone-to-phone voice translator — Apple Watch + AirPods integrations.

freemium
SayHi Translate
Amazon (SayHi)

Voice-first live translation app — Amazon-owned since 2018.

free
Vocre
myLanguage

Side-by-side voice translation app with a kid-friendly UI.

freemium
TripLingo
TripLingo

Business-travel-focused live translator with cultural-cue overlays.

paid
Smartling (Voice + Multimedia)
Smartling

Enterprise translation-management platform with a multimedia + voice-over add-on.

enterprise
Phrase (Multimedia)
Phrase

Phrase TMS / Strings with a multimedia localization module.

enterprise
Lokalise (AV Localization)
Lokalise

Translation-management platform with an AV localization integration layer.

paid
Crowdin (Audio + Video)
Crowdin

Crowdsourced localization platform with audio + video file support.

freemium
Mediasite
Sonic Foundry

Lecture-capture pioneer (Sonic Foundry) used in higher-ed, healthcare, and corporate training.

contact-sales
YuJa Enterprise Video Platform
YuJa

Lecture-capture, live-streaming, and video CMS with auto-captioning across 200+ universities.

contact-sales
YuJa Verity
YuJa

Accessibility + captioning add-on for YuJa's video platform.

contact-sales
BoxCast
BoxCast

Live-streaming platform for houses of worship, schools, and athletics with auto-captioning.

subscription
Verbit (Higher Ed)
Verbit

AI + human transcription service marketed to universities for accessibility-grade captions.

contact-sales
Vbrick Rev
Vbrick

Enterprise video platform with transcription used by Fortune 500 L&D and government.

contact-sales
Brightcove for Education
Brightcove

Brightcove's enterprise video cloud configured for university and L&D delivery.

contact-sales
Wowza Streaming Cloud
Wowza Media Systems

Real-time live-stream infrastructure used under many education-video platforms.

subscription
Cattura
Cattura Video Solutions

Lecture-capture appliance vendor for universities and government, with caption pipeline.

contact-sales
Stream.Live
Stream.Live

Live-stream studio used by educators, conferences, and faith organizations.

subscription
BombBomb
BombBomb

Async video messaging platform used heavily in corporate training and customer education.

subscription
Canvas Studio
Instructure

Instructure's video platform inside Canvas LMS with auto-captioning and inline comments.

contact-sales
Brightspace + ReadSpeaker
D2L / ReadSpeaker

D2L Brightspace LMS with ReadSpeaker text-to-speech and captioning integration.

contact-sales
Blackboard Ally
Anthology

Accessibility platform from Anthology that auto-captions and audio-describes course materials.

contact-sales
Moodle (AI Caption plugins)
Moodle HQ + community

Open-source LMS with community plugins for ASR captioning and AI transcription.

free
Sakai (Auto-caption integrations)
Apereo Foundation

Open-source higher-ed LMS with caption integrations via Kaltura and partner ASR.

free
Schoology Learning
PowerSchool

K-12-focused LMS from PowerSchool with caption-friendly media tools.

contact-sales
itslearning
itslearning

European K-12 + higher-ed LMS with built-in media and caption support.

contact-sales
Anthology Ally
Anthology

The renamed Blackboard Ally: the accessibility + caption layer across Anthology's LMS portfolio.

contact-sales
Canvas Captions (LTI ecosystem)
Instructure (LTI partners)

Caption tools from the Canvas LTI partner ecosystem (3Play, Verbit, Cielo24, AI-Media).

contact-sales
Coursera Captions
Coursera

Coursera's in-platform machine + community captions across 50+ languages for MOOC video.

free
Udemy Auto-Captions
Udemy

Udemy's auto-caption pipeline for instructor-uploaded course video.

free
edX Captions
2U / edX

edX's course platform, built on Open edX, with caption tracks attached to every course video.

free
FutureLearn Captions
FutureLearn

FutureLearn's MOOC platform with auto + reviewed captions for university partners.

free
Khan Academy Captions
Khan Academy

Khan Academy's community + machine caption pipeline across 50+ languages.

free
Pluralsight Captions
Pluralsight

Pluralsight's caption + transcript layer on technology training video.

subscription
LinkedIn Learning Captions
LinkedIn

Captions and synchronized transcripts on LinkedIn Learning's professional video library.

subscription
Udacity Captions
Udacity

Caption tracks across Udacity Nanodegree video content.

subscription
MasterClass Captions
MasterClass

Captions + downloadable transcripts on MasterClass celebrity-led courses.

subscription
Vyond
Vyond

Animated training video tool used by corporate L&D, with caption support on output.

subscription
Synthesia for L&D
Synthesia

AI avatar video platform widely used for corporate training, with multi-language captioning.

subscription
Vimeo Enterprise Captions
Vimeo

Vimeo Enterprise's caption + transcript layer for corporate video portals.

contact-sales
Knowmore
Knowmore

Knowledge management + training video platform with auto-transcription.

contact-sales
WorkRamp
WorkRamp

Corporate LMS used for sales-enablement, onboarding, and customer training.

contact-sales
360Learning
360Learning

Collaborative LMS focused on internal trainers, with caption support on uploaded video.

contact-sales
Docebo
Docebo

Enterprise LMS used by mid-to-large companies, with caption + AI content tools.

contact-sales
Dovetail
Dovetail

User-research repository with auto-transcription used by UX teams.

subscription
Reduct.video Research
Reduct.video

Transcript-based video editor positioned for UX and qualitative research teams.

subscription
Trint Research
Trint

Trint's enterprise transcription positioned for research and academic teams.

subscription
Rev for Researchers
Rev.com

Rev's transcription + caption services positioned for academic and market research.

subscription
Khanmigo
Khan Academy

Khan Academy's AI tutor that listens to learners and provides voice-based feedback.

subscription
Duolingo Max (Roleplay)
Duolingo

Duolingo's premium tier with AI Roleplay voice conversations and Explain My Answer.

subscription
Speak
Speakeasy Labs

AI English-tutor app backed by OpenAI's Startup Fund, focused on speaking practice.

subscription
Praktika
Praktika.ai

AI English tutor with avatar-based speaking practice across phone and web.

subscription
Talkpal AI
Talkpal

Multi-language AI language tutor with voice conversations.

subscription
Loora AI
Loora AI

AI English tutor with phone-call style practice and accent feedback.

subscription
Voxy
Voxy

Enterprise English-language training platform used by global corporations.

contact-sales
Lingmo
Lingmo International

Real-time translation + transcription used by enterprises and educators.

contact-sales
ELSA Speak
ELSA Corp

AI pronunciation coach for English learners, with on-device speech scoring.

subscription
Verbit Higher Ed
Verbit

Verbit's premium captioning + transcription program targeted at universities.

contact-sales
3Play Media Higher Ed
3Play Media

3Play Media's caption services positioned for university accessibility offices.

contact-sales
CaptionAccess
CaptionAccess

Live and post-production captioning service for higher-ed and conferences.

contact-sales
Rev for Education
Rev.com

Rev.com's education program for universities and K-12 districts.

subscription
AI-Media Education
AI-Media

AI-Media's iCap and LEXI live-caption services positioned for higher-ed.

contact-sales
FutureLearn Trainer
FutureLearn

FutureLearn's corporate-training arm with captioned course content for enterprise buyers.

contact-sales
Skillsoft Percipio
Skillsoft

Enterprise learning platform with captions and synchronized transcripts.

contact-sales
Cornerstone Content Anytime
Cornerstone OnDemand

Cornerstone's training-content subscription with captioned video and analytics.

contact-sales
Absorb LMS
Absorb Software

Mid-market LMS with caption support across courses and integrated video tools.

contact-sales
LearnWorlds
LearnWorlds

Course-creator platform with interactive video and auto-caption support.

subscription
Thinkific
Thinkific

Course-creator platform with caption support on course video.

subscription
Teachable
Teachable

Course-creator platform with SRT-based captioning workflow.

subscription
Kajabi
Kajabi

All-in-one creator platform with caption-supported video hosting.

subscription
Intellum
Intellum

Enterprise customer-education LMS with captioned video and analytics.

contact-sales
Skilljar
Skilljar

Customer-education LMS used by B2B SaaS, with caption-ready video and SCORM support.

contact-sales
Litmos
Litmos (Francisco Partners)

Mid-market corporate LMS with caption-ready video and SCORM authoring.

contact-sales
CYPHER Learning
CYPHER Learning

AI-driven LMS for K-12, higher-ed, and business, with captioned content.

contact-sales
Instructure Canvas LMS
Instructure

The dominant North American higher-ed LMS, with caption hooks across the partner ecosystem.

contact-sales
Blackboard Learn Ultra
Anthology

Anthology's flagship higher-ed LMS, with Anthology Ally as the caption layer.

contact-sales
D2L Brightspace
D2L

Canadian-rooted higher-ed and K-12 LMS with broad caption-vendor integrations.

contact-sales
TalentLMS
Epignosis

SMB-focused corporate LMS with captioned video and SCORM/xAPI support.

subscription
EdApp by SafetyCulture
SafetyCulture

Microlearning platform with captioned mobile-first lessons.

subscription
Axonify
Axonify

Frontline corporate training platform with captioned daily microlearning.

contact-sales
Saba Cloud
Cornerstone OnDemand

Cornerstone Saba enterprise LMS with captioned video and SCORM compliance.

contact-sales
Totara Learn
Totara Learning

Moodle-derived enterprise LMS used in regulated industries, with caption support.

contact-sales
Valamis
Valamis

Learning experience platform with captioned video and integrated analytics.

contact-sales
Degreed
Degreed

Learning experience platform aggregating courses with captioned source video.

contact-sales
EdCast (Cornerstone Xplor)
Cornerstone OnDemand

Cornerstone's LXP with captioned video and AI skill mapping.

contact-sales
OpenLearning
OpenLearning

Australian MOOC + microcredential platform with captioned course video.

contact-sales
SWAYAM
Government of India (MoE)

India's national MOOC platform with captioned video across school and college courses.

free
NPTEL
IIT Madras / Government of India

IIT-led video lecture archive with subtitles across thousands of engineering courses.

free
Duolingo English Test
Duolingo

Voice-and-language proficiency test using on-device ASR for speaking sections.

subscription
Replika (Voice)
Luka

AI companion app with real-time voice conversations.

subscription
Lectio AI
Lectio AI

AI lecture-notes assistant that transcribes class audio and generates study guides.

subscription
Fathom for Education
Fathom

Fathom's free Zoom transcription tool used by educators and student organizations.

free
Minerva Forum
Minerva Project

Active-learning platform from Minerva University with transcript-driven analytics.

contact-sales
Explain Everything
Promethean

Interactive whiteboard for educators with voice-narrated lessons.

subscription
Edpuzzle
Edpuzzle

Video-lesson platform with caption support and embedded quizzes.

subscription
Nearpod
Renaissance Learning

Interactive K-12 lesson platform with captioned video and audio responses.

subscription
Swivl
Swivl

Classroom-recording hardware + Reflectivity coaching app with auto-transcription.

subscription
Web Captioner
Curt Grimes

Free in-browser real-time captioning powered by your browser's speech recognition.

free
Chrome Live Caption
Google

Built-in Chrome feature that captions any audio playing in the browser, on-device.

free
Voice In Voice Typing
Dictanote

Chrome extension that lets you dictate into any text field on any site.

free / Pro from $39/yr
Speech Recognition Anywhere
TalkTyper

Chrome extension with custom voice commands across any site.

free / Pro one-time
TalkTyper
TalkTyper.com

Bookmarkable web page that converts your speech to text in the browser.

free
Speechnotes (web)
TTSReader Ltd

Free voice notepad powered by Google's speech recognition, with a Chrome extension counterpart.

free / Premium
SpeechTexter (web)
SpeechTexter

Free in-browser dictation with custom voice commands and 70+ languages.

free
Transcribe by Wreally (web)
Wreally

Browser transcription editor combining auto-transcription with foot-pedal-style controls.

from $20/yr (sub)
oTranscribe (web)
Elliot Bentley

Free open-source browser transcription pad — no upload, runs locally.

free
VoiceNote II
Vito Galante

Chrome app for taking voice notes via the browser's speech recognizer.

free
Dictanote
Dictanote

Note-taking app with built-in dictation across web, iOS, and Android.

free / Pro from $39/yr
TalkPad
TalkPad

Free in-browser dictation pad with autosave and quick share.

free
Talkie Dictation
Talkie

Chrome extension that speaks back what it transcribed for proofing.

free
NaturalReader (Chrome extension)
NaturalSoft Ltd

TTS plus dictation companion for the NaturalReader reading platform.

free / Premium from $9.99/mo
Read&Write by Texthelp
Texthelp

Literacy support toolbar with TTS, dictation, and word prediction across browsers.

free trial / Premium (institution-licensed)
Co:Writer Universal
Don Johnston Inc.

Word-prediction and dictation extension for struggling writers.

license-based (school)
ClaroRead Chrome
Claro Software

Accessibility extension with TTS, dictation, screen masking, and color overlays.

free trial / Premium
Google Docs Voice Typing
Google

Built-in dictation inside Google Docs, accessed under Tools > Voice typing.

free (Google account)
Microsoft 365 Dictate (web)
Microsoft

Dictation built into Word, Outlook, and OneNote on the web.

included (Microsoft 365 sub)
Microsoft Word Transcribe
Microsoft

Upload audio and get speaker-attributed transcripts inside Word on the web.

included (Microsoft 365 sub, 5 hours/mo)
Apple Dictation (Safari)
Apple

System-wide dictation that works in any text field on macOS and iOS, including in Safari.

free (Apple device)
Lex.page (voice)
Every (Lex)

AI writing tool with voice prompts and dictation in the browser.

free / Pro
Grammarly (voice typing)
Grammarly

Grammarly's browser extension pipes dictation into any text field with grammar checks.

free / Premium
AssemblyAI Playground
AssemblyAI

In-browser playground for AssemblyAI's Universal speech model.

free (signed-in) / API billed
Deepgram Playground
Deepgram

Browser playground for testing Deepgram Nova STT and Aura TTS.

free / API billed
OpenAI Whisper Playground
OpenAI

OpenAI's docs playground for uploading audio files and testing transcription and translation against the Whisper API.

free (signed-in) / API billed
HuggingFace ASR Spaces
Hugging Face

Hundreds of community-hosted browser demos of speech recognition models.

free (community-hosted)
Modal Whisper Demo
Modal Labs

Hosted browser demo of Whisper running on Modal's serverless GPUs.

free demo / Modal compute billed
Replicate Whisper (web demo)
Replicate

In-browser playground for Whisper variants hosted on Replicate.

free demo / Replicate compute billed
RunPod Whisper template
RunPod

One-click Whisper endpoints from a browser dashboard on RunPod GPUs.

GPU billed by minute
Sieve ASR demo
Sieve

Browser playground for Sieve's chained-video ASR pipelines.

free credits / pay-per-run
Whisper Web (Xenova)
Xenova / Hugging Face

Whisper running fully in the browser via Transformers.js — no server.

free
Whisper WebGPU
Xenova / Hugging Face

GPU-accelerated Whisper inference in the browser via WebGPU.

free
Whisper Turbo (web)
Fleek HQ / community

Faster on-device Whisper in the browser using a quantized turbo build.

free
Live Transcribe (web)
Google

Browser version of Google's accessibility live-transcription app.

free
Caption.TV
Caption.tv

Browser overlay that captions any streaming video tab in real time.

free / Pro
Transcribe Online Free (screen audio)
TranscribeOnline.com

No-install browser site that captures tab audio and transcribes it.

free / Premium
SubtitleBee Live (web)
SubtitleBee

Browser-based live captioning paired with the SubtitleBee subtitle generator.

free trial / Premium
Stenogenius
Stenogenius

Browser stenography pad for live captioning by trained CART providers.

per provider license
Speakly Voice Typing
Speakly

Chrome extension that turns voice into typed text in any field.

free
Scribe AI
Scribe AI

Voice typing and clipboard helpers for productivity in Chrome.

free
Speakify
Speakify

Chrome dictation with offline-style behavior using the browser engine.

free
Speech Pulse
Speech Pulse

Free Chrome speech-to-text with sentence segmentation and punctuation.

free
Mighty Voice
Mighty Voice

Chrome dictation extension with custom voice macros.

free / Pro
Vocalmatic (web)
Vocalmatic

Browser audio-to-text service with a free trial transcription credit.

free trial / pay-as-you-go
Audext
Audext

Browser transcription editor with sync-highlight playback.

from $5/hr
Txtify
Txtify

Free, privacy-focused browser transcription service with a paid Pro tier.

free / Pro
Writeout AI
Writeout

Pay-per-minute Whisper API wrapper running in the browser.

per-minute usage
Live Caption for Meet (community)
various

Community Chrome extensions adding richer captions to Google Meet.

free
Twitch Caption Helpers (community)
various

Community Chrome extensions adding live captions to Twitch streams.

free
YouTube Caption Downloader (community)
various

Chrome extensions that export YouTube auto-captions to SRT/TXT.

free
Rev (Chrome extension)
Rev.com

Rev's Chrome extension for capturing audio from Zoom/Meet/Teams calls.

free trial / Rev sub
Scribie (Chrome)
Scribie

Browser companion for the Scribie human-transcription marketplace.

from $0.10/min (auto) / from $0.80/min (human)
Konch.ai
Konch

Browser transcription with built-in collaboration for journalists.

free trial / Pro
Scribee
Scribee

Quick browser transcription with the OpenAI Whisper backbone.

free / Pro
Good Tape (web)
Zetland (Good Tape)

Browser transcription built by the Danish news outlet Zetland for journalists.

free starter / Pro from $12/mo
Vidby
Vidby

Browser video-translation and transcription service with 100+ languages.

per-minute pricing
FreeSubtitles AI
FreeSubtitles

Free browser subtitle generator using Whisper.

free
Transkriptor (web)
Transkriptor

Browser transcription with Turkish-language strength and team workspace.

free trial / Pro from $9.99/mo
Amberscript (web)
Amberscript

Dutch browser transcription and subtitling platform with human-review option.

free trial / Pro / human-grade tier
Verbit (web)
Verbit

Browser transcription with human-review pipeline aimed at legal and education.

enterprise / per-minute
Interscriber
Interscriber

Browser transcription tool with structured interview templates.

free trial / Pro
Scribepad
Scribepad

Free browser dictation pad with Markdown shortcuts.

free
Speechmatics Realtime Demo
Speechmatics

Browser demo of Speechmatics' real-time engine with live mic capture.

free demo / API billed
ElevenLabs Scribe Demo
ElevenLabs

In-browser demo of ElevenLabs' Scribe transcription model.

free signed-in / API billed
Gladia Playground
Gladia

Browser playground for Gladia's multilingual transcription API.

free credits / API billed
Krisp (Chrome extension)
Krisp

Browser companion for Krisp's noise-cancelling and transcription stack.

free / Pro
noScribe (web companion)
Kai Dröge

GUI for fully offline transcription with speaker diarization.

free
Web Speech API demo (caption this page)
MDN / Mozilla

MDN's reference demo of speech recognition through the browser's Web Speech API.

free
Soniox Realtime Demo
Soniox

Browser demo of Soniox's low-latency streaming speech engine.

free demo / API billed
Picovoice Web Demo
Picovoice

Browser playground for Picovoice's on-device wake-word and ASR engines.

free / commercial license
BandLab AI Tools
BandLab Technologies

Cloud DAW with one-click Vocal Cleanup, Splitter, Mastering, and Vocal Tuner.

free · paid tier (BandLab Membership)
Soundtrap (Spotify) — transcription
Spotify

Spotify's web DAW with podcast transcription baked into the storyboard editor.

Storyteller subscription $14.99/mo
Audius (creator tools)
Audius

Decentralized audio platform — listed for creator-facing AI tagging/transcription experiments.

free
EditShare FLOW with AI metadata
EditShare

Media asset platform with AI face / speech / scene tagging across ingested footage.

enterprise · contact EditShare for quote
Voloco
Resonant Cavity

AI vocal-processor on iOS / Android + desktop — pitch correction + de-noise for creators.

free · Voloco Pro subscription $7.99/mo
Respeecher (voice-replace for ADR)
Respeecher

Speech-to-speech voice replacement used in film/TV ADR — cloud-side with DAW integrations.

subscription · contact for studio pricing
Promo Submit Master (PSM AI mastering)
Promo Submit Master

AI mastering for podcasts + broadcast — speech-aware loudness profiles.

freemium · per-master pricing
LANDR Mastering
LANDR

Pioneering AI mastering service — speech and music profiles for podcast + music releases.

free trial · subscription $4+/mo (Studio) up to $24/mo (Pro)
ElevenLabs Voice Clone
ElevenLabs

Studio-grade Instant + Professional Voice Cloning with multilingual output (33+ languages).

freemium — Starter $5/mo, Creator $22/mo, Pro $99/mo, Scale $330/mo (Professional Voice Clone requires paid)
PlayHT Voice Clone
PlayHT

Ultra-realistic instant + high-fidelity voice cloning with 142+ languages and 800+ stock voices.

freemium — Creator $39/mo, Unlimited $99/mo (Instant Clone on free, High-Fidelity Clone on paid)
Speechify Voice Clone
Speechify

One-take personal voice clone built into Speechify Studio.

freemium — Premium $11.58/mo (Voice Clone gated to Premium+)
WellSaid Labs
WellSaid Labs

Enterprise-licensed AI voices for corporate L&D, training, and marketing narration.

Maker $44/mo, Creator $89/mo, Team $179/seat/mo, Enterprise custom
Speechki
Speechki

Audiobook-grade TTS with audio engineers in the loop.

per-project — $0.045/minute (TTS) + audiobook tiers
BeyondWords
BeyondWords

TTS + podcast pipeline for publishers — articles to audio at scale.

freemium — Standard $19/mo, Premium $49/mo, Enterprise custom
VoiceChanger.io
VoiceChanger.io

Free browser voice changer with effect-based transformations.

free
Replica Studios
Replica Studios

Licensed AI voices for game developers, created under a SAG-AFTRA agreement.

Creator $24/mo, Pro $60/mo, Studio $180/mo, Enterprise custom
Acapela Group
Acapela Group

Long-running enterprise TTS — accessibility, AAC, signage, IVR.

Enterprise — contact sales
Cepstral
Cepstral

Long-running embedded TTS — IVR and telephony.

from $29.99 per voice (desktop) / Enterprise contact sales
ReadSpeaker
ReadSpeaker

Enterprise TTS for accessibility, e-learning, and document narration.

Enterprise — contact sales
Nuance Vocalizer
Microsoft (Nuance)

Nuance (Microsoft) enterprise TTS — IVR and contact centers.

Enterprise — contact sales
NeoSpeech
NeoSpeech

Embedded TTS engine for IVR and consumer products.

Enterprise — contact sales
Cliptalk
Cliptalk

AI voice generator for short-form social videos.

freemium
Camb.ai
Camb.ai

MARS-7 multilingual TTS + dubbing platform with low-resource language support.

freemium — Pro $19/mo, Studio $79/mo, Enterprise custom
Altered Studio
Altered AI

AI voice generation for game audio and dubbing with speech-to-speech.

Creator $4.99/mo, Pro $35.99/mo, Studio custom
Audible Maven Beta
Amazon (Audible / ACX)

Amazon's AI-narrated audiobook tier with self-publishing pipeline.

free to ACX publishers (Beta)
Apple Books Digital Narration
Apple

Apple's AI-narrated audiobook tier for indie publishers.

free to participating publishers
Findaway Voices by Spotify
Spotify (Findaway)

Audiobook self-publishing pipeline with a 2024 AI-narration tier.

free + per-sale royalty split
Murf for Audiobooks
Murf AI

Murf's audiobook-specific workflow with chapter markers and ACX-compatible export.

Creator $29/mo, Business $99/mo
Lovo for Audiobooks
Lovo AI

Lovo's audiobook workflow with multi-character scripts and emotion control.

Basic $24/mo, Pro $48/mo, Pro+ $149/mo
BeyondWords for Audiobooks
BeyondWords

BeyondWords' audiobook narration tier — TTS plus distribution.

Premium $49/mo, Enterprise custom
NaturalReader Accessibility
NaturalReader

NaturalReader's enterprise tier for ADA / Section 508 web accessibility overlays.

Enterprise — contact sales
ReadSpeaker docReader
ReadSpeaker

Document-narration plugin for accessible PDFs and Office files.

Enterprise — contact sales
ClaroRead
Claro Software

Assistive-tech TTS reader for dyslexic and visually-impaired users.

Standard £79+, Plus £150+, site licenses
Voiceflow Voices
Voiceflow

Voiceflow's TTS for prototyping voice assistants and chatbots.

freemium — Pro $50/mo, Teams $125/mo, Enterprise custom
Uberduck
Uberduck

AI voice and music generation with a focus on rap and music vocals.

freemium — Creator $9.99/mo, Pro $24.99/mo
DeepZen
DeepZen

Emotional audiobook narration with licensed voice IP.

Enterprise — contact sales
Respeecher
Respeecher

Speech-to-speech voice cloning for film, gaming, and broadcast.

Enterprise — contact sales
Veritone Voice
Veritone

Enterprise voice cloning with licensed-talent marketplace.

Enterprise — contact sales
Voicery
Apple (acquired Voicery)

Neural TTS API, acquired by Apple in 2020 and discontinued as a public product.

discontinued
Sonantic
Spotify (acquired Sonantic)

Emotional voice synthesis for games — acquired by Spotify in 2022.

discontinued as standalone
TTS Monster
TTS Monster

Twitch / streamer TTS donation tool with celebrity voice characters.

freemium — Pro $4.99/mo, Streamer $9.99/mo
Amazon Alexa
Amazon

Amazon's cloud voice assistant, powering Echo devices and Alexa-enabled hardware.

Free with compatible hardware
Alexa+
Amazon

Amazon's LLM-powered upgrade to Alexa with a more conversational ASR + reasoning stack.

$19.99/mo · free for Prime
Google Assistant
Google

Google's voice assistant for Android, Nest speakers, and Wear OS.

Free with Google account
Gemini Live
Google

Google's conversational Gemini voice mode — successor surface for Assistant on Android.

Free tier · Gemini Advanced $19.99/mo
Microsoft Copilot Voice
Microsoft

The voice mode of Microsoft Copilot — Cortana's successor in the Microsoft assistant lineage.

Free · Copilot Pro $20/mo
Samsung Bixby
Samsung

Samsung's voice assistant across Galaxy phones, TVs, and home appliances.

Free with Samsung hardware
Xiaomi XiaoAI
Xiaomi

Xiaomi's Mandarin-first voice assistant across Mi phones, speakers, and home IoT.

Free with Xiaomi hardware
Huawei Celia
Huawei

Huawei's voice assistant for HarmonyOS phones, tablets, and smart-home devices.

Free with Huawei hardware
OPPO Breeno
OPPO

OPPO's voice assistant for ColorOS phones in China.

Free with OPPO hardware
Vivo Jovi
Vivo

Vivo's voice assistant for FuntouchOS / OriginOS phones in China and India.

Free with Vivo hardware
Yandex Alice
Yandex

Yandex's Russian-language voice assistant for Stantsiya speakers, cars, and phones.

Free with Yandex account
Sber Salyut
Sberbank / SberDevices

Sberbank's Russian voice-assistant family (Salyut, Joy, Athena) for SberDevices hardware.

Free with SberDevices hardware
MTS Marvin
MTS

MTS's Russian-language voice assistant for Capsule speakers and the MTS Music app.

Bundled with MTS Capsule speaker (~₽7,000)
VK Marusya
VK

VK's Russian-language voice assistant for Capsule Mini and the VK Music app.

Free with VK account
AliGenie / Tmall Genie
Alibaba

Alibaba's Mandarin voice assistant powering the Tmall Genie smart-speaker line.

From ¥99 device + free service
JD Joy / Dingdong
JD.com

JD.com's Mandarin voice assistant on the Dingdong smart speaker line.

From ¥199 device + free service
Baidu DuerOS / Xiaodu
Baidu

Baidu's Mandarin voice OS powering Xiaodu smart speakers and displays.

From ¥249 device + free service
Meta Ray-Ban Smart Glasses
Meta · EssilorLuxottica

Ray-Ban Meta smart glasses with Meta AI voice — capture photos and ask questions hands-free.

$299 frames + free Meta AI service
Google Live Transcribe
Google

On-device live captioning of in-person conversations — Pixel and many Android phones.

Free with Android device
Even Realities G1
Even Realities

Prescription-friendly smart glasses with monocular captioning HUD and live translation.

$599 frames + free service
Brilliant Labs Frame
Brilliant Labs

Hackable open-source smart glasses with a mic and display that run custom captioning apps.

$349 frames + open SDK
XREAL Air / Air 2 / Beam
XREAL

Display-only AR glasses pairing with phones — captioning apps via the XREAL ecosystem.

From $379 frames
Rokid Max / Glass
Rokid

AR glasses with on-device translation and captioning via the Rokid Station companion.

From $449 frames
INMO Air 2
INMO Technology

Standalone Android-based smart glasses with on-board mic, captioning, and translation.

From $529 frames
TCL RayNeo X2 / Air
TCL

TCL's AR glasses with built-in live translation and captioning HUD.

From $599 frames
ARxVision
ARx Vision

AI-powered glasses for low-vision users — voice description + captioning of environment.

$3,800 device + subscription tier
Olive Smart Ear / Olive Pro
Olive Union

Affordable hearing assistance earbuds with app-tuned amplification.

From $299 device
Eargo
Eargo

Invisible OTC hearing aids tuned by app, with telecare support.

From $1,650 pair
Phonak Roger
Phonak (Sonova)

Wireless mic system for hearing aids — pairs with Roger receivers in noisy rooms.

From $799 per mic + audiologist fitting
Sennheiser Conversation Clear Plus
Sennheiser · Sonova

Speech-enhancement earbuds — focus on speech in noisy environments.

From $849 pair
Nuheara IQbuds Max
Nuheara

Hearing-assist earbuds with adjustable speech focus and ear-tuned DSP.

From $499 pair
Google Live Transcribe + Pixel Buds
Google

Pixel Buds capturing audio + Live Transcribe rendering captions in the Pixel Buds app.

Buds from $229 + free Live Transcribe app
Mercedes-Benz MBUX Voice Assistant
Mercedes-Benz

"Hey Mercedes" — factory-fitted voice assistant in Mercedes vehicles.

Included with vehicle
BMW Intelligent Personal Assistant
BMW

"Hey BMW" — voice assistant in BMW iDrive 7 / 8 / 9 vehicles.

Included with vehicle
Audi MMI Voice
Audi

"Hey Audi" — natural-language voice assistant in Audi MMI infotainment.

Included with vehicle
Volkswagen We Connect Voice / IDA
Volkswagen

"Hello IDA" — voice assistant across the VW ID. and Golf families.

Included with vehicle
Toyota "Hey Toyota" Voice Assistant
Toyota

Toyota Audio Multimedia voice assistant — "Hey Toyota" wake phrase in newer models.

Included with vehicle
Honda Personal Assistant
Honda

"Hey Honda" — voice assistant in Honda e:HEV and e:NS models.

Included with vehicle
Ford SYNC 4
Ford

Ford's in-vehicle infotainment with natural-language voice control.

Included with vehicle
GM Google Built-In Voice
General Motors

Google Assistant native in Chevy / Cadillac / GMC vehicles — no phone needed.

Included with vehicle · 8 yrs included data on EVs
Tesla Voice Commands
Tesla

Tesla's in-cabin voice control for navigation, climate, media, and vehicle settings.

Included with vehicle
Rivian Voice Control
Rivian

Rivian R1T / R1S voice commands — "Hey Rivian" for nav, media, and vehicle features.

Included with vehicle
Lucid Voice Assistant
Lucid Motors

Lucid Air voice control — native voice in Lucid's Glass Cockpit + Pilot Panel.

Included with vehicle
NIO Nomi
NIO

NIO's in-car AI assistant with a physical animated head on the dash.

Included with vehicle
XPENG Xmart OS
XPENG

XPENG's Mandarin voice assistant in Xmart OS — "Hi Xpeng".

Included with vehicle
Hyundai Bluelink
Hyundai

Hyundai's connected-car voice + telematics suite with hands-free vehicle control.

Included with vehicle · subscription after trial
Kia Connect
Kia

Kia's connected-car platform — voice control + remote app + over-the-air services.

Included with vehicle · subscription after trial
Stellantis Uconnect 5 with AI
Stellantis

Uconnect 5 voice assistant across Jeep, Chrysler, Dodge, Ram, Fiat, Peugeot, Citroën.

Included with vehicle · subscription after trial
Cerence Drive
Cerence

Automotive ASR / TTS / dialogue platform — powers most OEM in-car assistants worldwide.

Enterprise · OEM contract
SoundHound Houndify Automotive
SoundHound AI

Independent voice-AI platform — Cerence competitor used by Mercedes, Hyundai, and Stellantis.

Tiered — free dev / commercial license / OEM
Sonos Voice Control
Sonos

Privacy-first on-device voice control for Sonos speakers — "Hey Sonos".

Free with Sonos speakers
Amazon Echo Show
Amazon

Alexa smart display with video calling, recipe view, and on-device voice.

From $109 (Echo Show 5)
Amazon Echo Auto
Amazon

Plug-in Alexa accessory for cars without built-in Alexa.

$54.99 device + Alexa free
Amazon Echo Dot Kids
Amazon

Echo Dot edition with Amazon Kids+ parental controls and a kid-safe Alexa.

$59.99 device + Amazon Kids+ $5.99/mo
Google Nest Hub
Google

Google's smart display with Assistant / Gemini voice and on-device hot-word.

From $99 (Nest Hub) · $229 (Nest Hub Max)
Roku Voice Remote
Roku

Push-to-talk voice search on Roku TVs and streaming sticks.

Included with Roku hardware
Amazon Fire TV Voice Remote
Amazon

Alexa Voice Remote for Fire TV — push-to-talk and hands-free Fire TV variants.

Included with Fire TV hardware
Samsung TV Bixby
Samsung

Bixby voice control built into Samsung Smart TVs and soundbars.

Included with TV hardware
LG ThinQ Voice
LG Electronics

Voice control across LG TVs and ThinQ appliances — "Hi LG".

Included with LG hardware
Whirlpool Smart Appliances Voice
Whirlpool Corporation

Voice control for Whirlpool / Maytag connected appliances via Alexa + Google.

Included with smart appliance + hub
GE Appliances SmartHQ Voice
GE Appliances (Haier)

Voice control for GE connected appliances via SmartHQ + Alexa / Google.

Included with smart appliance
Frigidaire Connected Appliances Voice
Electrolux

Voice control for Frigidaire connected appliances via the Frigidaire app + Alexa / Google.

Included with smart appliance
Picovoice Porcupine
Picovoice

On-device wake-word engine that runs on microcontrollers, mobile devices, and in browsers.

Free tier · paid commercial · enterprise
Picovoice Cobra
Picovoice

On-device voice-activity detector that distinguishes speech from silence in real time.

Free tier · paid commercial
Snips (legacy)
Sonos (Snips acquisition)

On-device assistant platform acquired by Sonos in 2019; its technology now powers Sonos Voice Control.

Discontinued
Sensory TrulyHandsfree
Sensory

Sensory's on-device wake-word + small-vocab ASR — long-standing OEM voice IP.

Enterprise · OEM contract
Meta Quest Live Captions
Meta

System-level live captions in Meta Quest 2 / 3 / Pro for VR audio.

Free with Meta Quest headset
PICO VR Captions
PICO (ByteDance)

Live captions in PICO 4 / Neo 3 headsets — ByteDance's VR accessibility feature.

Free with PICO headset

Frequently asked

faster-whisper vs whisperX — which should I use?

faster-whisper is the speed-optimised Whisper runtime. whisperX is built on faster-whisper and adds speaker diarization (via pyannote) and forced-alignment word timestamps on top. Use faster-whisper if your audio has a single speaker and you only need the transcript. Use whisperX if the recording has multiple speakers and you need to know who said what.

What's the cheapest transcription API in 2026?

Per-minute pricing (as of 2026-04-20): Deepgram Nova-2, at $0.0043/min, is the cheapest streaming API, and the OpenAI Whisper API costs $0.006/min. Self-hosting faster-whisper on a rented GPU is cheaper at high volume but requires operational work. Prices change often, so check each provider's current pricing page.
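To see how per-minute rates turn into a monthly bill, here is a small Python sketch. The two API rates are the ones quoted in this answer (as of 2026-04-20); the GPU price ($1.50/hour) and throughput (50× real time) are hypothetical placeholders, not measurements, and the self-hosted figure counts GPU time only, ignoring engineering and ops effort.

```python
# API rates quoted in this FAQ (as of 2026-04-20); verify before relying on them.
API_RATES_USD_PER_MIN = {
    "Deepgram Nova-2": 0.0043,
    "OpenAI Whisper API": 0.006,
}


def api_cost(audio_hours: float, rate_per_min: float) -> float:
    """USD cost of transcribing `audio_hours` of audio at a per-minute rate."""
    return audio_hours * 60 * rate_per_min


def self_hosted_rate_per_min(gpu_usd_per_hour: float, speedup: float) -> float:
    """Effective GPU cost per audio minute.

    `speedup` is audio minutes processed per wall-clock minute, so a speedup
    of 50 means one GPU-hour transcribes 50 hours of audio. GPU time only:
    engineering and operations costs are not included.
    """
    return gpu_usd_per_hour / (60 * speedup)


if __name__ == "__main__":
    hours = 1000  # illustrative monthly audio volume
    for name, rate in API_RATES_USD_PER_MIN.items():
        print(f"{name}: ${api_cost(hours, rate):,.2f}")
    # Hypothetical GPU price and throughput: substitute your own measurements.
    gpu_rate = self_hosted_rate_per_min(gpu_usd_per_hour=1.50, speedup=50)
    print(f"Self-hosted (GPU time only): ${api_cost(hours, gpu_rate):,.2f}")
```

At 1,000 hours of audio a month, the two quoted API rates come to $258 and $360 respectively; the self-hosted line is only as meaningful as the GPU price and throughput you plug in.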

What's the best open-source Otter.ai alternative?

For file transcription, whisperX (or faster-whisper plus pyannote) gives you the same transcript-with-speaker-labels output that Otter produces. For the meeting-bot workflow itself (a bot that joins the call and records it), there's no one-click open-source replacement; you'd need to combine Whisper with a meeting-bot framework yourself.
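If you take the faster-whisper plus pyannote route, you merge the two outputs yourself: the transcriber gives timed text segments and the diarizer gives timed speaker turns. Below is a minimal standard-library sketch of one common approach, labelling each segment with the speaker whose turns overlap it the most. The `Segment` and `Turn` classes are hypothetical stand-ins, not either library's own types; whisperX's speaker-assignment step is based on a similar overlap idea, but this is not its code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """A timed transcript segment (hypothetical stand-in for transcriber output)."""
    start: float  # seconds
    end: float    # seconds
    text: str
    speaker: Optional[str] = None


@dataclass
class Turn:
    """A timed speaker turn (hypothetical stand-in for diarizer output)."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: list[Segment], turns: list[Turn]) -> list[Segment]:
    """Label each segment with the speaker whose turns overlap it the most.

    Segments that overlap no turn at all keep speaker=None.
    """
    for seg in segments:
        totals: dict[str, float] = {}
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
        seg.speaker = max(totals, key=totals.get) if totals else None
    return segments


if __name__ == "__main__":
    segments = [
        Segment(0.0, 4.2, "Thanks for joining."),
        Segment(4.5, 9.0, "Happy to be here."),
    ]
    turns = [Turn(0.0, 4.4, "SPEAKER_00"), Turn(4.4, 9.3, "SPEAKER_01")]
    for seg in assign_speakers(segments, turns):
        print(f"[{seg.speaker}] {seg.text}")
```

Segment-level assignment can mislabel a segment that straddles a speaker change; assigning speakers per word, using word timestamps, is finer-grained.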

Which is best on Apple Silicon (M-series Macs)?

whisper.cpp with the Metal backend is the fastest pure-CLI option. WhisperKit is the Swift-native choice for in-app integration. MacWhisper is the polished desktop app for non-technical users.

I need HIPAA compliance. Which options qualify?

Among commercial APIs, Deepgram, AssemblyAI, Rev.ai, and Speechmatics all offer a HIPAA path with a signed Business Associate Agreement (BAA) on appropriate tiers. For self-hosted engines, HIPAA is your responsibility. The license doesn't grant compliance; your deployment architecture does.

Whisper says it supports 99 languages. Is that real?

The model weights cover 99 languages, but quality varies widely. English, Spanish, German, French, Japanese, and Chinese are excellent. Low-resource languages (e.g. many African and Southeast-Asian languages) are significantly weaker, often with a word error rate (WER) too high for practical use. SeamlessM4T is worth checking for those.
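WER is the usual way to judge whether a language is "usable": the number of word substitutions, deletions, and insertions needed to turn the model's output into a reference transcript, divided by the number of reference words. Here is a minimal pure-Python sketch for spot-checking a language on your own audio; it applies no text normalisation, which libraries such as jiwer let you configure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Uses word-level Levenshtein distance. No normalisation (case, punctuation)
    is applied; real benchmarks usually normalise both strings first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Rolling DP row: prev[j] is the edit distance between the reference
    # words processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"  # one substitution, one deletion
    print(f"WER = {word_error_rate(ref, hyp):.3f}")
```

Because insertions count against the reference length, WER can exceed 1.0 (100%) when the output is much longer than the reference.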

Prefer a hosted service over running your own GPU? Whipscribe runs faster-whisper + whisperX behind a web UI, REST API, and MCP server for Claude Desktop.
