Transcription tools directory
Every audio-to-text tool we track (open-source engines, cloud APIs, meeting notetakers, podcast & video editors, dictation apps, dubbing AI, voice agents, datasets, and more), grouped by category. Curated by Whipscribe; updated 2026-05-15.
Updated 2026-05-15 · 1679 tools tracked
Open source
The reference open-source multilingual ASR model from OpenAI.
C/C++ port of Whisper — runs on anything, from a Raspberry Pi to Apple Silicon.
Up to 4× faster than reference Whisper using CTranslate2; the production sweet spot.
Faster-whisper + forced alignment + speaker diarization in one pipeline.
CLI that transcribes 150 minutes of audio in ~98 seconds on an A100.
Whisper with stabilised timestamps — more accurate word-level timing.
Swift Whisper for Apple Silicon — CoreML, ANE, Metal. Now part of the Argmax Open-Source SDK (v1.0.0, May 2026) alongside SpeakerKit + TTSKit.
Distilled Whisper: 6× faster, 49% smaller, within 1% WER of the teacher.
Meta's speech-to-text + speech-to-speech + text-to-speech model, 100 languages.
Lightweight offline speech recognition for 20+ languages, runs on a Raspberry Pi.
Cross-platform desktop app for Whisper — open-source MacWhisper alternative.
Open-source TTS model with strong prosody — slow on CPU.
Open-source TTS toolkit with multi-language voice models.
70x faster Whisper on TPUs via JAX + Flax + batching.
Whisper inference on Apple Silicon via Apple's MLX framework.
Idiomatic Rust bindings for whisper.cpp.
Whisper running on Windows via DirectCompute / GPGPU.
Single-EXE Whisper for Windows + Linux, no dependencies.
Command-line Whisper using CTranslate2 — closest match to openai/whisper CLI.
Python bindings for whisper.cpp with a simple iterator API.
Gradio web UI bundling faster-whisper + diarization + translation.
Real-time Whisper transcription over WebSockets.
Ultra-low-latency speech→LLM pipeline: WhisperLive + Mistral + TensorRT.
WhisperFusion's voice-chat reference app.
Academic real-time Whisper streaming with LocalAgreement-2.
Word-level timestamps for OpenAI Whisper without retraining.
Whisper + NeMo MSDD diarization pipeline.
OpenAI-compatible /v1/audio/transcriptions endpoint over faster-whisper.
Dockerized Whisper REST API with multiple backends.
Optimized batched Whisper engine with VAD + dynamic batching.
Mic-in-browser → real-time Whisper transcription demo.
Always-listening hot-mic Whisper transcriber.
Single-page web UI to generate subtitles via Whisper.
Whisper inverted into a TTS; also used as an ASR-aware training-data tool.
Easy-to-use speech toolkit: TTS, STT, alignment, language detection.
One-click Whisper + diarization + voice cloning Gradio app.
Whisper retrained for medical / clinical transcription accuracy.
Toolkit + model zoo behind Canary, Parakeet, Conformer, FastConformer.
Meta's multilingual speech-translation + transcription foundation suite.
Meta's seq-to-seq toolkit — home of wav2vec, HuBERT, XLS-R, MMS.
One API for Whisper, Wav2Vec2, HuBERT, XLS-R, SeamlessM4T, Parakeet.
ONNX + TensorRT + OpenVINO acceleration for Transformers ASR models.
Multi-GPU / mixed-precision launcher for any PyTorch ASR training script.
Streaming loader for Common Voice, LibriSpeech, GigaSpeech, FLEURS.
LoRA / adapters for parameter-efficient Whisper fine-tuning.
PyTorch toolkit for ASR, speaker, diarization, enhancement.
End-to-end speech toolkit: ASR, TTS, ST, speaker, separation.
The classic C++ HMM/DNN speech recognition toolkit.
FSA/FST framework written from scratch in PyTorch/CUDA.
ASR recipes (Conformer / Zipformer / Pruned Transducer) for k2 + sherpa.
Production server for k2/icefall + Whisper models (PyTorch).
ONNX-runtime ASR: Whisper, Zipformer, Paraformer on every platform.
ASR on NCNN — Android-friendly, CPU-only, no FP support needed.
Successor to Mozilla DeepSpeech, maintained by Coqui.
The original open RNN + CTC engine from Mozilla; archived, but historically significant.
Production-first E2E ASR — U2++ Conformer, streaming + offline.
End-to-end speech recognition toolkit by ATHENA-OPEN-SOURCE.
Lightweight CMU Sphinx engine for embedded keyword spotting.
The classic Java speech engine from CMU.
Baidu's all-in-one speech toolkit on PaddlePaddle.
The DeepSpeech-style recipes inside PaddleSpeech.
Lightweight Japanese-focused open ASR with WFST decoding.
Word-level alignment via Kaldi for 100+ languages.
NVIDIA's TF1 framework — historical home of Jasper + QuartzNet.
RWTH's flexible neural network training framework for ASR research.
Alibaba DAMO's Paraformer / SenseVoice / Whisper toolkit.
Rev's open WFST-decoded ASR + diarization stack.
Rev.com's WER + alignment scoring tool over WFSTs.
Baidu's TTS half — included for end-to-end voice pipelines.
The reference open diarization + speaker embedding toolkit.
Hervé Bredin's personal mirror of pyannote.audio.
Tiny, accurate voice-activity-detection model — runs on CPU.
Python bindings for Google's WebRTC VAD.
Streaming speaker diarization on top of pyannote.
A minimal pyannote / SpeechBrain diarization wrapper.
Speaker-verification embeddings from a small generalist encoder.
Tiny English ASR optimized for resource-constrained devices.
Mirror of Useful Sensors' Moonshine releases.
Run Whisper / wav2vec2 entirely in the browser via ONNX Runtime Web.
Original transformers.js repo by Joshua Lochner (pre-merge into HF).
Apple's array framework — runs Whisper, Phi, Llama on Apple Silicon.
Swift bindings for MLX — embed Whisper in iOS/macOS apps.
Reference Swift apps for MLX, including Whisper.
Python MLX examples — Whisper, Llama, Stable Diffusion.
Audio + image data loaders for MLX training.
Minimalist Rust ML framework with Whisper support.
The tensor library underneath llama.cpp + whisper.cpp.
The new ggml-org home of whisper.cpp.
GGUF runtime — runs many ASR forks (whisper, parakeet, qwen-audio).
Microsoft's cross-platform inference runtime for ONNX-exported Whisper.
Open exchange format used by every ASR optimizer.
Intel's CPU/iGPU/NPU inference toolkit — Whisper-tuned.
Reference notebooks including Whisper + SeamlessM4T export.
BF16/AMX speedups for Whisper PyTorch inference on Intel CPUs.
High-throughput inference engine — supports Whisper / Llava / Qwen-Audio.
Structured generation runtime — supports Qwen-Audio / Phi-Multimodal.
NVIDIA's optimized inference for Whisper, Canary, Parakeet on Triton.
Legacy NVIDIA inference engine — predecessor to TensorRT-LLM.
Production inference server — runs audio-multimodal LLMs.
TensorFlow 2 end-to-end ASR — Conformer, ContextNet, DeepSpeech2.
Streaming Conformer + DeepSpeech2 in PyTorch for Mandarin.
Companion audio-classification training repo for MASR.
Multi-backend Python speech-recognition library.
Production-style speaker embedding + verification toolkit.
Open speech-separation toolkit aligned with WeNet ASR.
Meta's C++ ML library, which absorbed wav2letter.
Standalone CTC / sequence decoders from Flashlight.
Meta's original fast convolutional ASR system.
Reference PyTorch implementation of the Conformer architecture.
Reference Speech-Transformer in PyTorch.
Home of WavLM, HuBERT++, Speech-T5, BEATs, VALL-E.
Unified speech-text Transformer (ASR + TTS + VC).
Post-processing for ASR: numbers, dates, units in 20+ languages.
Open clients for Riva — NVIDIA's commercial ASR/TTS server.
Reference SDK — covers the Whisper + Realtime audio endpoints.
Open conversational-AI stack with self-hosted ASR + NLP.
Production transcription microservice powering the LinTO stack.
Modular successor to fairseq used by Seamless models.
Eval harness — includes WER evaluations for ASR.
Free open course on audio ML, including Whisper fine-tuning.
Open ASR leaderboard (LibriSpeech, GigaSpeech, AISHELL).
Simultaneous speech-to-speech translation with streaming ASR.
Open TTS — relevant when pairing ASR with read-back TTS.
OSS 'second brain' that ingests transcripts via Whisper.
Unified audio foundation model (Codec + LM) — handles ASR.
Distributed Whisper / Conformer training at scale.
The deepspeedai-org home of DeepSpeed.
Microsoft's inference-side companion to DeepSpeed.
Model-optimization toolchain — Whisper ONNX/QNN/DirectML targets.
Underlying framework for whisper-jax and TPU ASR research.
JAX's new home under the JAX-ML org.
JAX neural-net library used by whisper-jax.
Subword tokenizer used by Whisper, SeamlessM4T, Canary.
The framework underlying TensorFlowASR + many older recipes.
Text ops + tokenizers integrated with TF ASR pipelines.
Google's research-grade TF framework — original Conformer code.
Tensor-parallel training — used for Speech-LLM scaling.
Mixed-precision / fused ops library used in NeMo training.
C++ / Python API for cuDNN — speeds up custom ASR kernels.
High-performance CUDA matrix kernels used by Whisper engines.
Open distributed framework — supports Whisper LoRA fine-tunes.
Compile + deploy LLMs (and Whisper) to phones / browsers / WebGPU.
Track / serve Whisper experiments and model registry.
Eval harness now covering audio-LLM benchmarks.
Fast neural TTS for Home Assistant — pairs with Whisper.
Mozilla's archived TTS — historical reference.
Reference Tacotron2 + WaveGlow stack from NVIDIA.
Flow-based vocoder companion to Tacotron2.
Multispeaker prosody TTS — historical NVIDIA release.
Multilingual TTS toolkit from Stuttgart IMS.
Alternate-case mirror of IMS Toucan.
Reference E2E TTS — building block for voice-agent loops.
Flow-based parallel TTS reference.
Transformer-based generative audio / TTS.
Conversational TTS — voice agent companion to Whisper.
Open zero-shot voice cloning + TTS.
VITS2 + BERT prosody TTS — companion to Whisper.
Style-conditioned TTS — pairs with Whisper for narration apps.
Original StyleTTS — predecessor of StyleTTS2.
Flow-matching TTS — open and fast.
Open zero-shot voice cloning TTS.
Few-shot voice cloning — companion to Whisper-cloned datasets.
Real-Time Voice Cloning interface — pairs with Whisper alignment.
Meta's audio-generation stack (MusicGen, AudioGen, EnCodec).
Neural audio codec — used by SeamlessM4T + many speech-LMs.
Mirror of facebookresearch/encodec.
Speech-without-text framework from Meta.
Masked-Autoencoder pretrain for audio — feeds downstream ASR.
High-quality neural audio codec — alternative to EnCodec.
Audio data tooling library that pairs with DAC.
Open robotics — includes spoken-command ASR demos.
Generative-audio diffusion — paired with Whisper for content pipelines.
Few-shot text classifier — useful for post-transcript tagging.
RAG over PDFs / transcripts — downstream ASR consumer pattern.
Training framework for large speech-LMs.
Open CLIP — companion vision encoder in multimodal ASR research.
Code-generation T5 — used in voice-coding agents on top of Whisper.
Conditional-LM — historical companion to speech-text research.
Safety layer often paired with Whisper voice agents.
Sequence models — companion to spoken-search recommender pipelines.
Alibaba's patched Megatron — used for Paraformer scale-up.
Compression library used by ASR data pipelines.
Catch-all for Google ASR papers (USM, BigSSL, Conformer).
Historical TF1 seq2seq — early Listen-Attend-Spell era.
Andrej Karpathy's bare-metal C training code — reference for compact ASR.
Face restoration — often paired with Whisper subtitle pipelines.
Stable-Diffusion animation — used with Whisper subs in content pipelines.
Voice-coding agent example over Whisper.
Spherical signal transforms — used in advanced ASR research.
Capitalized-name mirror of whisper-ctranslate2.
Pre-SYSTRAN home of faster-whisper.
The ggml-org-hosted mirror of llama.cpp.
Capitalized-name mirror of wenet.
Free browser-based manual transcription tool — keyboard-shortcut transcript editor.
Open dictation engines used by the Talon Voice community.
Open-source desktop transcription and dictation app built on Whisper.
Open-source Indic ASR models from IIT Madras' AI4Bharat lab — 22 scheduled Indian languages.
Mozilla Common Voice — public-domain multilingual speech corpus that powers many regional STT models.
Meta Massively Multilingual Speech — open-source ASR for 1,100+ languages.
Israeli national Hebrew ASR — research models from the Israeli AI consortium.
AI4D Africa — multilingual African speech datasets and ASR baselines.
Khipu community — open-source Andean Spanish, Quechua, and Aymara speech research.
VinAI Research — Vietnamese-language ASR and speech research from the Vingroup AI arm.
Khmer-language speech recognition research for the Cambodian market.
Typhoon — Thai-language LLM and ASR initiative from SCB 10X.
Mesolitica — Bahasa Malaysia and Bahasa Indonesia speech research checkpoints.
Tbilisi State University Georgian speech recognition research.
Armenian speech recognition checkpoints from the YerevaNN research lab.
Open-source Turkish-language ASR checkpoints from Turkish university labs.
Kencorpus / Maseno — Kenyan Swahili and English code-switch speech dataset and baselines.
IIIT-Hyderabad speech lab — academic Indian-language ASR datasets and checkpoints.
IIT Madras speech group — academic Indian-language ASR research and AI4Bharat home.
IIT Bombay speech group — Indian-language ASR research and Bhashini contributions.
Akylai project — Kyrgyz-language voice assistant and ASR research.
Institute of Smart Systems and AI (Nazarbayev University) — Kazakh-language ASR research.
Open Telugu-language speech corpora and models for SE-Indian transcription.
Community-published Tamil-language ASR models and corpora.
Bengali-language ASR datasets and models from the BNLP / Bengali NLP community.
L3Cube Pune — Marathi-language NLP and speech research releases.
Kungliga Biblioteket (National Library of Sweden) Whisper fine-tunes for Swedish.
Norwegian National Library Whisper fine-tunes for Bokmål and Nynorsk.
Chinese University of Hong Kong — Cantonese speech research and open checkpoints.
Open-source framework for voice and multimodal conversational AI agents.
Open-source framework for building realtime AI voice agents on LiveKit's WebRTC stack.
Open-source conversational AI framework with voice channel integration.
Open-core conversational AI platform with voice channels.
Open-source desktop client routing voice to LLM voice agents.
Open-source privacy-respecting voice assistant for home automation.
Open-source framework by Agora for building realtime multimodal voice AI agents.
Kyutai's open speech-to-speech foundation model and demo voice agent.
Open-source Python library for building real-time voice-LLM applications.
Open-weights multilingual voice cloning from 6 seconds of audio — 17 languages.
Performance fork of Tortoise — quality kept, latency 5-10x lower.
Self-hosted open-source MOOC platform with caption-track support.
1000h read English audiobook corpus — the canonical ASR benchmark since 2015.
60k hours of unlabeled English audiobook audio for self-supervised pretraining.
Crowd-sourced multilingual speech corpus — 30k+ hours across 130 languages.
452h of TED talk audio + transcripts — the canonical lecture-style ASR benchmark.
400k hours of European Parliament speeches in 23 EU languages.
44.5k hours of read multilingual audiobook speech across 8 European languages.
TED-based English→X speech translation corpus across 14 target languages.
Common Voice-based speech-translation corpus — 21 X→en + 15 en→X language pairs.
Few-shot multilingual evaluation across 102 languages — n-way parallel speech.
Multilingual SUPERB — 143 languages × multiple tasks for self-supervised speech models.
Speech processing Universal PERformance Benchmark — 10 English speech tasks.
10,000h English ASR corpus — audiobook + podcast + YouTube blend, multiple subsets.
30,000h multilingual evolution of GigaSpeech — Thai, Indonesian, Vietnamese launch.
30,000h CC-BY-licensed English ASR corpus — Internet-Archive sourced.
500kh of YouTube speech across 100+ languages with CC-licensed subtitles.
Refresh of YODAS with long-form audio + per-language sharding — 422k hours.
5000h of professionally-transcribed earnings-call audio — financial-domain ASR.
125h earnings-call ASR test set with 27-accent speaker coverage.
100h multi-microphone meeting recordings with diarization + speaker labels.
72h research-meeting recordings — diarization and meeting-ASR alternative to AMI.
Real-world dinner-party recordings — far-field ASR + diarization in noise.
Distant-mic ASR challenge — multi-channel meeting transcription frontier.
100k utterances of celebrity speech from YouTube — speaker recognition benchmark.
1M utterances of celebrity speech — scaled-up speaker recognition corpus.
50h audio-visual diarization corpus — wild YouTube speakers in conversation.
Hard diarization-in-the-wild challenge — 11 domains from courtrooms to maps.
260h of conversational US English telephone speech — historical ASR benchmark.
2000h of telephone conversations — scaled-up successor to Switchboard.
60h of unscripted home-telephone conversations — diarization + ASR benchmark.
80h of read newspaper sentences — foundational read-speech ASR corpus from 1992.
Phonetically-balanced 5h read-speech corpus from 1986 — phoneme recognition benchmark.
178h Mandarin read-speech corpus — open Chinese ASR baseline.
1000h Mandarin read-speech corpus — scaled-up successor.
120h Mandarin meeting corpus — multi-speaker conference-room scenarios.
1000h Korean spontaneous-speech corpus — the open KR ASR baseline.
Japanese ASR corpus — 35k hours of TV recordings with captions.
Japanese-speech-from-YouTube corpus — open ASR scaling beyond Reazon.
30h Japanese versatile multi-speaker corpus — TTS + speaker-modeling baseline.
44h multi-speaker English corpus — 109 speakers across global accents for TTS.
24h single-speaker English audiobook corpus — the canonical TTS baseline.
12h dyadic emotional speech corpus — the gold-standard SER benchmark.
Audio-visual emotional speech + song corpus — open SER benchmark.
Multimodal emotion corpus from Friends TV show — conversational emotion recognition.
7442 audio-visual emotional speech clips from 91 actors — open SER corpus.
109h corpus of music + speech + noise — augmentation backbone for ASR/SV.
Room impulse responses + isotropic noises — reverberation augmentation set.
Open Speech and Language Resources — the index of 130+ free speech corpora.
6.6kh language-identification corpus — 107 languages from YouTube.
30h spoken-language-understanding corpus — intent classification benchmark.
1s keyword-spotting corpus — 35 single-word commands, ~100k utterances.
Long-form Wikipedia audiobook recordings in English / German / Dutch — ~1000h.
BBC broadcast-media ASR + diarization challenge — multi-year evaluation series.
Long-form podcast ASR + speaker-role corpus.
100k hours of English podcasts with metadata — TREC podcast evaluation corpus.
Multilingual conversational SLU dataset — 6 languages with disfluencies + code-switching.
5kh weakly-labeled multilingual TTS corpus from YouTube — 50 languages.
Toy 60-utterance Hebrew corpus — the Kaldi 'hello world' dataset.
16kh Indic-language ASR corpus across 22 Indian languages.
1684h read-speech ASR benchmark across 12 Indian languages.
Indic-language version of SUPERB — 12 languages × 6 speech tasks.
6457h Indic-language ASR corpus from All India Radio news broadcasts.
20kh Russian ASR corpus — the largest open Russian-language speech dataset.
Crowd-sourced multilingual read-speech corpus — the open-source pre-Common-Voice corpus.
15h Vietnamese read-speech ASR corpus — the open Vietnamese ASR baseline.
36h Thai emotional-speech corpus — the open Thai SER + ASR baseline.
HuggingFace ASR leaderboard — public WER + RTFx across 8 English test sets.
Aggregated ASR leaderboards across 100+ benchmarks + papers + code.
Korean government open-data hub for speech + NLP corpora — 30+ speech datasets.
NIST Speaker Recognition Evaluation — the canonical SV/SD benchmark series.
Open Speech Analytic Technologies — noise-robust ASR + KWS + SAD challenge.
Speech-translation corpus from European Parliament across 9 languages.
Low-resource multilingual ASR + KWS corpora — 25+ languages from telephony.
Johns Hopkins Center for Language and Speech Processing — Kaldi + LibriSpeech + Sherpa origins.
Brno University of Technology speech group — DIHARD + x-vector + WeSpeaker origins.
Centre for Speech Technology Research — VCTK + Merlin TTS + Festival origins.
Carnegie Mellon Language Technologies Institute — Sphinx + ESPnet + YODAS origins.
MIT Spoken Language Systems Group — TIMIT + Galaxy + Jupiter origins.
National Taiwan University Speech Lab — S3PRL + SUPERB origins.
Meta AI speech research — wav2vec 2.0 + HuBERT + MMS + Seamless origins.
Google Research Speech — USM + Chirp + AudioPaLM + FLEURS origins.
NVIDIA Speech Research — NeMo + Canary + Parakeet + Riva origins.
IIT Madras Indic AI lab — IndicVoices + Kathbath + IndicSUPERB + IndicWav2Vec.
Inria Nancy speech research team — diarization + speech enhancement leaders.
French national speech-tech lab — TC-STAR + Quaero + ELRA-LDC origins.
RWTH Aachen i6 group — RASR toolkit + IWSLT speech translation history.
International Computer Science Institute — ICSI Meeting Corpus + Aurora origins.
Mitsubishi Electric Research Labs Speech Group — CHiME + speech-enhancement leaders.
MLCommons Speech working group — People's Speech + MLPerf speech benchmarks.
International Workshop on Spoken Language Translation — annual ST evaluation.
Hub of 5000+ audio + speech datasets — the modern catalog after OpenSLR.
Open-source multilingual TTS with zero-shot voice cloning.
Open-source generative audio model from Suno — speech, music, and sound effects.
Open-source neural TTS with strong prosody and voice cloning.
MyShell's open-source voice cloning with tone-color extraction.
High-quality multi-lingual TTS from MyShell — fast and CPU-friendly.
End-to-end TTS with adversarial training — the open-source workhorse.
Non-autoregressive TTS reference implementation — fast and parallelizable.
ESPnet's TTS recipes — multi-architecture, multi-language.
Mycroft's neural TTS — designed for Raspberry Pi voice assistants.
Rhasspy's predecessor TTS — Tacotron-style models for offline assistants.
Fast, on-device neural TTS optimized for Raspberry Pi 4.
Classic Edinburgh / CMU concatenative TTS — academic reference.
Compact open-source TTS for 100+ languages — the embedded workhorse.
Java-based open-source TTS platform — research and academic deployments.
Diphone-based TTS engine — paired with eSpeak NG for more natural output.
Google's seminal end-to-end TTS architecture — the neural-TTS starting point.
Diffusion-probabilistic TTS reference implementation.
NVIDIA's parallel TTS architecture with explicit pitch control.
Lightweight 82M-param open-source TTS — Apache-2.0, runs on a Raspberry Pi.
Resemble AI's open-source emotion-aware TTS — community-licensed.
DeepMind's seminal 2016 neural-vocoder paper — historical reference only.
GAN-based neural vocoder reference — fast and high-quality.
Camb.ai's open-source MARS5 multilingual TTS reference.
Open-source toolkit for audio, music, and speech generation.
Bilibili's open-source TTS — Chinese + English bilingual.
Open-source voice assistant — community-forked after the original company wound down.
Community continuation of Mycroft — modular open-source voice assistant for Linux + Pi.
Fully offline voice assistant for Home Assistant — runs on a Raspberry Pi with no cloud.
Home Assistant's first-party voice surface — Rhasspy's successor, integrated into HA core.
Open-source personal assistant — self-hostable, privacy-respecting, modular skills.
Open-source DIY captioning glasses powered by Whisper — community hardware project.
Open-source wake-word engine — community alternative to Porcupine and Snips.
Legacy customizable wake-word engine — community-maintained after KITT.AI shutdown.
Transcription APIs
Hosted Whisper from OpenAI (whisper-1, based on large-v2) at $0.006 per minute.
Universal-2 model + diarization, PII redaction, topic detection, summarization.
Nova-2 model, excellent streaming, strong at conversational audio.
The API spin-off of Rev — strong English accuracy, topic detection, custom vocab.
Whisper-based API with diarization, 99-language coverage, pay-per-minute.
Enterprise ASR with strong accents and on-prem deployment options.
Hosted faster-whisper + whisperX with paste-a-URL, batch, and MCP access.
AWS managed speech-to-text with batch + streaming, custom vocabulary, and medical/call-analytics variants.
HIPAA-eligible medical-specialty ASR from AWS for clinical conversations and dictation.
Microsoft Azure's managed STT with batch, real-time, custom speech, and conversation transcription.
GCP Speech v2 with Chirp 2 foundation model, batch + streaming, 125+ language variants.
Google's universal speech foundation model exposed via Speech-to-Text v2.
IBM Cloud's managed ASR with on-prem option, custom acoustic + language models.
OCI managed speech-to-text with batch + real-time and Whisper-based models.
Alibaba's managed Chinese-first ASR with batch + real-time and customizable hotwords.
Tencent's managed Chinese-first ASR with one-sentence, real-time, and recording-file modes.
Baidu AI Cloud's Chinese-first speech recognition family.
Yandex Cloud's managed Russian-first STT + TTS with batch and streaming.
Sber's Russian-language speech recognition + synthesis platform.
Huawei Cloud's managed ASR + TTS with one-sentence, real-time, and long-audio modes.
iFlyTek's market-leading Mandarin ASR family for enterprise and education.
ByteDance's Volcengine speech-to-text platform powering Douyin/CapCut workflows.
Naver Cloud's Korean-first ASR with batch + real-time and speaker diarization.
Kakao Enterprise's Korean speech recognition + synthesis platform.
NTT Com's Japanese-first STT under the COTOHA AI platform.
Chinese embedded ASR specialist for IoT devices and on-device speech.
Real-time multilingual ASR API with low-latency streaming and code-switching support.
ElevenLabs' speech-to-text API as a counterpart to its TTS, multilingual, word-timestamped.
Video-AI workflow platform with Whisper-based transcription endpoints.
Replicate's catalog of community-hosted Whisper variants behind one API.
Modal's serverless GPU platform commonly used to host Whisper / faster-whisper as an API.
RunPod's GPU cloud commonly used to deploy Whisper / faster-whisper as a serverless endpoint.
fal.ai's hosted Whisper-family endpoints — low-latency, pay-per-second.
Groq's LPU-based Whisper-large-v3 endpoint — exceptionally low-latency transcription.
OpenAI's Realtime API streaming speech-in (whisper-1 / gpt-4o-transcribe family).
Romanian-headquartered transcription API with strong CEE language coverage.
Meta's free natural-language and speech understanding platform.
Vonage's CPaaS speech-to-text via the ASR connector (typically Deepgram-powered).
Plivo's CPaaS speech recognition for IVR + call-recording workflows.
Bandwidth's voice CPaaS with optional transcription on recordings and IVR.
Play.HT's transcription endpoint as a counterpart to its TTS family.
Hosted Whisper API at low per-hour pricing for developers.
Hosted Whisper API with file-based and URL ingestion.
Deep-learning ASR you can deploy in your own cloud or use as managed SaaS.
Real-time streaming variant of Amazon Transcribe over HTTP/2 + WebSocket.
Azure Speech's batch-fast mode for short-turnaround transcription with predictable latency.
Diarization layer for Google Cloud Speech-to-Text v2.
Deepgram's current-generation streaming + batch ASR model.
AssemblyAI's WebSocket streaming endpoint for live captions and agents.
Gladia's real-time streaming ASR API with multilingual code-switching.
OpenAI's hosted Whisper + gpt-4o-transcribe models, batch endpoint.
OpenAI's translate-to-English audio endpoint.
SambaNova's hosted Whisper-large-v3 endpoint on its RDU accelerator.
Together AI's hosted Whisper models among its open-model catalog.
DeepInfra's hosted Whisper endpoint with per-second GPU pricing.
OVHcloud's managed speech-to-text inside its sovereign EU cloud.
Scaleway's GPU inference platform commonly used for hosted Whisper.
Alibaba's Tongyi multimodal model exposed for transcription + audio understanding.
Baidu's ERNIE-aligned speech models inside ERNIE Bot Cloud.
Huawei's Pangu foundation models extended to speech for enterprise scenarios.
Tencent's Hunyuan multimodal model with audio understanding endpoints.
Naver's HyperCLOVA X foundation model with audio understanding.
Kakao's Kanana foundation-model family with audio understanding.
Rev.ai's WebSocket streaming endpoint for live transcripts.
Speechmatics batch ASR with broad language pack catalog.
Empathic voice interface with emotional-tone awareness.
Developer API access to Otter.ai's transcription engine.
Open-source-anchored conversational AI for enterprise.
Google's conversational-AI platform for voice and chat agents.
AWS conversational-AI platform for voice and text bots.
Microsoft's open-source SDK and platform for conversational bots.
Rev's enterprise transcription and recording API platform.
Trint's transcription and translation API for newsrooms and media teams.
China's largest speech AI vendor — Mandarin, dialects, and 60+ languages via developer APIs.
Tencent's cloud speech-to-text with one-sentence, recording-file, and real-time APIs.
Alibaba Cloud / DAMO Academy speech recognition with Paraformer non-autoregressive models.
ByteDance's Volcano Engine speech-to-text — short, long, and streaming Mandarin ASR.
Mobvoi (Chumen Wenwen) speech APIs — Mandarin recognition behind TicWatch and Volkswagen voice.
Youdao Cloud speech-to-text — Mandarin recognition behind Youdao Translator and dictionary pen.
Sogou (Tencent-owned) speech-to-text — input-method-grade Mandarin recognition.
Reverie's Indic speech recognition — 11 Indian languages from one of Reliance Jio's group companies.
Government of India's national language platform — public ASR APIs for 22 official languages.
Sarvam AI — full-stack Indian foundation models including Saaras / Saaransh speech APIs.
Tinkoff VoiceKit — Russian-language ASR + TTS used inside Tinkoff Bank's contact centre.
SoundHound Houndify — multilingual voice AI platform with embedded and cloud ASR.
Lelapa AI — South African startup building Vulavula speech and language tools for African languages.
Intella — Arabic speech-to-text API focused on MSA and major Arabic dialects.
Alvenir — Danish-language speech-to-text product from a Copenhagen startup.
AI-Loop — multilingual African-language speech and NLP infrastructure.
Empathic Voice Interface — voice AI that reads and responds to emotion in speech.
Google Cloud's enterprise conversational AI platform with voice and chat channels.
Microsoft's bot orchestration SDK with voice channels via Direct Line Speech.
IBM's enterprise conversational AI platform with voice and contact-center integrations.
Google Cloud's LLM-native conversational AI builder with voice support.
Twilio's ASR, voice intelligence, and ConversationRelay primitives for voice agents.
Voice AI agent capability layered on Plivo's CPaaS voice network.
AI voice tooling layered on Bandwidth's tier-1 U.S. carrier network.
AI inference and voice agents on Telnyx's own carrier and GPU stack.
CPaaS with serverless VoxEngine scenarios and AI voice integrations.
WebRTC infrastructure for realtime voice and video AI agents.
Single API for low-latency voice agents bundling Deepgram ASR + LLM + TTS.
AssemblyAI's LLM framework over its ASR for voice intelligence and agents.
ElevenLabs' end-to-end voice agent API with ASR, LLM, and premium TTS.
Microsoft Azure's bundle of Speech SDK + Bot Framework for voice agents.
Speech-native LLM and hosted agent runtime by Fixie.ai.
OpenAI's Agents SDK pattern over the Realtime API for voice-native assistants.
Reference patterns for building voice agents with Anthropic Claude models.
Cartesia's Sonic TTS plus partner ASR/LLM for low-latency voice agents.
AI-dubbing API for video platforms — backend OEM rather than a creator-facing app.
Camb.ai's standalone text-to-speech surface — same MARS model that powers their dubbing.
TTS + STT API with consumer text-reader apps.
Desktop apps
Polished Mac app for Whisper — the default pick if you're on macOS.
Always-on system-wide dictation for macOS and iOS, powered by local Whisper.
Free Mac App Store Whisper app — drag, drop, done.
Broadcast-style DAW built for journalists, podcasters, and audiobook producers.
Wondershare's consumer video editor with AI captioning and short-form features.
Adobe Premiere Pro's built-in speech-to-text and caption track.
Apple FCP's caption track with macOS dictation-based transcription assist.
Resolve Studio's built-in speech-to-text caption track.
Native macOS app that removes silences from videos before you import them into your editor.
Industry-standard audio repair suite — dialogue isolation, declip, dehum, denoise.
Adobe Audition with built-in Enhance Speech, multitrack, and spectral repair.
Free open-source DAW with community plugins and a budding AI line.
Fast, simple cross-platform audio editor for clean-up work.
Hindenburg's automatic leveling + Voice Profiler tuned for spoken word.
Adobe's Enhance Speech model exposed inside Premiere Pro.
Free open-source subtitle editor with broad format support.
Open-source advanced subtitle editor favored by anime fansubbers.
Veteran Windows subtitle editor with batch conversion.
Court reporter CAT (computer-aided transcription) software for stenographers.
Stenographic court-reporter CAT software — main competitor to Stenograph CATalyst.
Court reporter CAT software historically marketed as Case CATalyst by Stenograph.
Predecessor to Glean — audio note-taking software for students.
Long-standing transcription playback software with foot pedal support.
Academic transcription playback software popular in qualitative research.
Built-in transcription playback inside the MAXQDA qualitative analysis software.
Veteran transcription playback and subtitle authoring tool.
Qualitative analysis software for audio and video transcripts.
AI-augmented dictation app — types into any text field on macOS and Windows.
Classic Windows desktop dictation for general users.
Consumer-tier Dragon for personal Windows users.
Programmable voice control for power users — Linux, macOS, Windows.
Windows AI voice assistant and dictation product.
Long-running Windows voice command and dictation utility.
Windows dictation utility built around Whisper.
Windows accessibility utility for mouse and keyboard control by voice.
AI dictation app that types into any field on macOS and Windows.
macOS dictation app that uses Whisper locally.
Front-end utilities making Apple Voice Control easier to configure.
Mac authoring tool for broadcast captions and post-production workflows.
Windows authoring tool for closed captions and subtitles in broadcast and OTT.
Professional subtitle authoring suite used across European broadcast and OTT.
Classic Windows subtitle authoring tool with deep cinema and broadcast support.
Subtitle preparation and broadcast playout suite used across European TV.
UK teletext-era subtitle origination still used in legacy broadcast workflows.
Subtitle preparation tool from Screen Subtitling for broadcast and OTT delivery.
Original Cheetah Systems caption authoring tool — name now used by Telestream.
Subtitle preparation suite from Italy's Logosys, used in European broadcast.
Subtitle preparation and live re-speaking tool aimed at European broadcasters.
Niche subtitle authoring tool used by smaller European subtitle houses.
Industry-standard CAT software for stenographers and CART captioners.
Niche but durable CAT software for stenographers, with realtime captioning support.
Desktop AI video translation suite for Windows + macOS — batch dub multiple files offline.
Real-time voice changer + AI voice cloning — desktop, Windows-first, streamer audience.
Free Windows TTS reader — uses installed SAPI voices, scriptable, no telemetry.
Pocket-sized dedicated live-translation hardware — 84 languages, two-way voice.
Handheld translator with lifetime free internet — 70+ languages, M3 + V4 models.
Crowdfunded handheld translator — 105 languages, Kickstarter origin.
Pen-shaped scanning translator — point at printed text, get spoken translation.
Wearable / handheld live translation hardware from Lingmo International.
Enterprise CAT tool with subtitle + voice-over file-format support.
Enterprise CAT tool with subtitle file + AV reference support.
Authoring tool for SCORM e-learning with closed-caption support per slide.
Adobe's e-learning authoring tool with closed-caption support for SCORM courses.
PowerPoint-based e-learning authoring with closed-caption and narration tools.
TechSmith Camtasia desktop video editor with auto-caption generation.
Qualitative research software with auto-transcription for interviews and focus groups.
Qualitative analysis platform with AI auto-coding and transcription.
Qualitative + mixed-methods research with built-in transcription tooling.
Qualitative + mixed-methods coding tool from Provalis Research.
Visual qualitative coding for thematic analysis, with transcript import.
Avid Pro Tools' built-in dialogue transcription clip-effect for post and ADR sessions.
Apple Logic Pro's AI mixing assistant — leans on Stem Splitter + transcript-aware vocal balancing.
Ableton Live as a host for third-party AI vocal/speech plugins — Sonible, Acon, Synchro Arts, etc.
FL Studio as a VST host for AI speech-enhancement plugins (smart:EQ, Clarity Vx, DeNoise).
Steinberg Cubase Pro with VocalChain, AudioWarp, and built-in hooks for VocAlign / Revoice Pro.
Steinberg's post-production DAW with AI-assisted DialogueDetective and ADR workflow.
Cockos Reaper — cheap, scriptable, hosts every AI dialogue plugin via VST3/CLAP/JSFX.
Studio One with Stem Separation, Lyrics Display, and Vocal/Speech-aware tooling.
Standalone + ARA dialogue/vocal alignment with AI Process Assist.
Lighter VocAlign aimed at music producers for unison/double-track tightening.
VocAlign Pro — timing + pitch alignment, used for ADR and lip-sync dubbing.
Phase + timing alignment across multi-mic dialogue recordings.
Match production-dialogue tone to ADR recordings — EQ + reverb + ambience capture.
Source-separation module for stripping music + ambience away from dialogue.
AI-based dialogue extraction plugin for film, broadcast, and podcast post.
Spectral noise reduction with adaptive AI noise-print learning.
Neural-network dialogue noise reduction — single-knob, AAX/VST3/AU.
AI de-reverb companion to Clarity Vx — removes room reflections from spoken dialogue.
Broadcast-grade dialogue restoration suite — DNS, DeClick, DeHiss, DeBuzz.
CEDAR's Dialogue Noise Suppression hardware + plugin line — production-set staple.
NUGEN's upmix + loudness suite with speech-aware dialogue level checking.
Single-fingerprint noise reducer tuned for dialogue and vocals.
AI-driven adaptive EQ with a 'speech' profile for podcast and dialogue tracks.
Compressor with AI gain-profile detection — speech / vocals / drums presets.
AI-presetting compressor / limiter / reverb trio aimed at podcasters.
ML-based dialogue plugins for spill removal, enhancement, and noise-aware gating.
FCP / Premiere / DaVinci-native AI noise + echo removal for video editors.
Real-time de-reverb plugin built on Zynaptiq's MAP (Mixed-signal Audio Processing) tech.
Real-time linear-filter compensator — fixes muffled mics, comb filtering, telephone EQ.
Vocal pitch + formant shifter — used as a quick speech-disguise / character voicer.
Source-separation suite for music + speech-volume normalization for podcasts.
Remote-collaboration DAW client for film/TV post — built-in AI dialogue tooling.
Media Composer's phonetic search + transcript-aware logging for NLE editors.
MAGIX Vegas Pro with AI transcription, noise reduction, and smart-mask audio routing.
Lightworks NLE with AI-assisted transcription + caption workflow.
OBS hosts third-party AI live-caption plugins (e.g. obs-localvocal) for streamers.
Streamlabs (OBS fork) with AI Highlighter and stream-clip detection.
NDI's free toolset with built-in caption + speech routing for production switchers.
VJ software where AI speech-/audio-reactive plugins drive visuals from voice input.
Apple's free DAW — useful entry-point for podcasters before upgrading to Logic Pro.
Reason DAW as a VST3 / Rack Extension host for AI speech-cleanup plugins.
Bitwig Studio with VST3/CLAP-hosted AI dialogue plugins — Sonible, Acon, Waves.
Cross-platform open-source DAW that hosts AI VST3/LV2 dialogue plugins.
Console-style DAW (Harrison) with classic-mic preamp emulations + dialogue workflow.
MAGIX's pro DAW with object-oriented editing + speech-aware mastering chain.
Mastering DAW with podcast-oriented loudness + speech-aware analysis tools.
macOS / iOS / online audio editor used widely in podcast post — clean speech-edit UX.
macOS-native multi-track editor with spectral repair and AU plugin hosting.
RX Elements — the entry-tier of the RX family, often bundled with audio interfaces.
AI-assisted post-production editor that auto-syncs SFX + dialogue to picture.
Immersive object-based mixing platform — speech routing in 360 audio.
AI noise/reverb removal plugin from Supertone — broadcast-grade real-time dialogue cleanup.
Neural-audio research plugin platform — host AI speech / instrument models in any DAW.
Convolution / room-simulation suite used for dialogue placement in immersive mixes.
FabFilter's flagship EQ — Pro-Q 4 adds an AI Spectrum Match for dialogue / vocal matching.
ElevenLabs' speech-to-speech voice conversion — wrapped as a desktop / plugin workflow.
Real-time AI voice-changer routed into Discord, OBS, Zoom — system-level audio plugin.
Topaz Labs' speech-enhancement engine — currently in invite beta.
iZotope Ozone — AI Master Assistant with explicit speech / podcast targets.
iZotope's vocal channel strip with Vocal Assistant — speech and vocal modes.
Real-time AI voice changer with universal app compatibility.
iMyFone's real-time voice changer for Windows/macOS with 600+ effects.
Free Windows voice changer with system-wide audio interception.
Audio4fun's commercial voice changer suite for Windows.
Screaming Bee's veteran voice changer for gaming and online play.
Windows TTS reader with batch file conversion.
Apple's voice assistant across iPhone, iPad, Mac, Apple Watch, and HomePod.
The LLM-augmented Siri introduced with Apple Intelligence on iOS 18 / macOS Sequoia.
Microsoft's voice assistant — consumer surfaces deprecated; remnants live on in Teams.
Real-time captions in visionOS that overlay speech as floating text in your field of view.
System-wide on-device captioning of any audio on Android — calls, video, podcasts.
Use AirPods Pro as remote mic + real-time amplifier — hearing-assist mode.
FDA-authorized hearing-aid mode in AirPods Pro 2 for clinical-grade hearing assistance.
Siri on HomePod / HomePod mini — voice-first smart-home control.
Hold the Siri button on the Apple TV remote to search content by voice.
On-device dictation in visionOS for any text-input UI element.
Products
Meeting-bot transcription product for Zoom/Meet/Teams.
Human + AI transcription, highest accuracy tier on the market.
Audio/video editor that treats the transcript as the timeline — different product category.
Enterprise-focused transcription + collaborative editor for newsrooms.
Meeting-bot transcription + CRM integrations, competitor to Otter.
Post-call and real-time call analytics on AWS with sentiment, talk-time, and issue detection.
Conversation intelligence API with transcripts, action items, and live agent assist.
Enterprise call-analytics ASR engine, on-prem-friendly, acquired by Medallia.
Conversation analytics API acquired by LivePerson, originally a transcription-first vendor.
Compliance-focused voice + comms surveillance for regulated finance.
Real-time agent assist analyzing voice paralinguistics for emotion + behavior coaching.
Hume's voice-AI platform with prosody/expression analysis bundled with transcription.
On-device streaming speech-to-text optimized for embedded and edge.
On-device offline speech-to-text from Picovoice — file-based, no cloud.
On-prem ASR + dialog stack popular in defense, regulated finance, and accessibility.
Krisp's noise-cancellation + transcription + AI meeting assistant API for embedding.
Twilio's call-analytics product on top of Twilio Voice — transcripts + language operators.
API to drop a notetaker bot into Zoom/Meet/Teams meetings and return transcripts.
Universal meeting bot API for Zoom/Meet/Teams/Webex with transcription + raw audio.
RingCentral's call-recording and conversation-intelligence layer.
Dialpad's in-house conversation intelligence on top of its UCaaS + contact-center.
Zoom's bundled meeting AI: transcripts, summaries, smart compose.
Cisco Webex's meeting AI: transcripts, summaries, action items.
Teams Premium's AI meeting features: intelligent recap, live captions, translations.
Gemini-powered Take Notes for Me, captions, and translation inside Google Meet.
Fireflies' notetaker exposed as an API for meetings + custom audio uploads.
Sales-focused meeting recorder with transcripts, clips, and coaching.
Free AI meeting notetaker with transcripts, summaries, and CRM sync.
AI meeting notetaker focused on Zoom + Meet with multilingual transcripts.
AI notetaker for Google Meet, Zoom, and Teams with customizable templates.
Meeting lifecycle + conversation intelligence platform for revenue teams.
Meeting management + AI notetaker with agendas, action items, and 1:1 templates.
AI notetaker focused on premium meeting summaries and CRM workflows.
AI scrum master + meeting notetaker for engineering teams.
Otter.ai's team and enterprise tiers with OtterPilot for live meetings.
Rev's enterprise plans bundling ASR, transcripts and Verbit-class managed services.
Enterprise transcription + captioning + ASR for legal, media, education, government.
Hybrid AI + human transcription with file-based ordering.
Human transcription service with 119+ language pairs and multiple turnaround tiers.
Hybrid AI + human transcription with strong privacy and security positioning.
US-based human transcription with same-day turnaround for legal and government.
Dublin-based AI + human transcription, subtitles, and dubbing platform.
AI transcription + collaborative editor with multilingual support.
Rev's machine-only transcription product at a flat per-minute rate.
Dutch AI + human transcription and subtitling platform with EU data residency.
Rev.com's per-minute podcast transcription with English + Spanish coverage.
Descript's video + audio editor with ASR-driven transcript editing.
Trint's newsroom-focused transcript editor with multi-language workflows.
Contact-center AI for agent assist, automated QA, and conversational intelligence.
Long-running conversation-analytics suite for contact-center operations and compliance.
Verint's voice-of-customer + workforce-engagement analytics on calls.
NICE's AI layer for CXone with proprietary contact-center speech models.
Contact-center AI for auto-QA, agent assist and analytics on conversational signal.
Talkdesk's AI layer for agent assist, automated summaries, and QA inside its CCaaS.
Genesys Cloud's bundled AI tier with voice transcripts, summaries, and agent copilot.
Five9's AI capabilities for contact-center voice and digital channels.
Outbound contact-center platform with conversation analytics for sales calls.
Aircall's AI layer for transcription, summaries, and call insights.
Real-time agent assist and contact-center AI built on proprietary LLMs.
Real-time playbook + coaching engine for contact-center agents.
Conversation analytics platform focused on customer-effort and CX measurement.
Revenue-execution platform analyzing inbound calls for marketing attribution + conversion.
Call analytics + conversation AI for marketing attribution.
CallRail's transcription + conversation analytics layer for inbound calls.
Voice AI infrastructure with proprietary ASR + voice bots for enterprise contact centers.
Voice and video research platform using ASR + NLP for qualitative analysis.
Voice + video feedback widget with auto-transcription for product teams.
AI voice agent platform for inbound and outbound phone calls.
AI voice-agent platform for outbound + inbound calls at scale.
Voice-AI infrastructure: turnkey assistants composed of STT + LLM + TTS providers.
Real-time voice-agent platform with low-latency conversational AI.
No-code voice-agent builder for SMB phone automation.
Conversation-design platform for voice and chat agents.
AI phone agents and contact-center for outbound sales.
Real-time speech-to-text + intent platform; acquired by Roblox in 2023.
Qualitative-research transcription + NLP for media monitoring and research.
Multilingual transcription app with strong Asia-Pacific footprint.
Yandex Cloud's call-center analytics product layered on SpeechKit.
Swiss interview transcription tool for journalists and researchers.
Meeting recorder for recruitment and sales with structured-output templates.
Voice intelligence platform for transcripts, summaries and topic analysis.
Swiss voice-AI platform with strong Swiss-German + multilingual coverage.
Enterprise voice analytics + eDiscovery on call recordings.
Voice-AI IVR replacement for contact centers.
Enterprise virtual assistant for contact centers with human-in-the-loop fallback.
Cloud-based ASR + speech-analytics service for contact centers.
Voice-analytics platform spanning recording, transcription, scoring and BI.
AI stem separation + transcription for media, music, and podcast post-production.
Conversation-AI API focused on long-form transcription + summarization.
Azure Speech's multi-speaker meeting transcription with channel and speaker ID.
Speechmatics' voice agent API combining ASR, LLM, and TTS.
IBM watsonx Assistant's speech-in for chat/voice agents.
Twilio's media-streaming primitive piped to partner ASR vendors.
Deepgram's medical-tuned variant of Nova-3 for healthcare ASR.
Speechmatics' enterprise ASR with on-prem option for healthcare.
Clinical documentation: ambient transcription + structured medical notes.
Ambient clinical documentation, now Microsoft Dragon Copilot.
Voice-enabled AI assistant for clinical documentation.
Hybrid human + AI medical scribe with real-time documentation.
AI medical scribe converting clinical conversations into structured notes.
AI documentation platform for clinicians with multi-specialty coverage.
AI scribe popular outside the US for solo and small-practice clinicians.
Lightweight AI scribe targeting independent US clinicians.
Automated quality-management scoring + coaching from Talkdesk.
Five9's real-time agent-assist module for sentiment, next-best-action, and summary.
Genesys Cloud's built-in voice-bot builder for IVR replacement and self-service.
Vonage's no-code IVA builder for voice and chat agents.
Smaller regional transcription vendors aggregated under a directory-level placeholder.
Tomedes translation agency's human transcription division.
Regional human-transcription marketplaces for niche language pairs.
Podcast workflow tool with transcription, show notes, and clips.
Podcast content engine with transcription + summaries + repurposing.
AI-powered podcast and video producer for transcripts + clips + content.
AI video editor with auto-captions and dubbing for short-form creators.
AI captions and short-clip editor for short-form creators.
AI clipper that turns long videos into short-form clips with captions.
Online AI video editor with auto-captions and clip generation.
Kapwing's online editor with AI auto-subtitles for video creators.
Online video editor with AI subtitles, translation, and dubbing.
Rev's captioning SKU for video accessibility and broadcast.
AI captioning service for short-form video creators.
Web tool generating styled subtitles for video creators.
YouTube's built-in automatic captions for uploaded videos.
TikTok's in-app automatic captions for uploaded short-form videos.
Instagram's in-app auto-captions for Reels and Stories.
Meta's Page-level auto-caption tool for uploaded video.
LinkedIn's native auto-captions for uploaded videos.
Live-streaming production tool with built-in auto-captions.
Broadcast-grade live captioning for TV, government, and education.
Live captioning appliance for events and broadcast.
Remote simultaneous interpretation platform with AI captions and human interpreters.
Multilingual virtual meeting platform with human + AI interpretation.
AI live translation and captioning for meetings and events.
Live in-browser transcription overlay for Google Meet, Zoom, and Teams.
AI notepad that augments your own notes with meeting-context transcripts.
Meeting copilot with engagement analytics and AI-generated recaps.
Multilingual AI meeting assistant with searchable team knowledge base.
AI scrum master that runs standups, retros, and engineering rituals.
AI meeting assistant with library, analytics, and conversation insights.
AI meeting notes added on top of Krisp's noise-cancellation app.
Wearable AI pendant that captures and transcribes in-person conversations.
AI voice recorder card that pairs with ChatGPT for transcription and summarization.
macOS app that records and indexes everything seen and heard on a Mac.
Quick AI-generated meeting summaries for Zoom, Meet, and Teams.
AI notetaker focused on calendar-based summaries and team standups.
AI templates and meeting recaps inside the Magical productivity extension.
AI meeting copilot that measures speaking time and inclusion metrics.
Enterprise AI notetaker with on-premise deployment and strong compliance posture.
Sales-call analysis and notetaker with a focus on APAC teams.
Revenue intelligence platform that records, transcribes, and analyzes sales calls.
Conversation intelligence platform inside ZoomInfo's revenue OS.
Conversation intelligence inside the Clari revenue platform (formerly Wingman).
Call recording and intelligence inside the Salesloft revenue workflow platform.
Real-time sales assistant and conversation intelligence in Outreach.
Call recording and AI insights inside Apollo's go-to-market platform.
Real-time conversation coach for cold-call dialers.
Virtual ride-along sales coach for in-home and field sales reps.
Conversation analytics and coaching for sales and customer-success teams.
Conversation intelligence and coaching platform inside Mediafly.
Sales enablement platform with conversation intelligence and onboarding video.
Sales readiness platform with call AI and rep skill analytics.
Revenue data platform that scores sales rep activity and signal capture.
European conversation intelligence platform with deep CRM integration.
Real-time AI sales coaching during live calls.
Revenue intelligence with conversational AI and a Salesforce inbox sidebar.
Cloud phone system with real-time transcription and AI call summaries.
Native call recording and AI insights inside HubSpot Sales Hub.
Salesforce's native AI for sales-call summarization and analysis.
AI sales assistant and CRM summarization inside Pipedrive.
Zoho's AI assistant with voice transcription across CRM, mail, and meetings.
AI summarization and assistant inside the ClickUp project-management suite.
Notion's built-in AI summaries for transcribed meetings and notes.
Self-organizing AI notebook with meeting transcription and recall.
Encrypted personal notes app with built-in OpenAI-powered AI features.
Outliner with AI nodes and meeting-capture workflows.
AI assistant inside the Coda doc-and-database platform.
Browser-based AI automation that includes meeting-bot summary playbooks.
Native AI assistant across Zoom meetings, chat, and phone.
Channel and thread summarization plus AI search across Slack workspaces.
AI titles, summaries, and chapters for async video messages.
Microsoft Copilot inside Teams for live meeting summaries and recaps.
Google Gemini for live captions, summaries, and Take Notes for Me in Meet.
Slite's AI assistant for asynchronous docs and meeting notes.
AI recap and on-demand replay for Zoom Events and Webinars.
AI-driven webinar engagement and content intelligence.
Webinar and virtual-event platform with AI session highlights.
B2B event platform with Content Lab for AI clip generation.
Webinar and event platform with AI-generated recaps and clip library.
Browser-based live-streaming studio, formerly owned by Hopin.
Multi-destination live-streaming platform with AI recording features.
Webinar platform with built-in transcription and AI recap.
AI captions, chapters, and SEO for marketing-video hosting.
AI summaries and feedback inside the Lattice performance-management platform.
Performance management with AI-generated coaching insights.
AI calendar that protects time for habits, meetings, and focus blocks.
Real-time AI sales coach with conversation intelligence and CRM sync.
Free Chrome-extension AI notetaker for Google Meet.
Meeting notetaker with timeline scribbling and AI summaries.
AI meeting notes for Google Meet and Zoom with live captioning.
Combined meeting platform and notetaker with built-in video calling.
Conversation intelligence and clip-sharing for revenue teams.
Multilingual AI meeting notetaker with action-item automation.
Meeting notetaker with research-interview templates and CRM workflows.
Real-time AI meeting assistant with searchable highlights library.
Meeting management platform with agendas, notes, and AI summaries.
1:1 and team-meeting platform with AI-suggested talking points.
Meeting notes inside Hive's project-management platform.
AI meeting prep, agenda, and recap for sales and customer-success teams.
Real-time AI sales coach and CRM autopilot.
AI meeting recap layer plus email outbound assistant.
AI sales coach that scores reps on emotional and behavioral cues.
Sales engagement layer (formerly Groove) inside Clari's revenue platform.
Enterprise contact-center suite with conversation analytics.
AI and conversation analytics inside the Five9 contact-center platform.
Cloud contact center with built-in real-time transcription and AI scorecards.
Conversation intelligence inside the JustCall cloud-phone platform.
Conversation intelligence inside Zoom for revenue teams.
Clari's revenue-team conversation-listening posture across signals.
Sales engagement and conversation intelligence platform.
Groove sales engagement platform's generative-AI feature set.
Executive-meeting intelligence for IR and board-engagement teams.
Board-meeting AI summaries and governance intelligence.
Board portal with AI-assisted minutes and meeting recap.
No-code voice-AI platform for inbound and outbound phone calls.
Email-and-chat customer-service AI inside the Hiver shared inbox.
AI inside Front's customer-communication platform.
Generative-AI agent that resolves customer-support conversations.
Generative AI inside the Zendesk customer-service platform.
Salesforce's AI for customer-service and contact centers.
macOS AI agent that captures and summarizes any audio on the system.
iOS / Android voice-first note-taking app with AI summaries.
Voice-to-clean-text app for capturing rambling thoughts.
Online transcription service from a journalism nonprofit.
Multilingual file transcription with summarization and meeting-bot.
Automatic and human-corrected transcription and subtitling.
Automatic transcription, translation, and subtitling.
Court-reporting and legal deposition platform with AI transcripts.
Cloud-based clinical speech recognition for clinical documentation.
AI medical scribe for clinicians in 30+ countries.
Canadian AI medical scribe for primary care physicians.
Virtual and hybrid event platform with AI recap.
Virtual event platform with AI session summaries.
Virtual and hybrid event platform with engagement and AI features.
Enterprise video platform with event broadcasting and AI captions.
Sales engagement (cadence + dialer) inside ZoomInfo.
Lightning-fast revenue workspace for Salesforce updates and notes.
Interview intelligence for talent-acquisition teams.
AI interviewer-notes platform for recruiting teams.
One-way video interview platform with AI transcription.
Outbound recruiting platform with AI sourcing and engagement.
Conversational-AI recruiter for high-volume hiring.
Lecture-capture and transcription for higher education.
Lecture-capture, video CMS, and AI transcription for education.
Active-learning and lecture-capture platform for higher education.
AI public-speaking coach with private practice mode.
AI English-language tutor with voice conversations.
AI English-speaking practice app backed by OpenAI.
Studio-quality remote recording for podcasts and video.
Lossless remote-recording platform for podcasters.
Browser-based podcast and video recording with AI features.
AI app for shorts/reels creators with auto-captioning.
Rev.com's iOS / Android voice recorder with one-tap transcription.
iOS / Apple Watch voice recorder with on-device transcription.
Generative-AI features across HubSpot marketing and sales tools.
Zoho's AI meeting summaries inside Zoho Meeting.
Premium Teams tier with intelligent recap and AI features.
Original branding for Google's Workspace AI before the Gemini rebrand.
Krisp's combined audio-cleanup-plus-meeting-notes product.
AI meeting assistant with team-knowledge questions across past meetings.
FedRAMP-aligned tier of Krisp for US government use.
Privacy-first AI meeting notetaker.
Qualitative-research AI for transcribing and analyzing interviews.
Research repository with AI analysis of qualitative data.
Visual-canvas research-analysis tool with AI transcription.
European qualitative-research analysis platform with AI features.
Participant-recruitment platform with AI session features.
B2B participant-recruitment platform for user research.
User-testing platform with AI insights and transcription.
Live and async user-testing platform for product teams.
Remote-user-research platform with AI session analysis.
Parallel dialer for outbound sales with conversation analytics.
AI-powered parallel dialer and sales platform.
Cloud-phone and SMS platform with AI features for sales teams.
Video-conferencing product from Dialpad with AI recap.
Conversational AI for customer experience across voice and video.
Customer-engagement and contact-center AI platform.
Workforce management and contact-center AI suite.
Call-tracking and AI conversation-intelligence for SMB.
Universal AI employee for enterprise workflows.
Generative-AI customer-support agents.
Conversational-AI platform for consumer brands.
AI meeting summarizer for short, accurate recaps.
Audience-engagement and Q&A platform with AI summaries.
Audience-interaction platform with AI summary of meeting Q&A.
Hardware-based meeting-room solution with AI features.
Logitech meeting-room camera with AI features for hybrid meetings.
All-in-one video bar and meeting-room hardware.
Turn long-form audio into show notes, clips, tweets, and newsletters in one upload.
Browser DAW for podcasts with AI voice clones and one-click cleanup.
Automatically removes ums, mouth sounds, dead air, and stutters from podcast tracks.
Automatic audio leveling, loudness normalization, and noise reduction for podcast post.
Free web tool that makes any voice recording sound like it was tracked in a studio.
Real-time noise, voice, and echo cancellation on any call or recording.
Hands-off podcast maker — drag in raw tracks, get a leveled, intro-stitched episode.
Human-edited podcast post-production with optional AI assist.
Podcast host with built-in AI transcripts, magic mastering, and episode chapters.
Spotify's free podcast host with AI Voice Translation and auto-transcripts.
Turn podcast audio into shareable audiograms and waveform videos.
Audiogram generator with templated designs for podcast promo clips.
Podcast host with WordPress integration and private feed support.
Modern podcast host built for networks — unlimited shows on one account.
Growth-focused podcast host with marketing tools built in.
Podcast host with live streaming, monetization, and AI episode notes.
Podcast host and live audio platform owned by iHeart with a programmatic ad network.
Enterprise podcast host with monetization and global ad sales.
Spotify's enterprise podcast publishing and ad-insertion platform.
Podcast host owned by SiriusXM with strong embeddable players.
The original podcast host, still running shows that started in the 2000s.
Indie-favorite podcast host with technical features and fair pricing.
Free podcast host with cross-promotion and host-read ad network.
Browser video editor with auto-subtitles in 100+ languages.
Turn blog posts and long videos into branded short clips with auto-captions.
Bulk subtitle and caption generator for video creators.
Mobile-first captioning app for creators recording on a phone.
Long-form video → short-form clips with social-trend awareness.
AI shorts generator — paste a YouTube URL, get vertical clips with captions.
AI clipping tool with face-tracking auto-reframe.
Repurpose long webinars and podcasts into clips, blog posts, and quote graphics.
Add captions, headlines, and resize video for social — one upload.
Browser video editor with strong meme, subtitle, and team-collaboration tooling.
AI transcription + caption studio with translation in 90+ languages.
Transcript-based video editing for research, user interviews, and journalism.
AI video creation and captioning inside the Vimeo platform.
Slide-based video creation with AI scripts and captioning.
Convert blog posts to social videos with auto-captions and stock B-roll.
Prompt-to-video generator with stock library and AI voiceover.
AI text-to-animation and live-action video generator.
AI avatar video generator with talking-head cloning and 100+ language dubs.
Find and clip the most highlight-worthy moments from podcast and interview footage.
Automated pipeline to publish one video to every social platform.
AI-driven Twitter / X scheduling with video-clip-to-thread repurposing.
AI clip-generator targeting faceless TikTok and Reels.
Rev's human-grade captioning service with SRT/VTT/SCC delivery.
Enterprise captioning, transcription, and audio-description service.
Open subtitling platform run by the Participatory Culture Foundation.
Subtitle, translate, and edit captions for social video in 70+ languages.
Transcription, subtitling, voiceover, and dubbing in 125+ languages.
Real-time meeting and video translation across 50+ languages.
Enterprise captioning and video data services for media and education.
Live captioning delivery network for events and webinars.
Browser auto-subtitle and translation tool for social video.
Transcript-based video editor with multi-language subtitle output.
Multi-track leveling and crosstalk reduction for podcast editors.
Castmagic's short-form clip generator built on its podcast transcript output.
Castos's AI clip + show-note generator for hosted episodes.
Drop a video URL → get pre-drafted X threads in your voice.
Auto-drafted podcast-driven email newsletters.
Production management with AI transcription for broadcast workflows.
Human-grade transcription service with HIPAA + enterprise options.
Speech-to-text platform aimed at broadcast and media in EU markets.
Meeting bot with focused share-with-anyone summary links.
Frame-accurate subtitle editor inside the Happyscribe transcription product.
Cloud subtitle workflow for educators and small media teams.
AI subtitle and dubbing platform with team workflows.
Event video service with AI editing and captioning.
AI workflow for broadcast captioners and accessibility teams.
Rev's translated subtitle service across 16+ language pairs.
AI dubbing platform with lip-sync for video translation.
AI video translation and dubbing platform.
ElevenLabs' video dubbing layer on top of its voice model.
AI voiceover platform with multi-language output for podcasts and videos.
AI voice generator favored for podcast and audiobook workflows.
Enterprise AI voiceover platform with consented voice avatars.
Studio-grade AI voice generation with timeline editor.
Cheap AI voice generator with broad language coverage.
Studio-grade AI voice generation and cloning.
AI avatar video generator with talking-head templates.
Enterprise AI avatar video platform with translation.
AI avatar video platform popular in APAC enterprises.
AI avatar video tool with text-to-video and templates.
Multi-avatar dialogue video for corporate L&D.
Veed.io's standalone auto-subtitle workflow.
Kapwing's standalone subtitle workflow.
Affordable AI transcription with strong Turkish and European-language coverage.
Free browser-based dictation tool using browser speech APIs.
Podcast measurement and attribution platform — transcripts power the analytics.
Podcast advertising intelligence built on transcript analysis.
Podcast database with transcript-powered creator and listener intelligence.
Headliner's automated audiogram workflow for podcast snippets.
Veed's live caption layer for streaming and recording.
Loom's auto-caption layer for screen recordings.
AI clipping and short-form video repurposing for creators.
YouTube Creator Studio's built-in auto-caption track.
Vimeo's auto-caption layer for hosted videos.
Panopto's auto-captioning for higher-education and enterprise video.
Kaltura's enterprise auto-captioning for video platforms.
Rev's live captioning service for Zoom and Webex.
AI-Media's live broadcast captioning engine.
AI standup notetaker for engineering teams.
Krisp's AI meeting-notes layer paired with its noise cancellation.
Rev's meeting-bot notetaker built on Rev AI transcription.
Krisp's on-device live caption feature for calls.
Mobile voice notetaker with retroactive recording.
Otter's question-and-answer layer over recorded meetings.
Krisp's voice privacy filter for call agents.
Fireflies' clip-extraction layer for meeting highlights.
Buzzsprout's one-click mastering for hosted episodes.
Long-running podcast host with WordPress PowerPress plugin.
Embeddable podcast player and player-network host.
Affordable podcast host with monetization and AI features.
Transistor's private podcast feature for B2B and internal podcasts.
Premium podcast subscription platform with private RSS.
Spotify's prior podcast brand, now redirected to Spotify for Podcasters.
Spotify's browser DAW with collaborative podcast tools.
Audio learning and private podcast platform.
AI tools for emotive voice and character dialogue.
AI sound design and audio post platform.
AI music and sound bed generator for video creators.
AI music + voice generation aimed at creator workflows.
AI song generation with vocals and instruments from a prompt.
AI song generation with prompt-driven vocals and arrangement.
Royalty-free AI music for creators and video producers.
Lightricks' mobile video editor with AI features.
ByteDance's free cross-platform editor with auto-captions and AI tools.
Mobile AI editor for travel and creator video.
Microsoft's browser video editor with auto-captions and Speaker Coach.
AI video editor that cuts silences and auto-captions long-form video.
Teleprompter app for creators recording on a phone or laptop.
Cloud-based medical speech recognition for clinicians, owned by Microsoft.
Ambient AI clinical documentation — listens to the visit, writes the note.
Vendor-specific AI scribe — verify availability and BAA status before relying on it.
Ambient AI assistant for clinicians — Paris-founded, used in EU and US.
Canadian AI scribe and medical voice assistant for primary care.
AI medical scribe with virtual scribe overlay for orthopedics and specialty care.
Generative AI scribe targeted at emergency medicine and urgent care.
Free AI medical scribe inside the Doximity clinician network app.
AI clinical documentation for Canadian and Australian healthcare.
AI clinical assistant that handles documentation, coding, and tasks for clinicians.
Healthcare-vertical instance of the Lindy AI agent platform.
AI scribe and note-generation product for clinicians.
Early ambient AI scribe — acquired by Nuance in 2021.
AI scribing built into the ModMed EMA EHR for specialty practices.
AI medical scribe bundled with the eClinicalWorks EHR.
AI documentation features inside the athenahealth EHR.
AI documentation features built into the Epic EHR.
Healthcare instance of AVA's live-captioning platform.
AI-powered legal transcription and evidence-search platform.
Remote deposition and video testimony platform with built-in transcription.
Education tier of Otter.ai — classroom and lecture transcription.
Note-taking app for students that records the lecture and structures the notes.
Audio recording inside the Notability note-taking app on Apple devices.
Linked-notes-and-audio app for iOS, Android, Windows, and Mac.
AVA's classroom and group-discussion live-captioning product.
Free Microsoft live-translation app with classroom mode for teachers.
Free Android live-transcription app used in education accessibility settings.
Education and accessibility dictation product.
AI transcription add-on for the NVivo qualitative analysis software.
AI transcription inside the ATLAS.ti qualitative research software.
Education tier of the Trint AI transcription platform.
Education tier of Sonix's AI transcription platform.
Live captioning app for deaf and hard-of-hearing users — group conversations.
System-wide live captions on Android, Chrome, and Pixel devices.
Windows 11 system-wide live captioning for any audio on the device.
System-wide live captions on iOS, iPadOS, and macOS.
Live captioning and relay calling app for deaf and hard-of-hearing users.
Live captioning and translation product targeted at conferences and events.
Multi-microphone live captioning hardware + app for the deaf community.
Free real-time captioned telephone service for hard-of-hearing US users.
Multilingual live captioning and translation product (verify availability).
Smart-glasses live captioning product (verify availability).
AR-glasses live captioning experiences from XREAL (formerly Nreal).
Apps and tools for the deaf community from Sorenson Communications.
Broadcast captioning hardware, encoders, and AI captioning software.
Lecture-capture and live-captioning tool for higher education and workplaces.
Rev's AI-only $0.25/min transcription product.
Manual transcription playback tool for journalists and researchers.
Apple-platform voice memo app that uses OpenAI Whisper for transcription.
Generic name for several mobile audio-to-text apps.
Voice dictation inside the iA Writer minimalist writing app.
Mobile dictation app from Nuance for iOS and Android.
Free browser-based voice typing notepad.
Free browser dictation tool with simple UI.
Apple's accessibility voice control on macOS, iOS, and iPadOS.
Built-in OS dictation across iOS, iPadOS, and macOS.
Built-in Windows dictation and OS voice control.
Modern AI-driven dictation and voice OS control built into Windows 11.
Voice typing inside Google Docs and Google Slides.
Voice typing in the Gboard keyboard on Android and iOS.
Free Android voice-typing notepad.
Web and Android voice typing tool with translation.
Free browser dictation utility.
Speak-to-edit app that turns ramble into clean structured text.
Otter.ai's mobile voice-recording surface.
Mobile voice-to-text note app.
Chrome extension that adds voice typing to any text field on any website.
Web-based dictation notepad with cloud sync.
Mobile dictation utility (verify publisher).
Notta's dictation feature for voice typing on mobile.
Voice-memo capture in the Readwise reading and knowledge product.
Voice dictation inside Word, Outlook, PowerPoint, and OneNote.
Trint's newsroom-targeted product tier.
Reduct.Video's newsroom tier for journalism teams.
Web transcription service for journalists.
Mobile-first AI transcription app for journalists and creators.
Sonix's newsroom tier targeted at broadcast and digital news.
Mobile transcription product for journalists.
Descript's journalism use case (existing tool entry for reference).
Mobile recording-and-transcription apps used by journalists.
AI medical voice assistant — appointment booking and clinical voice tasks.
Medical transcription compliance and audit tool.
Medical voice macros and dictation utility (verify publisher).
Dictation built into the e-MDs / CompuGroup ambulatory EHR.
Dictation features inside the Greenway Health EHRs.
Voice notes inside the Tebra (Kareo + PatientPop) EHR.
AI scribe partnerships inside the DrChrono EHR marketplace.
Praxis EMR's concept-based note generation — adjacent to voice dictation.
Solventum (3M) ambient clinical intelligence built on the M*Modal engine.
Clinical front-end speech recognition product.
Higher-ed support platform with transcription features (verify).
Education video platform's transcription and captioning add-on.
Captioning and transcription built into the Panopto video platform for education.
AI captioning inside the YuJa video platform for higher education.
Captioning inside the Echo360 lecture-capture platform.
Built-in live transcription and captioning in Zoom meetings.
Built-in live transcription and meeting recap in Microsoft Teams.
Built-in live captions and Gemini-powered notes in Google Meet.
Live captions and post-meeting transcript in Cisco Webex.
Automatic transcripts inside the Riverside.fm podcast and video studio.
Built-in transcription in the Podcastle podcast creation platform.
Cloud dictation and transcription workflow for legal and corporate dictation.
Olympus's professional dictation workflow software for legal and medical.
Yitu Tech speech recognition — Mandarin, dialects, and far-field microphone arrays.
Advanced Media's AmiVoice — Japan's longest-running enterprise speech recognition family.
Fujitsu LiveTalk — Japanese real-time captioning for meetings and classrooms.
Selvas AI's Selvy speech recognition — Korean and English ASR for media, finance, and government.
Skit.ai (formerly Vernacular.ai) — voice AI for collections and Indian-language call automation.
Slang Labs — in-app multilingual voice assistant SDK with Indic-language ASR.
MTS AI VoiceTech — Russian-language ASR and voice biometrics from telecom operator MTS.
STC (St. Petersburg) — long-running Russian speech and biometrics vendor, formerly STC-innovations.
VoiceInteraction Audimus — Portuguese-strong multilingual broadcast transcription.
Vocapia Research VoxSigma — multilingual broadcast and call-centre transcription with LIMSI heritage.
Verbio Technologies — Spanish-strong multilingual ASR and voice biometrics from Barcelona.
Phonexia — Czech speech and voice-biometrics vendor focused on government and forensic use.
Tilde — Baltic-language NLP vendor with Latvian, Lithuanian, and Estonian speech recognition.
Lingsoft — Finnish language-services group offering Nordic-language ASR and dictation.
Speechmore — Italian-language transcription product for journalists and professionals.
Vocally — French-first transcription tool for journalists and researchers.
VocTroLabs — speech and subtitling research lab from Universitat Politècnica de Catalunya.
Cedat 85 — Italian speech-to-text and stenography vendor for parliaments and media.
Rumi — Arabic-first transcription product tuned for Egyptian, Levantine, and Gulf dialects.
Mawdoo3 — Jordan-based Arabic AI lab building the Salma voice assistant and conducting ASR research.
Kalam — Arabic speech-to-text product for journalists and researchers.
VoxLab — Brazilian Portuguese speech recognition for journalists, agencies, and businesses.
TranscribeMe Latin-America operations — Spanish-and-Portuguese human + automatic transcription.
Voiceitt — non-standard-speech recognition for people with speech impairments.
VoxQube — Urdu and South-Asian language transcription research and services.
Navana Tech — Bengali-language conversational AI and ASR from Bangladesh.
Yandex SpeechKit on-premise build — Russian-language ASR for isolated networks.
Tarteel — Qur'anic Arabic recitation recognition for memorisation and tajweed feedback.
Armada — Russian-language conversational and contact-centre AI with embedded ASR.
Speakable — European-Portuguese-tuned transcription for Iberian newsrooms.
Bertin IT MediaSpeech — French defence and intelligence multilingual transcription suite.
Syllable — patient-facing healthcare voicebots with Spanish, English, and Mandarin support.
Convai-style Spanish voice automation tuned for Mexican and Central-American customer service.
Phonexia's Latin-American operations — Spanish-language voice biometrics and STT.
Vivoka — French embedded voice-AI vendor with offline multilingual ASR.
Speak AI — Arabic-and-multilingual research-grade transcription with NLP insights.
Rakuten AIris — Japanese-language speech recognition from Rakuten Institute of Technology.
Onsei — Japanese-language web transcription for individual professionals and researchers.
Conversational AI claimed to hold human-like phone calls lasting 10 to 40 minutes.
Enterprise voice assistants for contact centers in hospitality, banking, and retail.
No-code platform for building AI workflows including voice agents and assistants.
Contact-center voice AI that autonomously resolves routine customer calls.
Plug-and-play conversational AI for healthcare and enterprise voice + chat.
Enterprise conversational AI platform with first-class voice and IVR support.
Enterprise contact-center voice AI from the Cognigy.AI conversational platform.
Conversational AI for travel and customer service automation.
Voice AI from Yellow.ai's dynamic automation platform for enterprise CX.
Conversational AI platform for healthcare, banking, and enterprise voice + chat.
Voice agent capability layered on the Druid AI conversational automation platform.
Generative voice AI agents from Gnani.ai for enterprise CX.
AI-powered voice and chat agents for enterprise contact centers.
Voice AI assistants for inbound and outbound business calls.
AI voice agents for auto dealerships and service-based businesses.
Voice AI agents for customer service and sales workflows.
AI agents including voice, chat, and content automation in one platform.
AI sales development representative making outbound voice calls.
AI voice agents for customer support and lead conversion.
Generative voice AI platform for outbound and inbound business calls.
Voice AI agents and contact-center automation.
Conversation intelligence and voice-AI insights for customer-experience teams.
Voice AI for inbound and outbound customer engagement.
Speech analytics, biometrics, and voice bots for European enterprises.
Hybrid AI plus human virtual receptionists for small businesses.
AI phone agents for service-based small businesses.
Hosted RTVI bots — Daily.co's managed runtime for Pipecat voice agents.
Sales-focused cloud phone with AI transcription, coaching, and agent assist.
AI features for CloudTalk's cloud-based call center software.
AI conversation intelligence across RingCentral's UCaaS and CCaaS platform.
Commercial Rasa offering with CALM dialog and enterprise voice connectors.
Five9's contact-center voice and chat virtual agent product.
NICE's AI-orchestrated CX platform with voice virtual agents and Enlighten AI.
Genesys Cloud's voicebot, agent assist, and AI experience orchestration.
AWS's Amazon Connect contact center with Q-powered agent assist and bots.
Managed conversational AI for enterprise voice and chat.
Conversational AI and automation across contact-center voice and chat.
Multimodal conversational AI platform for enterprise voice and chat.
Conversational AI platform with voice and chat for enterprises.
Generative AI voice agents for B2B sales and marketing.
AI answering service and virtual receptionist for small businesses.
AI customer-service agents across voice, chat, and self-service.
Ada's brand interaction platform extended to voice channels.
Generative AI for customer support with voice and chat channels.
AI conversation agents for B2B revenue teams.
Voice AI agents tuned for non-English markets.
Generative agent-assist layer for human contact-center agents.
Talkdesk's generative AI voice-and-chat virtual agent for contact centers.
Healthcare-focused AI agents for revenue-cycle and patient communications.
Voice AI agents that handle benefits-verification and prior-auth calls in healthcare.
Conversational AI platform purpose-built for banking and wealth management.
Enterprise voice and chat AI agents from Conversica.
Salesforce's autonomous AI agent for service, with voice and chat channels.
HubSpot Breeze customer agents extending into voice channels.
Automatic live captioning for broadcast, news and live events — flagship of AI-Media's LEXI family.
On-prem variant of LEXI for stations that cannot send audio to the cloud.
Web-based agent for live event captioning operators, paired with the LEXI engine.
Real-time caption translation overlay for the iCap broadcast caption network.
Cloud caption delivery network for live broadcast, the transport layer behind LEXI.
IP caption encoder for SMPTE 2110, NDI and SRT broadcast facilities.
Server-based automatic captioning appliance — the predecessor product line that became LEXI.
Self-contained automatic caption appliance for live linear TV — first-gen broadcast ASR.
Classic SDI caption encoder used across thousands of US TV master controls.
Glossary-aware automatic captioning trained on a station's own proper-noun list.
Automatic caption appliance for radio and TV — a different enCaption from EEG's.
Server-side caption automation inside the Vantage media-processing platform.
Caption-delivery platform used by independent live captioners to bill, deliver and embed captions.
Live captioning app for in-person meetings and small events, built for Deaf and hard-of-hearing users.
AI live captioning for SaaS meetings and webinars with a focus on enterprise accessibility.
Accessibility-compliance tier of Cielo24, aimed at ADA / Section 508 / WCAG 2.1 deliverables.
Cloud subtitle authoring and project-management suite used by media-localisation vendors.
Cloud media-logistics platform with built-in transcription and subtitle authoring.
Caption-review and proof functionality bolted onto the MediaSilo review-and-approval platform.
Cloud subtitle authoring suite from ZOO Digital, used across Hollywood OTT delivery.
Subtitle playout server for live and file-based broadcast distribution.
Newsroom prompter/script system that frequently feeds the captioning pipeline.
Broadcast character generator with caption overlay support — legacy of Inscriber Technology.
Live captioning add-on for the Switchboard Live multistreaming platform.
Automatic captions for the BoxCast live-streaming platform, popular with churches and schools.
Enterprise video platform with built-in AI captioning, popular for internal communications.
Caption support inside Kaltura's video platform for higher-ed and enterprise.
Automatic captioning service inside Brightcove's Video Cloud platform.
Caption pipeline inside JW Player's video platform with optional ASR.
Auto-caption feature inside the Cincopa video-hosting platform.
Legacy IBM-branded captioning offering on top of Watson Speech to Text.
Federal courts' AV system that increasingly bolts on AI captions for hearings.
Court audio/video recording platform increasingly paired with AI transcription.
Court recording platform widely deployed across US, UK and Australian court systems.
Sorenson's next-generation captioned-call and relay-services platform.
WCAG overlay platform whose video module wraps third-party caption ASR.
Accessibility platform that integrates captioning and transcript services.
Accessibility widget vendor with optional caption and transcript services.
LEXI variant tuned for enterprise events that prioritises high precision over speed.
Glossary, profile and workflow management portal for LEXI customers.
Built-in live captioning on the Daily video API platform.
Browser-based live caption + translation overlay for in-person presentations.
Built-in live captioning inside Microsoft Teams meetings and live events.
Built-in live captions inside Google Meet, used widely for accessibility in education.
Live caption and translation features inside Cisco Webex Meetings.
Built-in Zoom live captioning across meetings and webinars.
Auto-generated live captions inside YouTube Live streams.
Auto-captioning support for live broadcasts on Facebook.
Auto-captioning for LinkedIn Live and LinkedIn Events.
Live captioning add-on inside the StreamYard browser studio.
Live captioning feature inside the Restream multistream and studio product.
Auto and manual captioning inside Vimeo's Live and on-demand video products.
Caption support inside Wowza's streaming-server ecosystem.
Captioning passthrough on Haivision's broadcast and enterprise video products.
Live caption viewer used by interpreters and CART writers in European events.
Open-source live caption viewer used by Deaf communities and EU accessibility groups.
Translate a talking-head video into 175+ languages with lip-sync rebuild from the original speaker.
Speech-to-speech translation in 140+ languages with voice cloning — pitched at live sports + media.
Multilingual dubbing layered on top of Speak.AI's transcription + qualitative-analysis workflow.
Subtitle, dubbing, and voice-over in 70+ languages — pitched at marketing teams localizing video ads.
AI dubbing and multilingual voice-over from Synthesys — overlay on their avatar + TTS stack.
Human-in-the-loop AI dubbing pitched at premium media — Sky News, BBC Studios, Bloomberg.
Speechify's dubbing add-on — translate video into 20+ languages keeping the original voice.
Cloud video translation with lip sync, voice cloning, and subtitle export.
InVideo's video-translation surface on top of its template-based video editor.
Video translator and AI dubbing tool with a freemium tier.
Translate, dub, transcribe, and subtitle videos in 75+ languages — single web workflow.
Human + AI translation marketplace from Translated — voice-over and dubbing add-ons.
India-built AI dubbing platform — Indic-language strength.
Captioning + translation services for broadcast and enterprise — LEXI family extension.
Subtitle translation as a managed service from 3Play Media.
Speech feedback + multilingual coaching app — pitched at interview / pitch prep.
Conversational voice agents built on PlayHT's voice stack — real-time + multilingual.
AI voice cloning, multilingual TTS, and real-time voice conversion for media + games.
AI text-to-speech and podcast distribution — 142 languages, voice-cloning API.
AI voice generator + video editor pitched at marketing teams — 100+ languages, 500+ voices.
Multilingual AI voice and dubbing platform — smaller-shop alternative to ElevenLabs.
TTS reader for documents, ebooks, and PDFs — long-running with classroom + accessibility deployments.
Community-trained character voices for parody and fan-fiction — Tortoise + custom models.
Cross-device live conversation translation — share a room code, every device speaks its own language.
Two-language conversation mode in Google Translate — phone-on-the-table interpreter.
Phone-to-phone voice translator — Apple Watch + AirPods integrations.
Voice-first live translation app — Amazon-owned since 2018.
Side-by-side voice translation app with a kid-friendly UI.
Business-travel-focused live translator with cultural-cue overlays.
Enterprise translation-management platform with a multimedia + voice-over add-on.
Phrase TMS / Strings with a multimedia localization module.
Translation-management platform with an AV localization integration layer.
Crowdsourced localization platform with audio + video file support.
Lecture-capture pioneer (Sonic Foundry) used in higher-ed, healthcare, and corporate training.
Lecture-capture, live-streaming, and video CMS with auto-captioning across 200+ universities.
Accessibility + captioning add-on for YuJa's video platform.
Live-streaming platform for houses of worship, schools, and athletics with auto-captioning.
AI + human transcription service marketed to universities for accessibility-grade captions.
Enterprise video platform with transcription used by Fortune 500 L&D and government.
Brightcove's enterprise video cloud configured for university and L&D delivery.
Real-time live-stream infrastructure used under many education-video platforms.
Lecture-capture appliance vendor for universities and government, with caption pipeline.
Live-stream studio used by educators, conferences, and faith organizations.
Async video messaging platform used heavily in corporate training and customer education.
Instructure's video platform inside Canvas LMS with auto-captioning and inline comments.
D2L Brightspace LMS with ReadSpeaker text-to-speech and captioning integration.
Accessibility platform from Anthology that auto-captions and audio-describes course materials.
Open-source LMS with community plugins for ASR captioning and AI transcription.
Open-source higher-ed LMS with caption integrations via Kaltura and partner ASR.
K-12-focused LMS from PowerSchool with caption-friendly media tools.
European K-12 + higher-ed LMS with built-in media and caption support.
Renamed Blackboard Ally, the accessibility + caption layer across Anthology's LMS portfolio.
Caption tools from the Canvas LTI partner ecosystem (3Play, Verbit, Cielo24, AI-Media).
Coursera's in-platform machine + community captions across 50+ languages for MOOC video.
Udemy's auto-caption pipeline for instructor-uploaded course video.
edX's Open edX platform with caption tracks attached to every course video.
FutureLearn's MOOC platform with auto + reviewed captions for university partners.
Khan Academy's community + machine caption pipeline across 50+ languages.
Pluralsight's caption + transcript layer on technology training video.
Captions and synchronized transcripts on LinkedIn Learning's professional video library.
Caption tracks across Udacity Nanodegree video content.
Captions + downloadable transcripts on MasterClass celebrity-led courses.
Animated training video tool used by corporate L&D, with caption support on output.
AI avatar video platform widely used for corporate training, with multi-language captioning.
Vimeo Enterprise's caption + transcript layer for corporate video portals.
Knowledge management + training video platform with auto-transcription.
Corporate LMS used for sales-enablement, onboarding, and customer training.
Collaborative LMS focused on internal trainers, with caption support on uploaded video.
Enterprise LMS used by mid-to-large companies, with caption + AI content tools.
User-research repository with auto-transcription used by UX teams.
Transcript-based video editor positioned for UX and qualitative research teams.
Trint's enterprise transcription positioned for research and academic teams.
Rev's transcription + caption services positioned for academic and market research.
Khan Academy's AI tutor that listens to learners and provides voice-based feedback.
Duolingo's premium tier with AI Roleplay voice conversations and Explain My Answer.
AI English-tutor app backed by OpenAI's Startup Fund, focused on speaking practice.
AI English tutor with avatar-based speaking practice across phone and web.
Multi-language AI language tutor with voice conversations.
AI English tutor with phone-call style practice and accent feedback.
Enterprise English-language training platform used by global corporations.
Real-time translation + transcription used by enterprises and educators.
AI pronunciation coach for English learners, with on-device speech scoring.
Verbit's premium captioning + transcription program targeted at universities.
3Play Media's caption services positioned for university accessibility offices.
Live and post-production captioning service for higher-ed and conferences.
Rev.com's education program for universities and K-12 districts.
AI-Media's iCap and LEXI live-caption services positioned for higher-ed.
FutureLearn's corporate-training arm with captioned course content for enterprise buyers.
Enterprise learning platform with captions and synchronized transcripts.
Cornerstone's training-content subscription with captioned video and analytics.
Mid-market LMS with caption support across courses and integrated video tools.
Course-creator platform with interactive video and auto-caption support.
Course-creator platform with caption support on course video.
Course-creator platform with SRT-based captioning workflow.
All-in-one creator platform with caption-supported video hosting.
Enterprise customer-education LMS with captioned video and analytics.
Customer-education LMS used by B2B SaaS, with caption-ready video and SCORM support.
Mid-market corporate LMS with caption-ready video and SCORM authoring.
AI-driven LMS for K-12, higher-ed, and business, with captioned content.
The dominant North American higher-ed LMS, with caption hooks across the partner ecosystem.
Anthology's flagship higher-ed LMS, with Anthology Ally as the caption layer.
Canadian-rooted higher-ed and K-12 LMS with broad caption-vendor integrations.
SMB-focused corporate LMS with captioned video and SCORM/xAPI support.
Microlearning platform with captioned mobile-first lessons.
Frontline corporate training platform with captioned daily microlearning.
Cornerstone Saba enterprise LMS with captioned video and SCORM compliance.
Moodle-derived enterprise LMS used in regulated industries, with caption support.
Learning experience platform with captioned video and integrated analytics.
Learning experience platform aggregating courses with captioned source video.
Cornerstone's LXP with captioned video and AI skill mapping.
Australian MOOC + microcredential platform with captioned course video.
India's national MOOC platform with captioned video across school and college courses.
IIT-led video lecture archive with subtitles across thousands of engineering courses.
Voice-and-language proficiency test using on-device ASR for speaking sections.
AI companion app with real-time voice conversations.
AI lecture-notes assistant that transcribes class audio and generates study guides.
Fathom's free Zoom transcription tool used by educators and student organizations.
Active-learning platform from Minerva University with transcript-driven analytics.
Interactive whiteboard for educators with voice-narrated lessons.
Video-lesson platform with caption support and embedded quizzes.
Interactive K-12 lesson platform with captioned video and audio responses.
Classroom-recording hardware + Reflectivity coaching app with auto-transcription.
Free in-browser real-time captioning powered by your browser's speech recognition.
Built-in Chrome feature that captions any audio playing in the browser, on-device.
Chrome extension that lets you dictate into any text field on any site.
Chrome extension with custom voice commands across any site.
Bookmarkable web page that converts your speech to text in the browser.
Free voice notepad with Google Speech accuracy and a Chrome extension counterpart.
Free in-browser dictation with custom voice commands and 70+ languages.
Browser transcription editor combining auto-transcription with foot-pedal-style controls.
Free open-source browser transcription pad — no upload, runs locally.
Chrome app for taking voice notes via the browser's speech recognizer.
Note-taking app with built-in dictation across web, iOS, and Android.
Free in-browser dictation pad with autosave and quick share.
Chrome extension that speaks back what it transcribed for proofing.
TTS plus dictation companion for the NaturalReader reading platform.
Literacy support toolbar with TTS, dictation, and word prediction across browsers.
Word-prediction and dictation extension for struggling writers.
Accessibility extension with TTS, dictation, screen masking, and color overlays.
Built-in dictation inside Google Docs, accessed under Tools > Voice typing.
Dictation built into Word, Outlook, OneNote in the browser.
Upload audio and get speaker-attributed transcripts inside Word on the web.
System-wide dictation that works in every text field across macOS, iOS, and Safari.
AI writing tool with voice prompts and dictation in the browser.
Grammarly's browser extension pipes dictation into any text field with grammar checks.
In-browser playground for AssemblyAI's Universal speech model.
Browser playground for testing Deepgram Nova STT and Aura TTS.
OpenAI's docs playground with file-upload and translation against the Whisper API.
Hundreds of community-hosted browser demos of speech recognition models.
Hosted browser demo of Whisper running on Modal's serverless GPUs.
In-browser playground for Whisper variants hosted on Replicate.
One-click Whisper endpoints from a browser dashboard on RunPod GPUs.
Browser playground for Sieve's chained-video ASR pipelines.
Whisper running fully in the browser via Transformers.js — no server.
GPU-accelerated Whisper inference in the browser via WebGPU.
Faster on-device Whisper in the browser using a quantized turbo build.
Browser version of Google's accessibility live-transcription app.
Browser overlay that captions any streaming video tab in real time.
No-install browser site that captures tab audio and transcribes it.
Browser-based live captioning paired with the SubtitleBee subtitle generator.
Browser stenography pad for live captioning by trained CART providers.
Chrome extension that turns voice into typed text in any field.
Voice typing and clipboard helpers for productivity in Chrome.
Chrome dictation with offline-style behavior using the browser engine.
Free Chrome speech-to-text with sentence segmentation and punctuation.
Chrome dictation extension with custom voice macros.
Browser audio-to-text service with a free trial transcription credit.
Browser transcription editor with sync-highlight playback.
Free browser transcription service with privacy-friendly pricing.
Pay-per-minute Whisper API wrapper running in the browser.
Community Chrome extensions adding richer captions to Google Meet.
Community Chrome extensions adding live captions to Twitch streams.
Chrome extensions that export YouTube auto-captions to SRT/TXT.
Rev's Chrome extension for capturing audio from Zoom/Meet/Teams calls.
Browser companion for the Scribie human-transcription marketplace.
Browser transcription with built-in collaboration for journalists.
Quick browser transcription with the OpenAI Whisper backbone.
Browser transcription built by Wired's parent for journalists.
Browser video-translation and transcription service with 100+ languages.
Free browser subtitle generator using Whisper.
Browser transcription with Turkish-language strength and team workspace.
Dutch browser transcription and subtitling platform with human-review option.
Browser transcription with human-review pipeline aimed at legal and education.
Browser transcription tool with structured interview templates.
Free browser dictation pad with Markdown shortcuts.
Browser demo of Speechmatics' real-time engine with live mic capture.
In-browser demo of ElevenLabs' Scribe transcription model.
Browser playground for Gladia's multilingual transcription API.
Browser companion for Krisp's noise-cancelling and transcription stack.
GUI for offline transcription in the browser or on desktop, with diarization.
MDN's reference Web Speech API demo — bookmarkable for SEO research.
Browser demo of Soniox's low-latency streaming speech engine.
Browser playground for Picovoice's on-device wake-word and ASR engines.
Cloud DAW with one-click Vocal Cleanup, Splitter, Mastering, and Vocal Tuner.
Spotify's web DAW with podcast transcription baked into the storyboard editor.
Decentralized audio platform — listed for creator-facing AI tagging/transcription experiments.
Media asset platform with AI face / speech / scene tagging across ingested footage.
AI vocal-processor on iOS / Android + desktop — pitch correction + de-noise for creators.
Speech-to-speech voice replacement used in film/TV ADR — cloud-side with DAW integrations.
AI mastering for podcasts + broadcast — speech-aware loudness profiles.
Pioneering AI mastering service — speech and music profiles for podcast + music releases.
Studio-grade Instant + Professional Voice Cloning with multilingual output (33+ languages).
Ultra-realistic instant + high-fidelity voice cloning with 142+ languages and 800+ stock voices.
One-take personal voice clone built into Speechify Studio.
Enterprise-licensed AI voices for corporate L&D, training, and marketing narration.
Audiobook-grade TTS with audio engineers in the loop.
TTS + podcast pipeline for publishers — articles to audio at scale.
Free browser voice changer with effect-based transformations.
Licensed AI voices of game voice actors, covered by a SAG-AFTRA agreement.
Long-running enterprise TTS — accessibility, AAC, signage, IVR.
Long-running embedded TTS — IVR and telephony.
Enterprise TTS for accessibility, e-learning, and document narration.
Nuance (Microsoft) enterprise TTS — IVR and contact centers.
Embedded TTS engine for IVR and consumer products.
AI voice generator for short-form social videos.
MARS-7 multilingual TTS + dubbing platform with low-resource language support.
AI voice generation for game audio and dubbing with speech-to-speech.
Amazon's AI-narrated audiobook tier with self-publishing pipeline.
Apple's AI-narrated audiobook tier for indie publishers.
Audiobook self-publishing pipeline with a 2024 AI-narration tier.
Murf's audiobook-specific workflow with chapter markers and ACX-compatible export.
Lovo's audiobook workflow with multi-character scripts and emotion control.
BeyondWords' audiobook narration tier — TTS plus distribution.
NaturalReader's enterprise tier for ADA / Section 508 web accessibility overlays.
Document-narration plugin for accessible PDFs and Office files.
Assistive-tech TTS reader for dyslexic and visually-impaired users.
Voiceflow's TTS for prototyping voice assistants and chatbots.
AI voice and music generation with a focus on rap and music vocals.
Emotional audiobook narration with licensed voice IP.
Speech-to-speech voice cloning for film, gaming, and broadcast.
Enterprise voice cloning with licensed-talent marketplace.
Neural TTS API, acquired by Apple in 2020 and no longer offered as a public product.
Emotional voice synthesis for games — acquired by Spotify in 2022.
Twitch / streamer TTS donation tool with celebrity voice characters.
Amazon's cloud voice assistant, powering Echo devices and Alexa-enabled hardware.
Amazon's LLM-powered upgrade to Alexa with a more conversational ASR + reasoning stack.
Google's voice assistant for Android, Nest speakers, and Wear OS.
Google's conversational Gemini voice mode — successor surface for Assistant on Android.
The voice mode of Microsoft Copilot — Cortana's successor in the Microsoft assistant lineage.
Samsung's voice assistant across Galaxy phones, TVs, and home appliances.
Xiaomi's Mandarin-first voice assistant across Mi phones, speakers, and home IoT.
Huawei's voice assistant for HarmonyOS phones, tablets, and smart-home devices.
OPPO's voice assistant for ColorOS phones in China.
Vivo's voice assistant for FuntouchOS / OriginOS phones in China and India.
Yandex's Russian-language voice assistant for Stantsiya speakers, cars, and phones.
Sberbank's Russian voice-assistant family Salyut (Sber, Joy, Athena) for SberDevices hardware.
MTS's Russian-language voice assistant for Capsule speakers and the MTS Music app.
VK's Russian-language voice assistant for Capsule Mini and the VK Music app.
Alibaba's Mandarin voice assistant powering the Tmall Genie smart-speaker line.
JD.com's Mandarin voice assistant on the Dingdong smart speaker line.
Baidu's Mandarin voice OS powering Xiaodu smart speakers and displays.
Ray-Ban Meta smart glasses with Meta AI voice — capture photos and ask questions hands-free.
On-device live captioning of in-person conversations — Pixel and many Android phones.
Prescription-friendly smart glasses with monocular captioning HUD and live translation.
Hackable open-source smart glasses with mic + display, runs custom captioning apps.
Display-only AR glasses pairing with phones — captioning apps via the XREAL ecosystem.
AR glasses with on-device translation and captioning via the Rokid Station companion.
Standalone Android-based smart glasses with on-board mic, captioning, and translation.
TCL's AR glasses with built-in live translation and captioning HUD.
AI-powered glasses for low-vision users — voice description + captioning of environment.
Affordable hearing assistance earbuds with app-tuned amplification.
Invisible OTC hearing aids tuned by app, with telecare support.
Wireless mic system for hearing aids — pairs with Roger receivers in noisy rooms.
Speech-enhancement earbuds — focus on speech in noisy environments.
Hearing-assist earbuds with adjustable speech focus and ear-tuned DSP.
Pixel Buds capturing audio + Live Transcribe rendering captions in the Pixel Buds app.
"Hey Mercedes" — factory-fitted voice assistant in Mercedes vehicles.
"Hey BMW" — voice assistant in BMW iDrive 7 / 8 / 9 vehicles.
"Hey Audi" — natural-language voice assistant in Audi MMI infotainment.
"Hello IDA" — voice assistant across the VW ID. and Golf families.
Toyota Audio Multimedia voice assistant — "Hey Toyota" wake phrase in newer models.
"Hey Honda" — voice assistant in Honda e:HEV and e:NS models.
Ford's in-vehicle infotainment with natural-language voice control.
Google Assistant native in Chevy / Cadillac / GMC vehicles — no phone needed.
Tesla's in-cabin voice control for navigation, climate, media, and vehicle settings.
Rivian R1T / R1S voice commands — "Hey Rivian" for nav, media, and vehicle features.
Lucid Air voice control — native voice in Lucid's Glass Cockpit + Pilot Panel.
NIO's in-car AI assistant with a physical animated head on the dash.
XPENG's Mandarin voice assistant in Xmart OS — "Hi Xpeng".
Hyundai's connected-car voice + telematics suite with hands-free vehicle control.
Kia's connected-car platform — voice control + remote app + over-the-air services.
Uconnect 5 voice assistant across Jeep, Chrysler, Dodge, Ram, Fiat, Peugeot, Citroën.
Automotive ASR / TTS / dialogue platform — powers most OEM in-car assistants worldwide.
Independent voice-AI platform — Cerence competitor used by Mercedes, Hyundai, and Stellantis.
Privacy-first on-device voice control for Sonos speakers — "Hey Sonos".
Alexa smart display with video calling, recipe view, and on-device voice.
Plug-in Alexa accessory for cars without built-in Alexa.
Echo Dot edition with Amazon Kids+ parental controls and a kid-safe Alexa.
Google's smart display with Assistant / Gemini voice and on-device hot-word.
Push-to-talk voice search on Roku TVs and streaming sticks.
Alexa Voice Remote for Fire TV — push-to-talk and hands-free Fire TV variants.
Bixby voice control built into Samsung Smart TVs and soundbars.
Voice control across LG TVs and ThinQ appliances — "Hi LG".
Voice control for Whirlpool / Maytag connected appliances via Alexa + Google.
Voice control for GE connected appliances via SmartHQ + Alexa / Google.
Voice control for Frigidaire connected appliances via the Frigidaire app + Alexa / Google.
On-device wake-word engine — runs on micro-controllers, mobile, browsers.
On-device voice-activity detector — detect speech vs silence in real time.
On-device assistant platform acquired by Sonos; it now lives inside Sonos Voice Control.
Sensory's on-device wake-word + small-vocab ASR — long-standing OEM voice IP.
System-level live captions in Meta Quest 2 / 3 / Pro for VR audio.
Live captions in PICO 4 / Neo 3 headsets — ByteDance's VR accessibility feature.
Frequently asked
faster-whisper vs whisperX — which should I use?
faster-whisper is the speed-optimised runtime. whisperX adds speaker diarization (pyannote) and forced-alignment word timestamps on top. Use faster-whisper if your audio is single-speaker and you only need the transcript. Use whisperX if the content has multiple speakers and you need "who said what."
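To make the "who said what" step concrete, here is a minimal sketch of how word timestamps and diarization turns can be merged. It illustrates the general idea only and is not whisperX's actual code: the `Word` and `Turn` types, the `assign_speakers` and `to_lines` helpers, and the sample timings are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float


@dataclass
class Turn:
    speaker: str
    start: float  # seconds
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: List[Word], turns: List[Turn]) -> List[Tuple[Optional[str], str]]:
    """Label each word with the speaker turn it overlaps the most (None if no overlap)."""
    labelled = []
    for w in words:
        best, best_overlap = None, 0.0
        for t in turns:
            o = overlap(w.start, w.end, t.start, t.end)
            if o > best_overlap:
                best, best_overlap = t.speaker, o
        labelled.append((best, w.text))
    return labelled


def to_lines(labelled: List[Tuple[Optional[str], str]]) -> List[str]:
    """Group consecutive words from the same speaker into one transcript line."""
    groups: List[Tuple[str, List[str]]] = []
    for speaker, text in labelled:
        name = speaker or "UNKNOWN"
        if groups and groups[-1][0] == name:
            groups[-1][1].append(text)
        else:
            groups.append((name, [text]))
    return [f"{name}: {' '.join(ws)}" for name, ws in groups]


# Hypothetical sample data: word timings from alignment, turns from diarization.
words = [Word("Hello", 0.0, 0.4), Word("there", 0.4, 0.8),
         Word("Hi", 1.1, 1.3), Word("back", 1.3, 1.7)]
turns = [Turn("SPEAKER_00", 0.0, 1.0), Turn("SPEAKER_01", 1.0, 2.0)]
transcript = to_lines(assign_speakers(words, turns))
print("\n".join(transcript))
```

With single-speaker audio there is only one turn, so this merge step adds nothing, which is why plain faster-whisper is enough in that case.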
What's the cheapest transcription API in 2026?
Per-minute pricing (as of 2026-04-20): Deepgram Nova-2 at $0.0043/min is the cheapest streaming API. OpenAI Whisper API is $0.006/min. Self-hosting faster-whisper on a rented GPU is cheaper at scale but requires operational work. Prices shift, so check each provider's pricing page.
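A rough way to compare these options is to multiply out the per-minute prices for your own monthly volume. The sketch below uses the two API prices quoted above. The GPU rental rate, the speed-up over real time, and the monthly volume are hypothetical placeholders, and the self-hosted figure ignores idle GPU time and engineering effort.

```python
# Per-minute API prices in USD, as quoted in the answer above.
DEEPGRAM_NOVA2_PER_MIN = 0.0043
OPENAI_WHISPER_PER_MIN = 0.006


def api_cost(audio_minutes: float, price_per_min: float) -> float:
    """Cost of transcribing `audio_minutes` of audio through a per-minute API."""
    return audio_minutes * price_per_min


def self_hosted_cost(audio_minutes: float, gpu_usd_per_hour: float,
                     realtime_factor: float) -> float:
    """GPU rental cost only.

    realtime_factor is how many minutes of audio are processed per minute of
    GPU time (for example 30.0 means 30x faster than real time).
    """
    gpu_hours = audio_minutes / realtime_factor / 60
    return gpu_hours * gpu_usd_per_hour


# Hypothetical workload and GPU assumptions; substitute your own numbers.
monthly_minutes = 10_000
gpu_rate = 2.00        # USD per GPU-hour (assumed)
speedup = 30.0         # assumed throughput relative to real time

print(f"Deepgram Nova-2: ${api_cost(monthly_minutes, DEEPGRAM_NOVA2_PER_MIN):.2f}")
print(f"OpenAI Whisper:  ${api_cost(monthly_minutes, OPENAI_WHISPER_PER_MIN):.2f}")
print(f"Self-hosted GPU: ${self_hosted_cost(monthly_minutes, gpu_rate, speedup):.2f}")
```

The self-hosted number only looks this favourable if the GPU is kept busy; at low volume, paying per minute usually avoids paying for idle hardware.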
What's the best open-source Otter.ai alternative?
For file transcription, whisperX (or faster-whisper with pyannote) gives you the same transcript + speaker-label output Otter produces. For the meeting-bot workflow itself, there's no one-click OSS replacement; you'd need to combine Whisper with a meeting-bot framework yourself.
Which is best on Apple Silicon (M-series Macs)?
whisper.cpp with the Metal backend is the fastest pure-CLI option. WhisperKit is the Swift-native choice for in-app integration. MacWhisper is the polished desktop app for non-technical users.
I need HIPAA compliance. Which options qualify?
For commercial APIs with HIPAA/BAA paths: Deepgram, AssemblyAI, Rev.ai, and Speechmatics all offer them on appropriate tiers. For self-hosted, HIPAA is your responsibility — the license doesn't grant compliance; your deployment architecture does.
Whisper says it supports 99 languages. Is that real?
The model weights cover 99 languages, but quality varies widely. English, Spanish, German, French, Japanese, and Chinese are excellent. Low-resource languages (e.g. many African and Southeast-Asian languages) are significantly weaker, often with a word error rate (WER) too high to be usable. SeamlessM4T is worth checking for those.
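If you want to check quality on your own language, word error rate (WER) is simple to compute from a reference transcript and the model output: it is the word-level edit distance divided by the number of reference words. The following is a minimal sketch; the `wer` function name and the sample sentences are illustrative, and real evaluations usually apply more text normalisation (punctuation, numerals) than the lowercasing done here.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences: one dropped word out of six reference words.
score = wer("the cat sat on the mat", "the cat sat on mat")
print(f"WER: {score:.3f}")
```

Lower is better, and a WER of 0.0 means the transcript matches the reference word for word. What counts as usable depends on the task, so compare candidates on a sample of your own audio rather than relying on a fixed threshold.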
Prefer a hosted service over running your own GPU? Whipscribe runs faster-whisper + whisperX behind a web UI, REST API, and MCP server for Claude Desktop.