The reference open-source multilingual ASR model from OpenAI.
The complete transcription directory
Every transcription tool worth knowing, evaluated on the capabilities that matter — diarization, pricing, languages, API access, self-hosting. Updated 2026-04-24.
26 tools · 11 open source · 7 api · 3 desktop · 5 product| Name | Category | Pricing | Diarization | Word timestamps | Streaming | Languages | API | Self-host | |
|---|---|---|---|---|---|---|---|---|---|
| OpenAI Whisper OpenAI | Open source | free | 99 | View | |||||
| whisper.cpp Georgi Gerganov | Open source | free | 99 | View | |||||
| faster-whisper SYSTRAN | Open source | free | 99 | View | |||||
| whisperX Max Bain | Open source | free | 99 | View | |||||
| insanely-fast-whisper Vaibhav Srivastav | Open source | free | 99 | View | |||||
| stable-ts jianfch | Open source | free | 99 | View | |||||
| WhisperKit Argmax | Open source | free | 99 | View | |||||
| distil-whisper Hugging Face | Open source | free | 1 | View | |||||
| SeamlessM4T Meta AI | Open source | free | 100 | View | |||||
| Vosk Alpha Cephei | Open source | free | 20 | View | |||||
| Buzz Chidi Williams | Open source | free | 99 | View | |||||
| MacWhisper Jordi Bruin | Desktop | freemium | 99 | View | |||||
| SuperWhisper Sindre Sorhus | Desktop | freemium | 99 | View | |||||
| Aiko Sindre Sorhus | Desktop | free | 99 | View | |||||
| OpenAI Whisper API OpenAI | API | $0.006/min | 99 | View | |||||
| AssemblyAI AssemblyAI | API | from $0.37/hr | 99 | View | |||||
| Deepgram Deepgram | API | from $0.0043/min | 36 | View | |||||
| Rev.ai Rev.ai | API | from $0.02/min | 36 | View | |||||
| Gladia Gladia | API | from $0.0102/min | 99 | View | |||||
| Speechmatics Speechmatics | API | contact sales | 50 | View | |||||
| Otter.ai Otter.ai | Product | free / from $10/mo | 3 | View | |||||
| Rev Rev | Product | AI from $0.25/min · human from $1.50/min | 36 | View | |||||
| Descript Descript | Product | free / from $12/mo | 22 | View | |||||
| Trint Trint | Product | from $60/mo | 30 | View | |||||
| Fireflies.ai Fireflies.ai | Product | free / from $10/mo | 30 | View | |||||
| WhipscribeThat's us Neugence | API | free beta | 99 | View |
C/C++ port of Whisper — runs on anything, from a Raspberry Pi to Apple Silicon.
4× faster than reference Whisper using CTranslate2 — production sweet spot.
Faster-whisper + forced alignment + speaker diarization in one pipeline.
CLI that transcribes 150 minutes of audio in ~98 seconds on an A100.
Whisper with stabilised timestamps — more accurate word-level timing.
Swift Whisper for Apple Silicon — CoreML, Metal, zero dependencies.
Distilled Whisper: 6× faster, 49% smaller, within 1% WER of the teacher.
Meta's speech-to-text + speech-to-speech + text-to-speech model, 100 languages.
Lightweight offline speech recognition for 20+ languages, runs on a Raspberry Pi.
Cross-platform desktop app for Whisper — open-source MacWhisper alternative.
Polished Mac app for Whisper — the default pick if you're on macOS.
Always-on system-wide dictation for macOS and iOS, powered by local Whisper.
Free Mac App Store Whisper app — drag, drop, done.
Hosted Whisper large-v3 from OpenAI — $0.006 per minute.
Universal-2 model + diarization, PII redaction, topic detection, summarization.
Nova-2 model, excellent streaming, strong at conversational audio.
The API spin-off of Rev — strong English accuracy, topic detection, custom vocab.
Whisper-based API with diarization, 99-language coverage, pay-per-minute.
Enterprise ASR with strong accents and on-prem deployment options.
Meeting-bot transcription product for Zoom/Meet/Teams.
Human + AI transcription, highest accuracy tier on the market.
Audio/video editor that treats the transcript as the timeline — different product category.
Enterprise-focused transcription + collaborative editor for newsrooms.
Meeting-bot transcription + CRM integrations, competitor to Otter.
Hosted faster-whisper + whisperX with paste-a-URL, batch, and MCP access.
Pricing distribution across the directory
Quick glossary
- Supported · Not supported · Unknown
- Source of truth: vendor docs, evidence log updated 2026-04-24.
- Diarization
- Speaker-labelled output (Speaker 1 / Speaker 2) — essential for interviews, meetings, podcasts.
- Word timestamps
- Per-word start/end times — required for subtitles, karaoke-style UIs, precise search.
- Streaming
- Live transcription over WebSocket / microphone — for voice agents and meeting bots.
- Self-host
- Runs entirely on your hardware / VPC — required for air-gapped, on-prem, or data-sovereignty workloads.
- HIPAA
- Vendor offers a BAA / HIPAA-eligible tier. Self-hosted engines leave compliance to the deployer.