Transcription tools directory
Every transcription service we track — open-source engines, desktop apps, APIs, and products — with live GitHub stats, a features matrix, and honest current pricing. Curated by Whipscribe; updated 2026-04-20.
Updated 2026-04-20 · 26 tools tracked
The reference open-source multilingual ASR model from OpenAI.
C/C++ port of Whisper — runs on anything, from a Raspberry Pi to Apple Silicon.
Up to 4× faster than reference Whisper using CTranslate2; the production sweet spot.
Faster-whisper + forced alignment + speaker diarization in one pipeline.
CLI that transcribes 150 minutes of audio in ~98 seconds on an A100.
Whisper with stabilised timestamps — more accurate word-level timing.
Swift Whisper for Apple Silicon — CoreML, Metal, zero dependencies.
Distilled Whisper: 6× faster, 49% smaller, within 1% WER of the teacher.
Meta's speech-to-text + speech-to-speech + text-to-speech model, 100 languages.
Lightweight offline speech recognition for 20+ languages, runs on a Raspberry Pi.
Cross-platform desktop app for Whisper — open-source MacWhisper alternative.
Polished Mac app for Whisper — the default pick if you're on macOS.
Always-on system-wide dictation for macOS and iOS, powered by local Whisper.
Free Mac App Store Whisper app — drag, drop, done.
Hosted Whisper from OpenAI (the whisper-1 model, based on large-v2) at $0.006 per minute.
Universal-2 model + diarization, PII redaction, topic detection, summarization.
Nova-2 model, excellent streaming, strong at conversational audio.
The API spin-off of Rev — strong English accuracy, topic detection, custom vocab.
Whisper-based API with diarization, 99-language coverage, pay-per-minute.
Enterprise ASR with strong accuracy on accented speech and on-prem deployment options.
Meeting-bot transcription product for Zoom/Meet/Teams.
Human + AI transcription, highest accuracy tier on the market.
Audio/video editor that treats the transcript as the timeline — different product category.
Enterprise-focused transcription + collaborative editor for newsrooms.
Meeting-bot transcription + CRM integrations, competitor to Otter.
Hosted faster-whisper + whisperX with paste-a-URL, batch, and MCP access.
Frequently asked
faster-whisper vs whisperX — which should I use?
faster-whisper is the speed-optimised runtime. whisperX adds speaker diarization (pyannote) and forced-alignment word timestamps on top. Use faster-whisper if your audio is single-speaker and you only need the transcript. Use whisperX if the content has multiple speakers and you need "who said what."
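As a rough illustration of the single-speaker path, the sketch below calls faster-whisper's `WhisperModel.transcribe` and prints timestamped lines. The helper names (`format_timestamp`, `format_segments`, `transcribe_file`), the default model size, and the CPU/int8 settings are assumptions chosen for illustration, not recommendations from this directory. The import is deferred so the formatting helpers work without the package installed.

```python
from types import SimpleNamespace


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    minutes, rem_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(rem_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(segments) -> list[str]:
    """Turn objects exposing .start, .end and .text into printable lines."""
    return [
        f"[{format_timestamp(s.start)} -> {format_timestamp(s.end)}] {s.text.strip()}"
        for s in segments
    ]


def transcribe_file(path: str, model_size: str = "small") -> list[str]:
    """Transcribe one audio file (hypothetical helper; needs faster-whisper)."""
    # Deferred import: requires `pip install faster-whisper`.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # transcribe() returns a lazy generator of segments plus an info object.
    segments, _info = model.transcribe(path, beam_size=5)
    return format_segments(segments)


# Stand-in segment so the formatter can be tried without a model download.
demo_segments = [SimpleNamespace(start=0.0, end=2.5, text=" Hello there.")]
demo_lines = format_segments(demo_segments)
```

This covers only the transcript. Speaker labels need a diarization step on top, which is the part whisperX bundles.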
What's the cheapest transcription API in 2026?
Per-minute pricing (as of 2026-04-20): Deepgram Nova-2 at $0.0043/min is the cheapest streaming API. OpenAI Whisper API is $0.006/min. Self-hosting faster-whisper on a rented GPU is cheaper at scale but requires operational work. Prices shift — check the linked page.
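To make the per-minute figures concrete, here is a small cost sketch using the two rates quoted above. The function names and the flat monthly GPU price used in the usage note are hypothetical. The break-even figure ignores operational work and assumes one GPU can keep up with the volume.

```python
# Per-minute rates quoted in this FAQ (as of 2026-04-20); treat them as a snapshot.
RATES_USD_PER_MIN = {
    "Deepgram Nova-2": 0.0043,
    "OpenAI Whisper API": 0.006,
}


def monthly_cost(hours_per_month: float, rate_per_min: float) -> float:
    """API cost in USD for a given monthly audio volume."""
    return round(hours_per_month * 60 * rate_per_min, 2)


def breakeven_hours(gpu_usd_per_month: float, rate_per_min: float) -> float:
    """Audio hours per month above which a flat-rate GPU costs less than the API."""
    return gpu_usd_per_month / (rate_per_min * 60)


costs_for_100_hours = {
    name: monthly_cost(100, rate) for name, rate in RATES_USD_PER_MIN.items()
}
```

For 100 hours of audio a month this gives $25.80 on Deepgram Nova-2 and $36.00 on the OpenAI Whisper API. With an assumed $300/month GPU rental, `breakeven_hours(300, 0.006)` comes out at roughly 833 hours of audio per month.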
What's the best open-source Otter.ai alternative?
For file transcription, whisperX (or faster-whisper with pyannote) gives you the same transcript plus speaker labels that Otter produces. For the meeting-bot workflow itself, there is no one-click open-source replacement: you would need to combine Whisper with a meeting-bot framework yourself.
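The "transcript + speaker label" output comes from merging two timelines: transcript segments from the ASR model and speaker turns from a diarizer. The sketch below shows one simple way to do that merge, by assigning each segment to the speaker whose turn overlaps it most. It is a simplified stand-in, not whisperX's actual code, and the `Span` type, `label_speakers` function, and sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Span:
    start: float  # seconds
    end: float    # seconds
    label: str    # transcript text, or a speaker ID for a diarization turn


def overlap(a: Span, b: Span) -> float:
    """Length in seconds of the time range shared by two spans (0 if disjoint)."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def label_speakers(segments: list[Span], turns: list[Span]) -> list[tuple[str, str]]:
    """Pair each transcript segment with the speaker whose turn overlaps it most."""
    labelled = []
    for seg in segments:
        best = max(turns, key=lambda t: overlap(seg, t), default=None)
        has_overlap = best is not None and overlap(seg, best) > 0
        labelled.append((best.label if has_overlap else "UNKNOWN", seg.label))
    return labelled


segments = [
    Span(0.0, 2.0, "Hi, thanks for joining."),
    Span(2.1, 4.0, "Happy to be here."),
]
turns = [Span(0.0, 2.05, "SPEAKER_00"), Span(2.05, 4.5, "SPEAKER_01")]
labelled = label_speakers(segments, turns)
```

Segment-level assignment like this mislabels segments that span a speaker change. Word-level timestamps from forced alignment reduce that problem because each word can be assigned separately.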
Which is best on Apple Silicon (M-series Macs)?
whisper.cpp with the Metal backend is the fastest pure-CLI option. WhisperKit is the Swift-native choice for in-app integration. MacWhisper is the polished desktop app for non-technical users.
I need HIPAA compliance. Which options qualify?
Among commercial APIs, Deepgram, AssemblyAI, Rev.ai, and Speechmatics all offer HIPAA/BAA paths on appropriate tiers. If you self-host, HIPAA compliance is your responsibility: an open-source license does not grant it; your deployment architecture does.
Whisper says it supports 99 languages. Is that real?
The model weights cover 99 languages, but quality varies widely. English, Spanish, German, French, Japanese, and Chinese are excellent. Low-resource languages (e.g. many African and Southeast-Asian languages) are significantly weaker, often with a word error rate (WER) too high to be usable. SeamlessM4T is worth checking for those.
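For readers who want to check quality on their own language, word error rate is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal version that only lowercases and splits on whitespace. Real evaluations usually also normalise punctuation and numerals, so numbers from this function are not directly comparable to published benchmarks.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,              # deletion (reference word missing)
                curr[j - 1] + 1,          # insertion (extra hypothesis word)
                prev[j - 1] + (r != h),   # substitution, or free match
            ))
        prev = curr
    return prev[-1] / len(ref)


example = wer("the cat sat on the mat", "the cat sat on mat")
```

Here `example` is 1/6, since one of six reference words was dropped. WER can exceed 1.0 when the hypothesis contains many inserted words.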
Prefer a hosted service over running your own GPU? Whipscribe runs faster-whisper + whisperX behind a web UI, REST API, and MCP server for Claude Desktop.