whisperX
by Max Bain
Faster-whisper + forced alignment + speaker diarization in one pipeline.
Category: Open source
License: BSD-2-Clause
Stars: ★ 21.4k
Last push: 2026-04-04
Pricing: free
Platforms: Linux, macOS, GPU
What it is
whisperX combines faster-whisper transcription with wav2vec2 forced alignment for word-level timestamps and pyannote for speaker diarization. If your audio has more than one speaker and you need reliable "Speaker 1 / Speaker 2" labeling, this is the usual open-source starting point. Licensed under BSD-2-Clause.
Best for: Multi-speaker content (podcasts, interviews, meetings) where "who said what" matters.
Watch out for: Diarization requires a HuggingFace token to download the gated pyannote models, which makes first-run setup heavier.
Install / use
pip install whisperx
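Once installed, the three stages described above (transcribe, align, diarize) can be chained from Python. The following is a minimal sketch, not the project's official recipe: the wrapper name `transcribe_with_speakers`, the `small` model size, and the batch size are illustrative choices, and the `use_auth_token` keyword reflects the releases assumed here and may differ in the version you install.

```python
def transcribe_with_speakers(audio_path, hf_token, device="cpu", model_name="small"):
    """Transcribe, align, and diarize one file (hypothetical wrapper, see assumptions above)."""
    if not hf_token:
        # The pyannote diarization models are gated, so fail early without a token.
        raise ValueError("A HuggingFace token is required for the pyannote models")

    # Imported lazily so this function can be defined where whisperx is not installed.
    import whisperx
    from whisperx.diarize import DiarizationPipeline, assign_word_speakers

    # float16 suits GPUs; int8 is a common CPU fallback (assumed defaults).
    compute_type = "float16" if device == "cuda" else "int8"

    # Stage 1: faster-whisper transcription.
    model = whisperx.load_model(model_name, device, compute_type=compute_type)
    audio = whisperx.load_audio(audio_path)
    result = model.transcribe(audio, batch_size=8)

    # Stage 2: wav2vec2 forced alignment for word-level timestamps.
    align_model, metadata = whisperx.load_align_model(
        language_code=result["language"], device=device
    )
    result = whisperx.align(result["segments"], align_model, metadata, audio, device)

    # Stage 3: pyannote diarization, then attach a speaker label to each word/segment.
    # Assumption: the token keyword is `use_auth_token` in your installed release.
    diarizer = DiarizationPipeline(use_auth_token=hf_token, device=device)
    diarize_segments = diarizer(audio)
    return assign_word_speakers(diarize_segments, result)
```

A call might look like `transcribe_with_speakers("interview.wav", hf_token)`, where `hf_token` is read from wherever you keep secrets; the file name is a placeholder.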
Features
| Speaker diarization | Yes |
| Word-level timestamps | Yes |
| Streaming / real-time | No |
| Languages supported | 99 |
| HIPAA eligible | No |
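To show how diarization and timestamps combine into a "who said what" transcript, here is a small sketch that renders speaker-labelled lines. It assumes each segment is a dict with `start`, `text`, and an optional `speaker` key, which is the shape this sketch expects from a diarized result; the helper name `format_transcript` and the sample data are hypothetical.

```python
def format_transcript(segments):
    """Merge consecutive same-speaker segments into '[MM:SS] SPEAKER: text' lines."""
    lines = []
    prev_speaker = None
    for seg in segments:
        # Segments without a diarization label fall back to a placeholder.
        speaker = seg.get("speaker", "UNKNOWN")
        text = seg["text"].strip()
        if lines and speaker == prev_speaker:
            # Same speaker keeps talking: extend the previous line.
            lines[-1] += " " + text
        else:
            minutes, seconds = divmod(int(seg["start"]), 60)
            lines.append(f"[{minutes:02d}:{seconds:02d}] {speaker}: {text}")
            prev_speaker = speaker
    return lines


# Hand-written sample data, not real model output.
sample = [
    {"start": 0.0, "end": 2.1, "text": " Hello there.", "speaker": "SPEAKER_00"},
    {"start": 2.1, "end": 4.0, "text": " How are you?", "speaker": "SPEAKER_00"},
    {"start": 65.2, "end": 67.0, "text": " Fine, thanks.", "speaker": "SPEAKER_01"},
]
for line in format_transcript(sample):
    print(line)
```

Merging adjacent segments from the same speaker keeps the output readable for long monologues; drop that branch if you need one line per segment.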
Alternatives
Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, we're one click away.
Try Whipscribe →