whisperX

by Max Bain

Faster-whisper + forced alignment + speaker diarization in one pipeline.

Category: Open source
License: BSD-2-Clause
Stars: 21.4k
Last push: 2026-04-04
Pricing: Free
Platforms: Linux, macOS, GPU

What it is

whisperX combines faster-whisper transcription with wav2vec2-based forced alignment, which refines timestamps down to the word level, and pyannote for speaker diarization. If your audio has more than one speaker and you need reliable "Speaker 1 / Speaker 2" labels, this is the open-source default. BSD-2 licensed.
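Diarization on its own only produces speaker turns (who spoke between which timestamps). Word-level speaker labels come from matching those turns against the aligned words, which whisperX handles with `assign_word_speakers`. The sketch below illustrates one simple way to do that matching, an overlap vote. It is a stdlib-only illustration, not whisperX's code: `Turn`, `assign_speakers`, and the word-dict shape (`word`, `start`, `end`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Turn:
    """One diarization turn: `speaker` talks from `start` to `end` (seconds)."""
    start: float
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: list[dict], turns: list[Turn]) -> list[dict]:
    """Return copies of `words` with a "speaker" key added.

    Each word goes to the speaker whose turns overlap it the most in total;
    a word that overlaps no turn gets speaker None.
    """
    labelled = []
    for word in words:
        totals: dict[str, float] = {}  # speaker -> seconds of overlap
        for turn in turns:
            ov = overlap(word["start"], word["end"], turn.start, turn.end)
            if ov > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + ov
        best: Optional[str] = max(totals, key=totals.get) if totals else None
        labelled.append({**word, "speaker": best})  # input dicts stay untouched
    return labelled


if __name__ == "__main__":
    turns = [Turn(0.0, 2.0, "SPEAKER_00"), Turn(2.0, 4.0, "SPEAKER_01")]
    words = [
        {"word": "Hello", "start": 0.1, "end": 0.5},
        # Straddles the boundary: 0.2 s with SPEAKER_00, 0.1 s with SPEAKER_01.
        {"word": "there.", "start": 1.8, "end": 2.1},
        {"word": "Hi!", "start": 2.5, "end": 2.9},
    ]
    for w in assign_speakers(words, turns):
        print(w["word"], w["speaker"])
```

Summing overlap per speaker (rather than picking the single longest turn) keeps the vote stable when one speaker's turn is split into several short fragments around a word.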

Best for: Multi-speaker content (podcasts, interviews, meetings) where "who said what" matters.
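Watch out for: Diarization uses gated pyannote models, so you need a Hugging Face access token and must accept the models' user conditions on the Hub before they will download. First-run setup is also heavier, since several models are downloaded.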
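For "who said what" output, the raw diarization labels (pyannote-style `SPEAKER_00`, `SPEAKER_01`, ...) still need renaming and merging. A minimal sketch, assuming segments are dicts with `start`, `text`, and an optional `speaker` key, as in whisperX's output after speaker assignment; `format_transcript` is a hypothetical helper. Speakers are numbered in order of first appearance, which is purely a display choice.

```python
def format_transcript(segments: list[dict]) -> str:
    """Render segments as "Speaker N [mm:ss]: text" lines.

    Consecutive segments from the same speaker are merged into one turn.
    Raw labels are renumbered in order of first appearance; segments without
    a "speaker" key are shown as "Unknown".
    """
    names: dict[str, str] = {}  # raw diarization label -> display name
    turns: list[tuple[str, float, list[str]]] = []  # (name, start, texts)

    for seg in segments:
        raw = seg.get("speaker")
        if raw is None:
            name = "Unknown"
        else:
            # New labels get the next number; known labels keep theirs.
            name = names.setdefault(raw, f"Speaker {len(names) + 1}")
        text = seg["text"].strip()
        if turns and turns[-1][0] == name:
            turns[-1][2].append(text)  # same speaker keeps talking
        else:
            turns.append((name, seg["start"], [text]))

    lines = []
    for name, start, texts in turns:
        minutes, seconds = divmod(int(start), 60)
        lines.append(f"{name} [{minutes:02d}:{seconds:02d}]: {' '.join(texts)}")
    return "\n".join(lines)
```

Merging consecutive segments matters because the transcriber splits on pauses, not on speaker changes, so one person's answer often arrives as several segments.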

Install / use

pip install whisperx
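After installing, a typical Python run chains the three stages: transcribe, align, diarize. The function below follows the usage pattern in the whisperX README at the time of writing, but treat it as a sketch: the diarization import path and the token keyword (`use_auth_token` here) have changed between releases, reading the token from an `HF_TOKEN` environment variable is just this example's convention, and the model name, batch size, and compute types are illustrative.

```python
import os


def transcribe_with_speakers(audio_path: str, device: str = "cuda") -> dict:
    """Transcribe, align, and diarize one audio file with whisperX.

    Follows the README's usage pattern; check the README for your installed
    version, since the diarization API has moved between releases.
    """
    import whisperx  # imported lazily so this module loads without whisperX
    from whisperx.diarize import DiarizationPipeline

    # Assumption: the Hugging Face token is stored in HF_TOKEN.
    hf_token = os.environ["HF_TOKEN"]
    compute_type = "float16" if device == "cuda" else "int8"

    # 1. Transcribe with the faster-whisper backend.
    model = whisperx.load_model("large-v2", device, compute_type=compute_type)
    audio = whisperx.load_audio(audio_path)
    result = model.transcribe(audio, batch_size=16)

    # 2. Forced alignment with a language-specific wav2vec2 model.
    align_model, metadata = whisperx.load_align_model(
        language_code=result["language"], device=device
    )
    result = whisperx.align(
        result["segments"], align_model, metadata, audio, device,
        return_char_alignments=False,
    )

    # 3. Diarize with pyannote (gated models), then attach speaker labels.
    diarize_model = DiarizationPipeline(use_auth_token=hf_token, device=device)
    diarize_segments = diarize_model(audio)
    return whisperx.assign_word_speakers(diarize_segments, result)
```

For example, `result = transcribe_with_speakers("interview.wav", device="cpu")` on a machine without CUDA; the returned `result["segments"]` carry a `speaker` key wherever diarization found an overlapping turn.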

Features

Speaker diarization: Yes
Word-level timestamps: Yes
Streaming / real-time: No
Languages supported: 99 (transcription; word-level alignment needs a wav2vec2 model for the language)
HIPAA eligible: No
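Word-level timestamps are what make precise subtitles possible: cues can be cut at word boundaries instead of at whole-segment boundaries. The whisperX CLI can already write `.srt` files itself; the sketch below only illustrates the idea, assuming each word is a dict with `word`, `start`, and `end` keys (the shape of whisperX's aligned word entries). `srt_time`, `words_to_srt`, and the 42-character default are hypothetical choices for this example.

```python
def srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[dict], max_chars: int = 42) -> str:
    """Pack timed words into numbered SRT cues of at most `max_chars` characters.

    Cues break only at word boundaries; a single word longer than `max_chars`
    still gets its own cue.
    """
    cues: list[list[dict]] = []
    current: list[dict] = []
    for w in words:
        if "start" not in w or "end" not in w:
            # Aligned output can leave some words untimed; this sketch drops them.
            continue
        candidate = " ".join(x["word"] for x in current + [w])
        if current and len(candidate) > max_chars:
            cues.append(current)  # close the cue before it gets too long
            current = [w]
        else:
            current.append(w)
    if current:
        cues.append(current)

    blocks = []
    for i, cue in enumerate(cues, start=1):
        text = " ".join(x["word"] for x in cue)
        timing = f"{srt_time(cue[0]['start'])} --> {srt_time(cue[-1]['end'])}"
        blocks.append(f"{i}\n{timing}\n{text}\n")
    return "\n".join(blocks)  # blank line between cues, as SRT expects
```

Each cue's start and end come straight from its first and last word, so subtitles line up with speech rather than with the coarser segment boundaries.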

Links

GitHub repo: https://github.com/m-bain/whisperX

Alternatives

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, we're one click away.
