whisper-diarization

Name: whisper-diarization
Author: Mahmoud Ashraf

Whisper + NeMo MSDD diarization pipeline.

TL;DR

Best for meeting transcripts where 'who said what' matters more than raw speed. Pricing: free.

What it is

An end-to-end recipe that combines faster-whisper transcription, Demucs vocal separation, and NVIDIA NeMo's MSDD (multi-scale diarization decoder) diarizer. It outputs RTTM and SRT files with speaker labels. License: BSD-2-Clause.

Best for: Meeting transcripts where 'who said what' matters more than raw speed.
Watch out for: Heavy dependency tree (NeMo, faster-whisper, demucs); GPU strongly recommended.
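
To make the "RTTM + SRT with speaker labels" output concrete, here is a minimal Python sketch of how diarization turns in RTTM format could be merged with word-level timestamps into a speaker-labelled SRT. It is an illustration, not the project's actual alignment code: the function names (parse_rttm, speaker_at, words_to_srt), the (start, end, text) word tuples, and the midpoint-based speaker assignment are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    start: float
    end: float
    speaker: str


def parse_rttm(text):
    """Parse SPEAKER records: field 3 is onset, 4 is duration, 7 is the speaker name."""
    turns = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue  # skip blank lines and non-speaker records
        start = float(fields[3])
        turns.append(Turn(start, start + float(fields[4]), fields[7]))
    return sorted(turns, key=lambda t: t.start)


def speaker_at(turns, t):
    """Return the speaker whose turn contains time t, else the nearest turn's speaker."""
    for turn in turns:
        if turn.start <= t < turn.end:
            return turn.speaker
    nearest = min(turns, key=lambda u: min(abs(u.start - t), abs(u.end - t)))
    return nearest.speaker


def srt_time(seconds):
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_srt(words, turns):
    """Group consecutive words by speaker (assumed rule: word midpoint) into SRT cues."""
    cues = []
    for start, end, text in words:
        speaker = speaker_at(turns, (start + end) / 2)
        if cues and cues[-1]["speaker"] == speaker:
            cues[-1]["end"] = end
            cues[-1]["text"] += " " + text
        else:
            cues.append({"start": start, "end": end, "speaker": speaker, "text": text})
    blocks = [
        f"{i}\n{srt_time(c['start'])} --> {srt_time(c['end'])}\n{c['speaker']}: {c['text']}\n"
        for i, c in enumerate(cues, 1)
    ]
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    rttm = (
        "SPEAKER meeting 1 0.00 2.50 <NA> <NA> spk0 <NA> <NA>\n"
        "SPEAKER meeting 1 2.50 2.00 <NA> <NA> spk1 <NA> <NA>\n"
    )
    words = [(0.2, 0.6, "Hello"), (0.7, 1.1, "there."), (2.6, 3.0, "Hi.")]
    print(words_to_srt(words, parse_rttm(rttm)))
```

A real pipeline also has to handle overlapping speech, words that straddle a turn boundary, and sentence-level cue splitting, which this sketch deliberately ignores.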

Install / use

git clone https://github.com/MahmoudAshraf97/whisper-diarization

Features

Speaker diarization	Yes
Word-level timestamps	Yes
Streaming / real-time	No
Languages supported	99
HIPAA eligible	No

whisper-diarization vs Whipscribe

Feature	whisper-diarization	Whipscribe
Category	Open source	Transcription APIs
Pricing	free	free beta
Speaker diarization	Yes	Yes
Word timestamps	Yes	Yes
Streaming	No	No
Languages	99	99
Platforms	Linux, macOS, Docker	Web, API, MCP

Alternatives to whisper-diarization

OpenAI Whisper

OpenAI

The reference open-source multilingual ASR model from OpenAI.

OSS · MIT ★ 98.1k

whisper.cpp

Georgi Gerganov

C/C++ port of Whisper that runs on a wide range of hardware, from a Raspberry Pi to Apple Silicon.

OSS · MIT ★ 48.8k

faster-whisper

SYSTRAN

Up to 4× faster than the reference Whisper implementation, using CTranslate2; a common choice for production use.

OSS · MIT ★ 22.3k

Whipscribe is a managed faster-whisper + whisperX service for getting transcripts from a URL or an uploaded file without running your own infrastructure.