whisper-diarization

by Mahmoud Ashraf

Whisper + NeMo MSDD diarization pipeline.

TL;DR

Best for meeting transcripts where 'who said what' matters more than raw speed. Pricing: free.

Category: Open source
License: BSD-2-Clause
Pricing: free
Platforms: Linux, macOS, Docker

What it is

An end-to-end recipe that combines faster-whisper transcription, demucs vocal separation, and NVIDIA NeMo's MSDD diarizer. It outputs RTTM and SRT files with speaker labels. License: BSD-2-Clause.

Best for: Meeting transcripts where 'who said what' matters more than raw speed.
Watch out for: Heavy dependency tree (NeMo, faster-whisper, demucs); GPU strongly recommended.
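
To make the speaker-labelled output more concrete, here is a minimal sketch of reading RTTM speaker turns and totalling talk time per speaker. It assumes the RTTM output follows the standard space-separated SPEAKER record layout (onset in field 4, duration in field 5, speaker name in field 8). The sample data, the speaker names, and the helper names parse_rttm and talk_time are hypothetical and are not part of whisper-diarization itself.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Turn(NamedTuple):
    speaker: str
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording


def parse_rttm(text: str) -> List[Turn]:
    """Parse SPEAKER records from RTTM text into turns sorted by start time."""
    turns = []
    for line in text.splitlines():
        fields = line.split()
        # Assumed layout: type, file id, channel, onset, duration,
        # <NA>, <NA>, speaker name, <NA>, <NA>
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue  # skip blank lines and non-SPEAKER records
        start = float(fields[3])
        duration = float(fields[4])
        turns.append(Turn(fields[7], start, start + duration))
    return sorted(turns, key=lambda t: t.start)


def talk_time(turns: List[Turn]) -> Dict[str, float]:
    """Sum the duration of every turn, grouped by speaker label."""
    totals = defaultdict(float)
    for turn in turns:
        totals[turn.speaker] += turn.end - turn.start
    return dict(totals)


# Hypothetical sample data, not real output from the tool.
SAMPLE_RTTM = """\
SPEAKER meeting 1 0.00 4.50 <NA> <NA> speaker_0 <NA> <NA>
SPEAKER meeting 1 4.50 2.25 <NA> <NA> speaker_1 <NA> <NA>
SPEAKER meeting 1 6.75 3.00 <NA> <NA> speaker_0 <NA> <NA>
"""

turns = parse_rttm(SAMPLE_RTTM)
totals = talk_time(turns)
for speaker, seconds in sorted(totals.items()):
    print(f"{speaker}: {seconds:.2f} s")
```

A summary like this is one way to check a meeting transcript quickly: if the number of distinct speaker labels or their share of talk time looks implausible, the diarization is worth reviewing before relying on the 'who said what' attribution.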

Install / use

Features

Speaker diarization: Yes
Word-level timestamps: Yes
Streaming / real-time: No
Languages supported: 99
HIPAA eligible: No

whisper-diarization vs Whipscribe

Feature             | whisper-diarization  | Whipscribe
Category            | Open source          | Transcription APIs
Pricing             | free                 | free beta
Speaker diarization | Yes                  | Yes
Word timestamps     | Yes                  | Yes
Streaming           | No                   | No
Languages           | 99                   | 99
Platforms           | Linux, macOS, Docker | Web, API, MCP

Alternatives to whisper-diarization

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, it accepts a pasted URL or an uploaded file and returns a transcript in seconds.