WhisperFusion

Name: WhisperFusion
Author: Collabora

Ultra-low-latency speech→LLM pipeline: WhisperLive + Mistral + TensorRT.

TL;DR

Best for voice-agent prototypes that need roughly 500 ms end-to-end speech→reply latency on a single GPU. Pricing: free.

What it is

Combines WhisperLive (ASR), Mistral-7B (LLM), and SileroVAD (voice activity detection) into one Triton-served pipeline. Demonstrates sub-second turn-taking on an RTX 4090. License: MIT.

Best for: Voice-agent prototypes that need roughly 500 ms end-to-end speech→reply latency on a single GPU.
Watch out for: Requires NVIDIA GPU + TensorRT-LLM; English-tuned; heavy Docker image.
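
The VAD → ASR → LLM flow described above can be sketched as a chain of stages with per-stage timing. This is only an illustration of the architecture, not WhisperFusion code: every function name here (detect_speech, transcribe, generate_reply, run_turn) is hypothetical, the stages are trivial stand-ins rather than real models, and the 500 ms budget is taken from the "Best for" line.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TurnResult:
    """Reply text plus how long each stage took, in milliseconds."""
    reply: str = ""
    timings_ms: Dict[str, float] = field(default_factory=dict)

    @property
    def total_ms(self) -> float:
        return sum(self.timings_ms.values())

    def within_budget(self, budget_ms: float) -> bool:
        return self.total_ms <= budget_ms


# Hypothetical stand-ins: none of these are WhisperFusion, WhisperLive,
# Silero VAD, or Mistral APIs. They only mark where each model would run.
def detect_speech(frames: List[bytes]) -> List[bytes]:
    """Stand-in VAD: keep non-empty frames."""
    return [f for f in frames if f]


def transcribe(frames: List[bytes]) -> str:
    """Stand-in ASR: decode bytes as text instead of running a speech model."""
    return " ".join(f.decode("utf-8") for f in frames)


def generate_reply(prompt: str) -> str:
    """Stand-in LLM: echo the transcript."""
    return f"You said: {prompt}"


def run_turn(frames: List[bytes]) -> TurnResult:
    """Run one conversational turn and record the latency of each stage."""
    result = TurnResult()
    data = frames
    stages = (("vad", detect_speech), ("asr", transcribe), ("llm", generate_reply))
    for name, stage in stages:
        start = time.perf_counter()
        data = stage(data)
        result.timings_ms[name] = (time.perf_counter() - start) * 1000.0
    result.reply = data
    return result


result = run_turn([b"hello", b"", b"world"])
print(result.reply, result.within_budget(500.0))
```

Timing each stage separately makes it easier to see which model dominates the latency budget once real components replace the stand-ins.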

Install / use

git clone https://github.com/collabora/WhisperFusion
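
Before cloning, it may help to confirm that the tools implied by the requirements above are on the PATH. The following is a minimal sketch, assuming that git, docker, and nvidia-smi are the relevant commands on your system; that list is an assumption for illustration and is not taken from the WhisperFusion documentation. The missing_tools helper is hypothetical.

```python
import shutil
from typing import Callable, List, Optional, Sequence

# Assumed prerequisites (not an official list): git to clone the repository,
# docker for the container image, nvidia-smi as a sign the NVIDIA driver is installed.
REQUIRED_TOOLS = ("git", "docker", "nvidia-smi")


def missing_tools(
    tools: Sequence[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing prerequisites:", ", ".join(missing))
else:
    print("All assumed prerequisites found.")
```

Finding nvidia-smi only shows that a driver utility is installed; it does not verify that the GPU or TensorRT-LLM setup meets the project's requirements.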

Features

Feature	WhisperFusion
Speaker diarization	No
Word-level timestamps	Yes
Streaming / real-time	Yes
Languages supported	99
HIPAA eligible	No

WhisperFusion vs Whipscribe

Feature	WhisperFusion	Whipscribe
Category	Open source	Transcription APIs
Pricing	free	free beta
Speaker diarization	No	Yes
Word timestamps	Yes	Yes
Streaming	Yes	No
Languages	99	99
Platforms	Linux, Docker	Web, API, MCP

Alternatives to WhisperFusion

OpenAI Whisper

OpenAI

The reference open-source multilingual ASR model from OpenAI.

OSS · MIT ★ 98.1k

whisper.cpp

Georgi Gerganov

C/C++ port of Whisper — runs on anything, from a Raspberry Pi to Apple Silicon.

OSS · MIT ★ 48.8k

faster-whisper

SYSTRAN

Up to 4× faster than reference Whisper using CTranslate2: the production sweet spot.

OSS · MIT ★ 22.3k

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, you can submit a URL or upload a file and get a transcript back in seconds.