WhisperFusion

by Collabora

Ultra-low-latency speech→LLM pipeline: WhisperLive + Mistral + TensorRT.

TL;DR

Best for voice-agent prototypes that need about 500 ms end-to-end speech→reply latency on a single GPU. Pricing: free.

Category: Open source
License: MIT
Pricing: free
Platforms: Linux, Docker

What it is

Combines WhisperLive (ASR), Mistral-7B (LLM), and Silero VAD (voice activity detection) into one Triton-served pipeline. Demonstrates sub-second turn-taking on an RTX 4090. MIT licensed.

Best for: Voice-agent prototypes that need about 500 ms end-to-end speech→reply latency on a single GPU.
Watch out for: Requires an NVIDIA GPU and TensorRT-LLM; tuned for English; the Docker image is large.
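
The VAD → ASR → LLM flow described above can be sketched in a few lines. This is only an illustration of the turn-taking idea, not WhisperFusion's actual code: `run_pipeline`, `Turn`, and the three callables are hypothetical names, and the stubs in `demo` stand in for Silero VAD, WhisperLive, and Mistral-7B.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class Turn:
    transcript: str
    reply: str
    latency_ms: float  # time spent in ASR + LLM after the speaker stopped


def run_pipeline(
    frames: Iterable[bytes],
    is_speech: Callable[[bytes], bool],        # hypothetical stand-in for the VAD
    transcribe: Callable[[List[bytes]], str],  # hypothetical stand-in for the ASR stage
    generate: Callable[[str], str],            # hypothetical stand-in for the LLM stage
    end_silence_frames: int = 3,               # assumed end-of-turn threshold
) -> Iterator[Turn]:
    """Buffer speech frames; after enough trailing silence, emit one turn."""
    buffer: List[bytes] = []
    silence = 0

    def finish(segment: List[bytes]) -> Turn:
        # Measure only the post-speech work: transcription plus reply generation.
        start = time.perf_counter()
        text = transcribe(segment)
        reply = generate(text)
        return Turn(text, reply, (time.perf_counter() - start) * 1000.0)

    for frame in frames:
        if is_speech(frame):
            buffer.append(frame)
            silence = 0
        elif buffer:
            silence += 1
            if silence >= end_silence_frames:
                yield finish(buffer)
                buffer, silence = [], 0

    # Flush a segment that was still open when the stream ended.
    if buffer:
        yield finish(buffer)


def demo() -> List[Turn]:
    speech, quiet = b"\x01", b"\x00"
    frames = [quiet, speech, speech, quiet, quiet, quiet, speech, quiet]
    return list(
        run_pipeline(
            frames,
            is_speech=lambda f: f != quiet,
            transcribe=lambda seg: f"{len(seg)} frames of speech",
            generate=lambda text: f"echo: {text}",
        )
    )
```

With the toy input in `demo`, the first turn closes after three silent frames and the second is flushed when the stream ends. A low-latency system would more likely transcribe incrementally while the user is still speaking instead of waiting for the end of the segment, so treat the `latency_ms` field here as a placeholder for wherever the real budget is measured.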

Install / use

Features

Speaker diarization: No
Word-level timestamps: Yes
Streaming / real-time: Yes
Languages supported: 99
HIPAA eligible: No

WhisperFusion vs Whipscribe

Feature               WhisperFusion    Whipscribe
Category              Open source      Transcription APIs
Pricing               free             free beta
Speaker diarization   No               Yes
Word timestamps       Yes              Yes
Streaming             Yes              No
Languages             99               99
Platforms             Linux, Docker    Web, API, MCP

Alternatives to WhisperFusion

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, it accepts a URL or an uploaded file and returns a transcript in seconds.