llama.cpp

by ggml.ai

GGUF runtime that many ASR forks build on (whisper, parakeet, qwen-audio).

TL;DR

Best for: reference quantization stack used by many ASR forks. Pricing: free.

Category: Open source
License: MIT
Pricing: free
Platforms: Linux, macOS, Windows, Android, iOS

What it is

The reference C++ inference engine for GGUF models, released under the MIT license.
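To make "GGUF model" a little more concrete: a GGUF file begins with a small fixed header. The sketch below reads it in Python, assuming the GGUF version 2 or 3 layout (4-byte magic `GGUF`, then a little-endian uint32 version, a uint64 tensor count, and a uint64 metadata key/value count). The helper name `read_gguf_header` and the synthetic sample bytes are illustrative only and are not part of llama.cpp.

```python
import io
import struct

GGUF_MAGIC = b"GGUF"


def read_gguf_header(stream):
    """Read the fixed-size GGUF header (hypothetical helper, not a llama.cpp API)."""
    magic = stream.read(4)
    if magic != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file: magic={magic!r}")
    # Assumption: GGUF v2/v3 layout, little-endian:
    # uint32 version, uint64 tensor count, uint64 metadata key/value count.
    raw = stream.read(20)
    if len(raw) != 20:
        raise ValueError("truncated GGUF header")
    version, tensor_count, kv_count = struct.unpack("<IQQ", raw)
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


# Synthetic header bytes for demonstration; a real model file would be
# opened with open(path, "rb") instead.
sample = GGUF_MAGIC + struct.pack("<IQQ", 3, 2, 5)
header = read_gguf_header(io.BytesIO(sample))
print(header)
```

Checking the magic and version this way is a cheap sanity test before handing a file to any GGUF-based runtime.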

Best for: Reference quantization stack used by many ASR forks.
Watch out for: Not an ASR engine on its own.
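As a rough illustration of what "quantization stack" means here, the sketch below mimics the idea behind an 8-bit block format such as Q8_0: weights are grouped into blocks of 32, and each block stores one half-precision scale plus 32 signed 8-bit integers. This is a simplified pure-Python approximation, not llama.cpp's implementation; the function names are hypothetical, and Python's `round` uses round-half-to-even, which may differ from the C code in tie cases.

```python
import struct

BLOCK_SIZE = 32  # assumed block size for a Q8_0-style format


def to_fp16(value):
    """Round a Python float to the nearest half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def quantize_block(values):
    """Quantize one block of 32 floats to (scale, list of int8-range ints)."""
    if len(values) != BLOCK_SIZE:
        raise ValueError(f"expected {BLOCK_SIZE} values, got {len(values)}")
    amax = max(abs(v) for v in values)
    d = amax / 127.0  # scale chosen so the largest magnitude maps to 127
    if d == 0.0:
        return 0.0, [0] * BLOCK_SIZE
    # Quantize with the full-precision scale, clamp to the symmetric int8 range.
    quants = [max(-127, min(127, round(v / d))) for v in values]
    # The stored scale is half precision, which adds a small extra error.
    return to_fp16(d), quants


def dequantize_block(scale, quants):
    """Reconstruct approximate floats from a quantized block."""
    return [scale * q for q in quants]


# Illustrative weights in the range [-1.6, 1.5].
weights = [(i - 16) / 10.0 for i in range(BLOCK_SIZE)]
scale, quants = quantize_block(weights)
restored = dequantize_block(scale, quants)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"scale={scale:.5f} max_error={max_error:.5f}")
```

The trade-off is the usual one: roughly 8.5 bits per weight instead of 16 or 32, at the cost of a reconstruction error of about half a quantization step per value.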

Install / use

Features

Speaker diarization: No
Word-level timestamps: Yes
Streaming / real-time: No
Languages supported: 99
HIPAA eligible: No

llama.cpp vs Whipscribe

Feature | llama.cpp | Whipscribe
Category | Open source | Transcription APIs
Pricing | free | free beta
Speaker diarization | No | Yes
Word timestamps | Yes | Yes
Streaming | No | No
Languages | 99 | 99
Platforms | Linux, macOS, Windows, Android, iOS | Web, API, MCP

Alternatives to llama.cpp

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, you can paste a URL or upload a file and get a transcript back in seconds.