Picovoice Leopard

by Picovoice

On-device offline speech-to-text from Picovoice — file-based, no cloud.

TL;DR

Best for offline batch transcription with diarization on devices that can't ship audio to the cloud. Pricing: Free tier + per-user/Enterprise plans.

Category: Products
Pricing: Free tier + per-user/Enterprise plans
Platforms: Edge, SDK, Linux, Windows, macOS

What it is

Picovoice Leopard is the offline batch counterpart to Cheetah, Picovoice's streaming speech-to-text engine. It takes complete audio files and produces transcripts with word timestamps and speaker diarization, all running on-device. It targets compliance-sensitive customers (healthcare, legal) who want a local-only option, and it uses the same access-key licensing as the rest of the Picovoice stack. Feature flags from vendor docs: speaker diarization, word-level timestamps. Directory tags: embedded, on-device. Last vendor-page check: 2026-05-12.

Best for: Offline batch transcription with diarization on devices that can't ship audio to the cloud.
Watch out for: Smaller language coverage than Whisper; requires Picovoice access key.

Install / use

pip install pvleopard
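
Once the package is installed, a transcription run with diarization might look like the sketch below. It assumes the pvleopard Python SDK's `create`, `process_file`, and `delete` calls and word objects carrying `word`, `start_sec`, `end_sec`, and `speaker_tag` fields; check these names against the SDK version you install. The `group_by_speaker` helper, the `Word` stand-in type, and the file name `meeting.wav` are hypothetical and exist only for illustration.

```python
from collections import namedtuple

# Stand-in for the word objects Leopard returns. The field names are an
# assumption based on the pvleopard Python SDK; verify against your version.
Word = namedtuple("Word", ["word", "start_sec", "end_sec", "confidence", "speaker_tag"])


def group_by_speaker(words):
    """Merge consecutive words that share a speaker_tag into speaker turns."""
    turns = []
    for w in words:
        if turns and turns[-1]["speaker"] == w.speaker_tag:
            # Same speaker as the previous word: extend the current turn.
            turns[-1]["text"] += " " + w.word
            turns[-1]["end_sec"] = w.end_sec
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append({
                "speaker": w.speaker_tag,
                "start_sec": w.start_sec,
                "end_sec": w.end_sec,
                "text": w.word,
            })
    return turns


def transcribe(audio_path, access_key):
    """Transcribe one audio file on-device and return (transcript, turns)."""
    # Imported lazily so group_by_speaker can be used without the SDK installed.
    import pvleopard

    leopard = pvleopard.create(access_key=access_key, enable_diarization=True)
    try:
        transcript, words = leopard.process_file(audio_path)
    finally:
        # Release the native resources held by the engine.
        leopard.delete()
    return transcript, group_by_speaker(words)


if __name__ == "__main__":
    import os

    # PICOVOICE_ACCESS_KEY is a hypothetical environment variable name.
    text, turns = transcribe("meeting.wav", os.environ["PICOVOICE_ACCESS_KEY"])
    for t in turns:
        print(f"[{t['start_sec']:.2f}-{t['end_sec']:.2f}] speaker {t['speaker']}: {t['text']}")
```

Because Leopard is file-based rather than streaming, the whole recording is processed in a single call; the grouping step only reshapes the per-word output into readable turns.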

Features

Speaker diarization: Yes
Word-level timestamps: Yes
Streaming / real-time: No
Languages supported: 7
HIPAA eligible: No

Picovoice Leopard vs Whipscribe

Feature               Picovoice Leopard                        Whipscribe
Category              Products                                 Transcription APIs
Pricing               Free tier + per-user/Enterprise plans    Free beta
Speaker diarization   Yes                                      Yes
Word timestamps       Yes                                      Yes
Streaming             No                                       No
Languages             7                                        99
Platforms             Edge, SDK, Linux, Windows, macOS         Web, API, MCP

Alternatives to Picovoice Leopard

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or upload a file to get a transcript.