Picovoice Cheetah
On-device streaming speech-to-text optimized for embedded and edge.
Best for: on-device streaming STT for voice assistants, wearables, hearing aids, and kiosks. Pricing: Free tier + per-user/Enterprise plans.
What it is
Picovoice Cheetah is an on-device streaming speech-to-text engine designed to run within tight CPU/RAM budgets on embedded and edge devices. It returns incremental partial transcripts and word-level timestamps, and supports a fixed set of languages (7 at the time of listing). Related Picovoice products include Leopard (offline batch STT), Porcupine (wake word), and Falcon (speaker diarization). Access is key-based, with a free non-commercial tier and paid enterprise plans. Directory tags: embedded, on-device. Last vendor-page check: 2026-05-12.
Watch out for: Vocabulary and language coverage smaller than cloud APIs; requires Picovoice access key.
Install / use
pip install pvcheetah # or platform SDK
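A minimal sketch of the streaming loop, assuming the Python SDK exposes `pvcheetah.create(access_key=...)`, an engine with `frame_length`, `process(frame)` returning a partial transcript plus an endpoint flag, `flush()`, and `delete()`. Verify these against the current vendor docs before relying on them. The helpers `frames`, `transcribe_stream`, and `transcribe_wav` are hypothetical names introduced here for illustration, and the WAV file is assumed to be mono 16-bit PCM at the engine's sample rate.

```python
import struct
import wave
from typing import Iterator, Sequence


def frames(pcm: Sequence[int], frame_length: int) -> Iterator[Sequence[int]]:
    """Yield consecutive full frames; a short trailing remainder is dropped."""
    for start in range(0, len(pcm) - frame_length + 1, frame_length):
        yield pcm[start:start + frame_length]


def transcribe_stream(engine, pcm: Sequence[int]) -> str:
    """Feed PCM samples to a Cheetah-style engine one frame at a time.

    `engine` is duck-typed: it needs `frame_length`, `process(frame)` that
    returns (partial_transcript, is_endpoint), and `flush()`.
    """
    pieces = []
    for frame in frames(pcm, engine.frame_length):
        partial, is_endpoint = engine.process(frame)
        pieces.append(partial)  # partial text arrives incrementally
        if is_endpoint:
            # The engine detected the end of an utterance: finalize it.
            pieces.append(engine.flush())
    pieces.append(engine.flush())  # collect any text still buffered
    return "".join(pieces)


def transcribe_wav(path: str, access_key: str) -> str:
    """Hypothetical convenience wrapper around the assumed SDK calls."""
    import pvcheetah  # imported lazily so the helpers above work without it

    engine = pvcheetah.create(access_key=access_key)
    try:
        with wave.open(path, "rb") as wav:
            raw = wav.readframes(wav.getnframes())
        # Assumption: mono, 16-bit little-endian samples.
        pcm = struct.unpack(f"<{len(raw) // 2}h", raw)
        return transcribe_stream(engine, pcm)
    finally:
        engine.delete()  # release native resources
```

In a live application the frames would come from a microphone callback rather than a file, but the loop has the same shape: process each frame, emit partial text as it arrives, and flush on an endpoint.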
Features
| Feature | Picovoice Cheetah |
|---|---|
| Speaker diarization | No |
| Word-level timestamps | Yes |
| Streaming / real-time | Yes |
| Languages supported | 7 |
| HIPAA eligible | No |
Picovoice Cheetah vs Whipscribe
| Feature | Picovoice Cheetah | Whipscribe |
|---|---|---|
| Category | Products | Transcription APIs |
| Pricing | Free tier + per-user/Enterprise plans | Free beta |
| Speaker diarization | No | Yes |
| Word timestamps | Yes | Yes |
| Streaming | Yes | No |
| Languages | 7 | 99 |
| Platforms | Edge, SDK, iOS, Android, Linux, Windows, macOS | Web, API, MCP |
Alternatives to Picovoice Cheetah
Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or upload a file and you'll have a transcript in seconds.