Vosk
Lightweight offline speech recognition for 25+ languages that runs on a Raspberry Pi or behind Home Assistant. Per-region cards, copy-and-run recipes, and a model registry below.
Apache-2.0 offline ASR. ≈50 MB models, real-time streaming on a Pi 3/4, plugs into Home Assistant Assist via Wyoming. Strong language coverage — including Hindi, Mandarin, Ukrainian, German, Russian, Japanese, Korean, Turkish — that most cloud STT either skips, charges for, or transcribes badly.
Best for privacy-first home automation, edge hardware, on-device IoT/kiosk UX, and language coverage outside English. Free.
What it is
Vosk is a practical, production-friendly offline recognizer built on Kaldi. It predates the Whisper era but is still the go-to for true streaming on constrained hardware, IoT, kiosks, and embedded devices. Apache-2.0 licensed.
Watch out for: WER trails Whisper on most languages; older Kaldi-based architecture; smaller community now.
Install / use
pip install vosk
Python 3.5–3.12. Audio must be PCM 16 kHz 16-bit mono — use ffmpeg to convert anything else. Pi Zero / ARMv6 are not supported; Pi 3/4 are.
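A quick way to check the format requirement before handing a file to the recognizer is to read the WAV header. The sketch below uses only the standard library. `is_vosk_ready` and `ffmpeg_convert_cmd` are hypothetical helper names (not part of Vosk), and the ffmpeg flags assume a stock ffmpeg build.

```python
import wave

def is_vosk_ready(path, rate=16000):
    """Return True if the WAV file is uncompressed PCM, mono, 16-bit, at `rate` Hz."""
    try:
        with wave.open(path, "rb") as wf:
            return (wf.getnchannels() == 1
                    and wf.getsampwidth() == 2        # 2 bytes = 16-bit samples
                    and wf.getframerate() == rate
                    and wf.getcomptype() == "NONE")   # uncompressed PCM
    except (wave.Error, EOFError):
        # not a WAV file the stdlib can parse (e.g. compressed or truncated)
        return False

def ffmpeg_convert_cmd(src, dst, rate=16000):
    """Build (but do not run) an ffmpeg command that converts to mono 16-bit PCM."""
    return ["ffmpeg", "-y", "-i", src,
            "-ac", "1", "-ar", str(rate), "-c:a", "pcm_s16le", dst]
```

If `is_vosk_ready("input.wav")` is false, the list from `ffmpeg_convert_cmd("input.mp3", "input.wav")` can be passed to `subprocess.run(..., check=True)` to produce a compatible file.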
Where Vosk shines · 12 regions, 1 toolkit
Vosk's adoption isn't uniform. It's strongest where at least one of four factors dominates the build decision: privacy regulation, weak cloud-STT language coverage, maker-scene density, or infrastructure constraints. Pick the card closest to your context; the recommended model is on each.
GDPR-shaped instincts plus a "no Alexa in my house" community. Home Assistant + Rhasspy / Wyoming setups are huge in DE, and Vosk is one of the standard STT backends.
45 MB · WER 13.75 · Pi-ok
Most cloud STT skips Ukrainian, charges for it, or does it badly. Wartime makes "no foreign cloud, no foreign payments" the default — Vosk runs on a Pi, no account.
73 MB · Pi-ok · v3 series
Cloud STT charges per-minute and Indic coverage is uneven. Vosk has a dedicated Indian-English acoustic model and small Hindi/Telugu/Tamil/Gujarati models that fit on cheap ARM boards.
~50 MB each · Pi-ok
Foreign cloud STT is intermittently reachable. Vosk's Mandarin small model gives a no-network fallback for kiosks, in-vehicle UX, and IoT firmware shipping into mainland markets.
43 MB · Pi-ok
Cloud STT priced in USD bites hard at BRL revenue. Vosk's Portuguese models give per-minute cost = $0 once the device is bought, which makes consumer-app voice features economically viable.
31 MB · Pi-ok
Strong embedded / robotics culture and a deep aversion to cloud-leaking household audio. Vosk's small Japanese model + streaming API is well-suited to robotics SDKs and consumer-electronics firmware.
48 MB · Pi-ok
Public-sector and regulated-industry buyers need French-soil or on-premise STT — "souveraineté numérique" is policy. Vosk runs on the customer's own hardware with no telemetry.
41 MB · Pi-ok
Korea's hardware-first culture treats latency and offline-capability as table stakes. Vosk's streaming API gives sub-200 ms partial transcripts on a Cortex-A53.
82 MB · Pi-ok
Spanish-language SaaS is price-sensitive; Vosk's free Spanish model means voice features ship without per-minute STT cost. Same calculus across LatAm.
39 MB · Pi-ok
Italian SMB manufacturing puts voice control on machines that often have no internet uplink. Vosk on industrial Linux boxes is the path of least resistance.
48 MB · Pi-ok
Lira-priced products + USD-priced cloud STT = unworkable. Vosk's Turkish small model is one of very few free options that handle vowel harmony and agglutinative morphology decently.
35 MB · Pi-ok
Strong Home Assistant / DIY voice-control community in NL. Dutch is well-covered by Vosk; pairs cleanly with Piper TTS for an end-to-end offline voice loop.
42 MB · Pi-ok
Setup recipes · pick one and copy
Three working configurations covering the most common Vosk deployments. Each block is copy-and-run — no hidden steps.
Pi 3 or 4 · any of the 25+ Vosk languages · pure pip + ffmpeg.
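# system deps (Raspberry Pi OS Bookworm/Bullseye)
# libportaudio2 is the audio backend that sounddevice loads; git fetches the mic example
sudo apt update && sudo apt install -y \
  python3-pip python3-venv libportaudio2 ffmpeg unzip git
# install Vosk in a virtualenv (Bookworm refuses system-wide pip installs)
python3 -m venv ~/vosk-env && . ~/vosk-env/bin/activate
pip install vosk sounddevice
# small model, German shown; swap the URL for any
# language at alphacephei.com/vosk/models
wget https://alphacephei.com/vosk/models/vosk-model-small-de-0.15.zip
unzip vosk-model-small-de-0.15.zip
# transcribe a file
vosk-transcriber -m vosk-model-small-de-0.15 \
  -i sample.wav -o sample.txt
# or stream from the default mic (the example script lives in the vosk-api repo)
git clone --depth 1 https://github.com/alphacep/vosk-api
# note: recent versions of this script may treat -m as a language code
# (e.g. -m de) and download the model themselves; use that form if the path is rejected
python3 vosk-api/python/example/test_microphone.py \
  -m vosk-model-small-de-0.15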
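The same file transcription can be driven from Python instead of the `vosk-transcriber` CLI. This is a minimal sketch, assuming the input is already 16 kHz mono 16-bit PCM and that `model_dir` points at an unzipped model. `iter_pcm_chunks` and `transcribe_file` are hypothetical helper names; the Vosk calls used are `Model`, `KaldiRecognizer`, `AcceptWaveform`, `Result`, and `FinalResult`.

```python
import json
import wave

def iter_pcm_chunks(path, frames_per_chunk=4000):
    """Yield raw PCM byte chunks from a WAV file until it is exhausted."""
    with wave.open(path, "rb") as wf:
        while True:
            data = wf.readframes(frames_per_chunk)
            if not data:
                break
            yield data

def transcribe_file(path, model_dir):
    """Return the recognized text for one WAV file."""
    # imported lazily so the chunk helper above works without vosk installed
    from vosk import Model, KaldiRecognizer

    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
    rec = KaldiRecognizer(Model(model_dir), rate)

    pieces = []
    for chunk in iter_pcm_chunks(path):
        if rec.AcceptWaveform(chunk):  # True once an utterance is complete
            pieces.append(json.loads(rec.Result())["text"])
    # flush whatever audio is still buffered after the last chunk
    pieces.append(json.loads(rec.FinalResult())["text"])
    return " ".join(p for p in pieces if p)
```

A call such as `transcribe_file("sample.wav", "vosk-model-small-de-0.15")` would then return the transcript as a single string.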
Replace HA's default cloud STT with offline Vosk. wyoming-vosk is the protocol bridge.
# run wyoming-vosk in Docker (en + de + uk loaded;
# add --model-for-language flags for more)
mkdir -p ~/wyoming-vosk-data
docker run -d --name wyoming-vosk \
--restart unless-stopped \
-p 10300:10300 \
-v ~/wyoming-vosk-data:/data \
rhasspy/wyoming-vosk:latest \
--uri tcp://0.0.0.0:10300 \
--data-dir /data \
--model-for-language en vosk-model-small-en-us-0.15 \
--model-for-language de vosk-model-small-de-0.15 \
--model-for-language uk vosk-model-small-uk-v3-nano
# in Home Assistant:
# Settings → Devices & Services → Add Integration
# → Wyoming Protocol
# Host: <host-ip> Port: 10300
# then wire it into Assist:
# Settings → Voice assistants → new pipeline
# Speech-to-text: Vosk · Text-to-speech: Piper
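Before adding the integration in Home Assistant, it can help to confirm that the container's port is reachable from the machine running HA. The sketch below only tests TCP reachability; it does not speak the Wyoming protocol. `wyoming_port_open` is a hypothetical helper name, and the default port simply mirrors the `-p 10300:10300` mapping above.

```python
import socket

def wyoming_port_open(host, port=10300, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # refused, unreachable, timed out, or name resolution failed
        return False
```

If `wyoming_port_open("<host-ip>")` returns False, the usual suspects are a stopped container, a firewall rule, or a wrong host address.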
Python · low-latency partial transcripts (one update per 0.5 s audio block at the settings below) · drop-in for kiosks & live captions.
import json
import queue

import sounddevice as sd
from vosk import Model, KaldiRecognizer

q = queue.Queue()

def cb(indata, frames, time, status):
    # runs on the audio thread: hand the raw PCM bytes to the main loop
    q.put(bytes(indata))

# any unzipped model dir
model = Model("vosk-model-small-de-0.15")
rec = KaldiRecognizer(model, 16000)

with sd.RawInputStream(samplerate=16000,
                       blocksize=8000,
                       dtype="int16", channels=1,
                       callback=cb):
    print("Listening (Ctrl-C to quit)")
    while True:
        data = q.get()
        if rec.AcceptWaveform(data):
            # utterance finished: print the final text
            print(json.loads(rec.Result())["text"])
        else:
            # still mid-utterance: show the running partial on one line
            p = json.loads(rec.PartialResult()).get("partial")
            if p:
                print(f"… {p}", end="\r")
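The `blocksize` passed to the input stream bounds how often partials can refresh, because the recognizer only sees audio once a full block has been delivered. The arithmetic can be sketched as follows; both helper names are hypothetical and the numbers are plain unit conversion, not measurements.

```python
def block_duration_ms(blocksize, samplerate=16000):
    """Milliseconds of audio contained in one callback block of `blocksize` frames."""
    return 1000.0 * blocksize / samplerate

def blocksize_for(target_ms, samplerate=16000):
    """Frames per block so that each block covers about `target_ms` of audio."""
    return int(samplerate * target_ms / 1000)

print(block_duration_ms(8000))  # 500.0
print(blocksize_for(100))       # 1600
```

Smaller blocks refresh the partial transcript more often at the cost of more recognizer calls per second, so the right value is best found by trying a few on the target board.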
Features
| Speaker diarization | Yes |
| Word-level timestamps | Yes |
| Streaming / real-time | Yes |
| Languages supported | 25+ |
| HIPAA eligible | No |
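Word-level timestamps are returned inside the result JSON once `rec.SetWords(True)` has been called on the recognizer. The sketch below formats such a result into timed lines. The sample payload is hand-written for illustration (not real recognizer output), the field names (`result`, `word`, `start`, `end`, `conf`) follow the Vosk result layout as commonly documented, and `fmt_ts` and `word_lines` are hypothetical helpers.

```python
import json

def fmt_ts(seconds):
    """Format a time in seconds as HH:MM:SS.mmm."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"

def word_lines(result_json):
    """Turn one result JSON string into 'start --> end  word' lines."""
    words = json.loads(result_json).get("result", [])
    return [f"{fmt_ts(w['start'])} --> {fmt_ts(w['end'])}  {w['word']}"
            for w in words]

# hand-written sample payload, for illustration only
sample = json.dumps({
    "result": [
        {"conf": 1.0, "start": 0.84, "end": 1.02, "word": "guten"},
        {"conf": 0.97, "start": 1.05, "end": 1.5, "word": "morgen"},
    ],
    "text": "guten morgen",
})
for line in word_lines(sample):
    print(line)
```

A result without a `result` key (for example when `SetWords` was not enabled) yields an empty list rather than an error.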
Links
- alphacep/vosk-api ↗ — main repo (Python / Node / Java / C# / Go / Rust)
- alphacephei.com/vosk/models ↗ — full model registry (all 25+ languages, sizes + WER)
- rhasspy/wyoming-vosk ↗ — Wyoming protocol server for Home Assistant Assist
- alphacep/vosk-android-demo ↗ — Android sample app
- egorsmkv/speech-recognition-uk ↗ — community-maintained Ukrainian models
- uhh-lt/vosk-model-tuda-de ↗ — alternative German wideband model
- Home Assistant local Assist setup ↗ — pipeline that consumes wyoming-vosk
Vosk vs Whipscribe
| Feature | Vosk | Whipscribe |
|---|---|---|
| Category | Open source | Transcription APIs |
| Pricing | free | free beta |
| Speaker diarization | Yes | Yes |
| Word timestamps | Yes | Yes |
| Streaming | Yes | No |
| Languages | 25+ | 99 |
| Platforms | Linux, macOS, Windows, iOS, Android, Edge | Web, API, MCP |
Alternatives to Vosk
Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or drop a file in the form below — you'll have a transcript in seconds.