Vosk

by Alpha Cephei

Lightweight offline speech recognition for 25+ languages that runs on a Raspberry Pi or behind Home Assistant. Per-region cards, copy-and-run recipes, and a model registry below.

TL;DR

Apache-2.0 offline ASR. ≈50 MB models, real-time streaming on a Pi 3/4, plugs into Home Assistant Assist via Wyoming. Strong language coverage — including Hindi, Mandarin, Ukrainian, German, Russian, Japanese, Korean, Turkish — that most cloud STT either skips, charges for, or transcribes badly.

Best for privacy-first home automation, edge hardware, on-device IoT/kiosk UX, and language coverage outside English. Free.

What it is

Vosk is a practical, production-friendly offline recognizer built on Kaldi. It predates the Whisper era but is still the go-to for true streaming on constrained hardware, IoT, kiosks, and embedded devices. Apache-2.0 licensed.

Best for: Real-time streaming transcription on-device or on edge hardware with limited resources.
Watch out for: WER trails Whisper on most languages; older Kaldi-based architecture; smaller community now.

Install / use

pip install vosk

Python 3.5–3.12. Audio must be PCM 16 kHz 16-bit mono — use ffmpeg to convert anything else. Pi Zero / ARMv6 are not supported; Pi 3/4 are.
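
The format requirement above can be checked before any audio reaches the recognizer. Below is a minimal sketch using only Python's standard `wave` module. The helper names `is_vosk_ready` and `ffmpeg_convert_cmd` are illustrative and not part of Vosk, and the ffmpeg flags shown are the common ones for resampling to 16 kHz, mono, 16-bit PCM.

```python
import wave


def is_vosk_ready(path, rate=16000):
    """Return True if the WAV file is uncompressed PCM, mono, 16-bit, at `rate` Hz."""
    with wave.open(path, "rb") as wf:
        return (
            wf.getnchannels() == 1          # mono
            and wf.getsampwidth() == 2      # 2 bytes per sample = 16-bit
            and wf.getframerate() == rate   # 16 kHz by default
            and wf.getcomptype() == "NONE"  # uncompressed PCM
        )


def ffmpeg_convert_cmd(src, dst, rate=16000):
    """Build (but do not run) an ffmpeg command that writes 16-bit mono PCM WAV."""
    return ["ffmpeg", "-y", "-i", src,
            "-ac", "1", "-ar", str(rate), "-c:a", "pcm_s16le", dst]
```

To run the conversion, pass the list to `subprocess.run(ffmpeg_convert_cmd("in.mp3", "out.wav"), check=True)`. Note that `-y` overwrites an existing output file.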

Where Vosk shines · 12 regions, 1 toolkit

Vosk's adoption isn't uniform — it's strongest where one or more of {privacy regulation, weak cloud-STT language coverage, maker-scene density, infrastructure constraints} dominate the build calculus. Pick the card closest to your context — recommended model is on each.

🇩🇪

Germany

Privacy-first home automation

GDPR-shaped instincts plus a "no Alexa in my house" community. Home Assistant + Rhasspy / Wyoming setups are huge in DE, and Vosk is one of the standard STT backends.

vosk-model-small-de-0.15
45 MB · WER 13.75 · Pi-ok

🇺🇦

Ukraine

Language + infra independence

Most cloud STT skips Ukrainian, charges for it, or does it badly. Wartime makes "no foreign cloud, no foreign payments" the default — Vosk runs on a Pi, no account.

vosk-model-small-uk-v3-nano
73 MB · Pi-ok · v3 series

🇮🇳

India

Indian English + Hindi on edge

Cloud STT charges per-minute and Indic coverage is uneven. Vosk has a dedicated Indian-English acoustic model and small Hindi/Telugu/Tamil/Gujarati models that fit on cheap ARM boards.

vosk-model-small-en-in-0.4 · vosk-model-small-hi-0.22
under 50 MB each · Pi-ok

🇨🇳

China

Mandarin + offline by default

Foreign cloud STT is intermittently reachable. Vosk's Mandarin small model gives a no-network fallback for kiosks, in-vehicle UX, and IoT firmware shipping into mainland markets.

vosk-model-small-cn-0.22
43 MB · Pi-ok

🇧🇷

Brazil

Portuguese cost-floor

Cloud STT priced in USD bites hard at BRL revenue. Vosk's Portuguese models give per-minute cost = $0 once the device is bought, which makes consumer-app voice features economically viable.

vosk-model-small-pt-0.3
31 MB · Pi-ok

🇯🇵

Japan

Embedded UX + privacy

Strong embedded / robotics culture and a deep aversion to cloud-leaking household audio. Vosk's small Japanese model + streaming API is well-suited to robotics SDKs and consumer-electronics firmware.

vosk-model-small-ja-0.22
48 MB · Pi-ok

🇫🇷

France

Cloud sovereignty

Public-sector and regulated-industry buyers need French-soil or on-premise STT — "souveraineté numérique" is policy. Vosk runs on the customer's own hardware with no telemetry.

vosk-model-small-fr-0.22
41 MB · Pi-ok

🇰🇷

Korea

On-device assistive UX

Korea's hardware-first culture treats latency and offline-capability as table stakes. Vosk's streaming API gives sub-200 ms partial transcripts on a Cortex-A53.

vosk-model-small-ko-0.22
82 MB · Pi-ok

🇪🇸

Spain & LatAm

Spanish, low-latency, free

Spanish-language SaaS is price-sensitive; Vosk's free Spanish model means voice features ship without per-minute STT cost. Same calculus across LatAm.

vosk-model-small-es-0.42
39 MB · Pi-ok

🇮🇹

Italy

Manufacturing IoT + factory floor

Italian SMB manufacturing puts voice control on machines that often have no internet uplink. Vosk on industrial Linux boxes is the path of least resistance.

vosk-model-small-it-0.22
48 MB · Pi-ok

🇹🇷

Türkiye

Turkish + price-floor

Lira-priced products + USD-priced cloud STT = unworkable. Vosk's Turkish small model is one of very few free options that handle vowel harmony and agglutinative morphology decently.

vosk-model-small-tr-0.3
35 MB · Pi-ok

🇳🇱

Netherlands

Dutch + maker community

Strong Home Assistant / DIY voice-control community in NL. Dutch is well-covered by Vosk; pairs cleanly with Piper TTS for an end-to-end offline voice loop.

vosk-model-small-nl-0.22
42 MB · Pi-ok

Pattern: if you're optimising for privacy, language coverage outside English, infrastructure independence, or per-minute cost, Vosk is on the short list. If you only need lowest WER on English / French / Spanish at any cost, Whisper or faster-whisper may serve better — see the comparison further down.
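
The recommended small models on the cards above can be collected into a small lookup. This is a sketch, not an official registry: the dictionary keys are labels chosen here from the model names, and `model_zip_url` assumes every model is published as `<name>.zip` under the same base URL as the German download used in the Pi quickstart on this page. Check the model list at alphacephei.com/vosk/models before relying on a particular name.

```python
MODELS_BASE_URL = "https://alphacephei.com/vosk/models"

# label -> small model named on the region cards (keys are this sketch's own labels)
SMALL_MODELS = {
    "de": "vosk-model-small-de-0.15",
    "uk": "vosk-model-small-uk-v3-nano",
    "hi": "vosk-model-small-hi-0.22",
    "cn": "vosk-model-small-cn-0.22",
    "pt": "vosk-model-small-pt-0.3",
    "ja": "vosk-model-small-ja-0.22",
    "fr": "vosk-model-small-fr-0.22",
    "ko": "vosk-model-small-ko-0.22",
    "es": "vosk-model-small-es-0.42",
    "it": "vosk-model-small-it-0.22",
    "tr": "vosk-model-small-tr-0.3",
    "nl": "vosk-model-small-nl-0.22",
}


def model_zip_url(label):
    """Return the assumed download URL for the small model behind `label`."""
    name = SMALL_MODELS[label]  # raises KeyError for labels not on the cards
    return f"{MODELS_BASE_URL}/{name}.zip"


print(model_zip_url("de"))
```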

Setup recipes · pick one and copy

Three working configurations covering the most common Vosk deployments. Each block is copy-and-run — no hidden steps.

1. Raspberry Pi quickstart

Pi 3 or 4 · any of the 25+ Vosk languages · pure pip + ffmpeg.

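# system deps (Raspberry Pi OS Bookworm/Bullseye)
# libportaudio2 is the PortAudio runtime that sounddevice needs
sudo apt update && sudo apt install -y \
  python3-pip python3-venv libportaudio2 ffmpeg unzip git

# install Vosk in a virtual environment
# (Bookworm blocks system-wide pip installs)
python3 -m venv ~/vosk-env && source ~/vosk-env/bin/activate
pip install vosk sounddevice

# small model: German shown; swap the URL for any
# language at alphacephei.com/vosk/models
wget https://alphacephei.com/vosk/models/vosk-model-small-de-0.15.zip
unzip vosk-model-small-de-0.15.zip

# transcribe a file
vosk-transcriber -m vosk-model-small-de-0.15 \
  -i sample.wav -o sample.txt

# or stream from the default mic with the example script
# from the vosk-api repo; for this script -m takes a
# language code (de, nl, en-us, ...) rather than a model dir
git clone --depth 1 https://github.com/alphacep/vosk-api
python3 vosk-api/python/example/test_microphone.py -m de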

For another language, swap the model URL — full registry at alphacephei.com/vosk/models ↗
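
The file-transcription step can also be done from Python instead of the `vosk-transcriber` CLI. The sketch below assumes the input is already a 16 kHz, 16-bit mono WAV. `transcribe_wav` and `run_with_vosk` are illustrative names; the recognizer is passed in, so the loop works with any object that exposes `AcceptWaveform`, `Result`, and `FinalResult` the way `KaldiRecognizer` does.

```python
import json
import wave


def transcribe_wav(path, recognizer, chunk_frames=4000):
    """Feed a WAV file to a Vosk-style recognizer and return the joined text."""
    pieces = []
    with wave.open(path, "rb") as wf:
        while True:
            data = wf.readframes(chunk_frames)
            if not data:
                break
            # AcceptWaveform returns True when an utterance has just ended
            if recognizer.AcceptWaveform(data):
                pieces.append(json.loads(recognizer.Result()).get("text", ""))
    # flush whatever was still being decoded at end of file
    pieces.append(json.loads(recognizer.FinalResult()).get("text", ""))
    return " ".join(p for p in pieces if p)


def run_with_vosk(wav_path, model_dir="vosk-model-small-de-0.15"):
    """Convenience wrapper; imports vosk lazily so the helper above has no hard dependency."""
    from vosk import Model, KaldiRecognizer  # requires `pip install vosk`
    recognizer = KaldiRecognizer(Model(model_dir), 16000)
    return transcribe_wav(wav_path, recognizer)
```

Calling `run_with_vosk("sample.wav")` with the unzipped German model in the working directory should return the transcript as a single string.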

2. Home Assistant via Wyoming

Replace HA's default cloud STT with offline Vosk. wyoming-vosk is the protocol bridge.

# run wyoming-vosk in Docker (en + de + uk loaded;
# add --model-for-language flags for more)
mkdir -p ~/wyoming-vosk-data
docker run -d --name wyoming-vosk \
  --restart unless-stopped \
  -p 10300:10300 \
  -v ~/wyoming-vosk-data:/data \
  rhasspy/wyoming-vosk:latest \
  --uri tcp://0.0.0.0:10300 \
  --data-dir /data \
  --model-for-language en vosk-model-small-en-us-0.15 \
  --model-for-language de vosk-model-small-de-0.15 \
  --model-for-language uk vosk-model-small-uk-v3-nano

# in Home Assistant:
#  Settings → Devices & Services → Add Integration
#  → Wyoming Protocol
#  Host: <host-ip>   Port: 10300

# then wire it into Assist:
#  Settings → Voice assistants → new pipeline
#  Speech-to-text: Vosk  ·  Text-to-speech: Piper

Source: rhasspy/wyoming-vosk ↗. Pair with wyoming-piper for offline TTS.
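
Before adding the integration in Home Assistant, it can help to confirm that the container is accepting connections on the published port. The following is a minimal reachability sketch using the standard `socket` module; `port_is_open` is a hypothetical helper name, and the check only proves that something accepts TCP connections on that port, not that it speaks the Wyoming protocol.

```python
import socket


def port_is_open(host, port=10300, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts, and unresolvable hosts
        return False
```

For example, `port_is_open("192.168.1.50")` would test the default port 10300 on a host at that (made-up) address.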

3. Real-time mic streaming

Python · low-latency partial transcripts · drop-in for kiosks & live captions.

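import json
import queue
import sys

import sounddevice as sd
from vosk import Model, KaldiRecognizer

q = queue.Queue()

def cb(indata, frames, time, status):
    # runs on the audio thread: report over/underruns, then hand off the chunk
    if status:
        print(status, file=sys.stderr)
    q.put(bytes(indata))

# any unzipped model dir
model = Model("vosk-model-small-de-0.15")
rec = KaldiRecognizer(model, 16000)

try:
    with sd.RawInputStream(samplerate=16000,
                           blocksize=8000,  # 0.5 s of audio per chunk
                           dtype="int16", channels=1,
                           callback=cb):
        print("Listening, Ctrl-C to quit")
        while True:
            data = q.get()
            if rec.AcceptWaveform(data):
                # end of utterance: final text for this segment
                print(json.loads(rec.Result())["text"])
            else:
                # mid-utterance: overwrite the line with the running partial
                p = json.loads(rec.PartialResult()).get("partial")
                if p:
                    print(f"… {p}", end="\r")
except KeyboardInterrupt:
    # flush whatever was still being decoded
    print(json.loads(rec.FinalResult())["text"])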

Streaming API yields word-by-word partials. Swap the model dir to switch language.
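
How often partials can update depends on the audio chunk size. With `blocksize=8000` at 16 kHz, as in the recipe above, each callback delivers half a second of audio, so the recognizer cannot refresh its partial more than twice per second. The arithmetic is sketched below; `chunk_ms` and `max_blocksize` are illustrative helper names, and the figures cover audio buffering only, not decoding time on the device.

```python
def chunk_ms(blocksize, samplerate=16000):
    """Duration of one audio chunk in milliseconds."""
    return 1000.0 * blocksize / samplerate


def max_blocksize(target_ms, samplerate=16000):
    """Largest blocksize (in frames) whose duration does not exceed target_ms."""
    return int(samplerate * target_ms / 1000)


print(chunk_ms(8000))      # 500.0
print(max_blocksize(200))  # 3200
```

Smaller blocks mean more frequent callbacks and more CPU overhead, so the right value is a trade-off to measure on the target hardware.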

Features

Speaker diarization	Yes
Word-level timestamps	Yes
Streaming / real-time	Yes
Languages supported	20
HIPAA eligible	No
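
Word-level timestamps are off by default in the Python API and are enabled per recognizer with `rec.SetWords(True)`. Results then carry a `result` list with `word`, `start`, `end`, and `conf` fields. The sketch below parses such a result; `word_timings` is an illustrative helper, and the sample JSON is hand-written in that shape rather than real recognizer output.

```python
import json


def word_timings(result_json):
    """Return (word, start_s, end_s) tuples from a Vosk result JSON string."""
    result = json.loads(result_json)
    # "result" is absent when nothing was recognized or SetWords was not enabled
    return [(w["word"], w["start"], w["end"]) for w in result.get("result", [])]


# Hand-written sample, not actual recognizer output
sample = (
    '{"result": ['
    '{"conf": 1.0, "start": 0.84, "end": 1.11, "word": "guten"}, '
    '{"conf": 0.97, "start": 1.11, "end": 1.53, "word": "morgen"}'
    '], "text": "guten morgen"}'
)

for word, start, end in word_timings(sample):
    print(f"{start:6.2f} to {end:6.2f}  {word}")
```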

Vosk vs Whipscribe

Feature	Vosk	Whipscribe
Category	Open source	Transcription APIs
Pricing	free	free beta
Speaker diarization	Yes	Yes
Word timestamps	Yes	Yes
Streaming	Yes	No
Languages	20	99
Platforms	Linux, macOS, Windows, iOS, Android, Edge	Web, API, MCP

Alternatives to Vosk

OpenAI Whisper

OpenAI

The reference open-source multilingual ASR model from OpenAI.

OSS · MIT ★ 98.1k

whisper.cpp

Georgi Gerganov

C/C++ port of Whisper — runs on anything, from a Raspberry Pi to Apple Silicon.

OSS · MIT ★ 48.8k

faster-whisper

SYSTRAN

4× faster than reference Whisper using CTranslate2 — production sweet spot.

OSS · MIT ★ 22.3k

Whipscribe is a managed faster-whisper + whisperX service for getting transcripts without running infrastructure.