Kencorpus Swahili ASR
Kencorpus / Maseno — Kenyan Swahili and English code-switch speech dataset and baselines.
Best for Swahili and Swahili-English code-switching transcription in East African contexts. Pricing: free.
What it is
Kencorpus is a Kenyan academic-led initiative that published a multilingual Swahili-English-Dholuo speech and text corpus, along with baseline ASR models. The resource is foundational for East African transcription projects, particularly those that need to handle Swahili-English code-switching, which is the dominant register in Nairobi and other urban markets. It is the best fit when the requirement is Swahili or Swahili-English code-switched transcription in East African contexts. The honest caveat: it is primarily a dataset, and productisation is the integrator's responsibility. As with any open-weights release, the integrator owns hosting, scaling, and SLA, but the licensing cost is zero and the baseline models can be fine-tuned on in-house audio.
Watch out for: Primarily a dataset; productisation is the integrator's responsibility.
Install / use
Search huggingface.co for 'kencorpus' to find the dataset and model cards.
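The same search can be scripted. The following is a minimal sketch, assuming the public Hugging Face Hub REST endpoints (`/api/datasets` and `/api/models` with `search` and `limit` query parameters) and that each result is a JSON object with an `id` field. The helper names `hub_search_url` and `search_hub` are hypothetical and exist only for this illustration; no specific Kencorpus repository id is assumed.

```python
import json
import urllib.parse
import urllib.request

# Base URL of the public Hub REST API (an assumption of this sketch).
HUB_API = "https://huggingface.co/api"


def hub_search_url(kind: str, query: str, limit: int = 20) -> str:
    """Build a Hub search URL; kind must be "datasets" or "models"."""
    if kind not in ("datasets", "models"):
        raise ValueError(f"kind must be 'datasets' or 'models', got {kind!r}")
    params = urllib.parse.urlencode({"search": query, "limit": limit})
    return f"{HUB_API}/{kind}?{params}"


def search_hub(kind: str, query: str, limit: int = 20) -> list[str]:
    """Return the repository ids the Hub reports for a search term.

    Requires network access; results depend on what is published
    on the Hub at the time of the call.
    """
    url = hub_search_url(kind, query, limit)
    with urllib.request.urlopen(url, timeout=30) as resp:
        results = json.load(resp)
    # Each result is assumed to be an object carrying an "id" field.
    return [item["id"] for item in results]
```

Calling `search_hub("datasets", "kencorpus")` and `search_hub("models", "kencorpus")` should list candidate repositories. Read each card's licence and language notes before using it, since a search match does not guarantee that a repository is the official release.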
Features
| Feature | Kencorpus Swahili ASR |
|---|---|
| Speaker diarization | No |
| Word-level timestamps | No |
| Streaming / real-time | No |
| Languages supported | Not listed |
| HIPAA eligible | No |
Kencorpus Swahili ASR vs Whipscribe
| Feature | Kencorpus Swahili ASR | Whipscribe |
|---|---|---|
| Category | Open source | Transcription APIs |
| Pricing | free | free beta |
| Speaker diarization | No | Yes |
| Word timestamps | No | Yes |
| Streaming | No | No |
| Languages | — | 99 |
| Platforms | Web | Web, API, MCP |
Alternatives to Kencorpus Swahili ASR
Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or drop a file in the form below — you'll have a transcript in seconds.