Moshi
by Kyutai
Kyutai's open speech-to-speech foundation model and demo voice agent.
TL;DR
Best for researchers experimenting with full-duplex speech-to-speech agents. Pricing: free.
Category
Open source
License
—
Stars
—
Last push
—
Pricing
free
Platforms
Linux, macOS, Cloud
What it is
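Moshi from Kyutai Labs is an open-weights, full-duplex speech-to-speech foundation model: it listens, thinks, and speaks at the same time instead of taking turns. The released checkpoints and demo agent are useful for exploring how a non-cascaded speech model (a single model in place of a separate speech-to-text, language-model, and text-to-speech chain) changes voice-AI latency budgets. Licensing is split: the Python code is MIT, the Rust backend is Apache-2.0, and the model weights are released under CC-BY 4.0.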
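To make the latency-budget point concrete, here is a minimal sketch that compares a cascaded voice pipeline with a single-model one. The stage names and every millisecond figure are hypothetical placeholders chosen for illustration, not measurements of Moshi or of any specific stack, and `Stage` and `total_latency_ms` are illustrative helpers rather than part of any Moshi API.

```python
from dataclasses import dataclass

# All figures below are hypothetical placeholders, not measurements of Moshi
# or of any particular speech-to-text, language-model, or text-to-speech stack.


@dataclass(frozen=True)
class Stage:
    name: str
    latency_ms: float


def total_latency_ms(stages: list[Stage]) -> float:
    """Sum per-stage latencies for stages that run strictly one after another."""
    return sum(stage.latency_ms for stage in stages)


# Cascaded pipeline: each stage waits for the previous one to produce output.
cascaded = [
    Stage("end-of-turn detection", 300.0),
    Stage("speech-to-text", 200.0),
    Stage("language model first token", 250.0),
    Stage("text-to-speech first audio", 150.0),
]

# Single-model pipeline: audio frames go into and come out of one model,
# so the budget is reduced to frame buffering plus one model step.
single_model = [
    Stage("audio frame buffering", 100.0),
    Stage("model step", 100.0),
]

if __name__ == "__main__":
    for label, stages in (("cascaded", cascaded), ("single model", single_model)):
        print(f"{label}: {total_latency_ms(stages):.0f} ms")
```

Substituting measured numbers from your own hardware is the point of the exercise: in a cascaded design the stages add up, while a full-duplex model also removes the end-of-turn wait because it does not need to detect that the speaker has finished.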
Best for: Researchers experimenting with full-duplex speech-to-speech agents.
Watch out for: Research grade; English-led; not a hosted production API.
Features
| Feature | Moshi |
|---|---|
| Speaker diarization | No |
| Word-level timestamps | No |
| Streaming / real-time | Yes (full-duplex) |
| Languages supported | English-led |
| HIPAA eligible | No |
Moshi vs Whipscribe
| Feature | Moshi | Whipscribe |
|---|---|---|
| Category | Open source | Transcription APIs |
| Pricing | free | free beta |
| Speaker diarization | No | Yes |
| Word timestamps | No | Yes |
| Streaming | Yes (full-duplex) | No |
| Languages | English-led | 99 |
| Platforms | Linux, macOS, Cloud | Web, API, MCP |
Alternatives to Moshi
Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or upload a file and you'll have a transcript in seconds.