Coqui XTTS
Open-source multilingual TTS with zero-shot voice cloning.
Best for researchers and self-hosters wanting zero-shot multilingual voice cloning under an inspectable license. Pricing: free (CPML license, non-commercial without separate license).
What it is
XTTS is Coqui's flagship cross-lingual zero-shot TTS model: a 6-second reference clip is enough to clone a voice and synthesize speech across 17 languages. The library remains widely used even though Coqui Inc. has shut down; community forks (idiap/coqui-ai-TTS) keep maintenance going. Consent posture: open weights let any user clone any voice, so consent enforcement is the operator's responsibility. The vendor surface is not a gating layer.
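Because consent enforcement falls to the operator, a self-hosted deployment may want its own gate in front of the cloning step. The following is a minimal sketch of one possible approach, not part of XTTS or any Coqui API: `ConsentRecord` and `ConsentRegistry` are hypothetical names, and matching on an exact SHA-256 of the clip is an assumption that only recognizes byte-identical files.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsentRecord:
    """Hypothetical record of who agreed to have their voice cloned."""
    speaker_name: str
    granted_by: str  # who signed off, e.g. the speaker themselves


class ConsentRegistry:
    """Hypothetical registry mapping a clip's SHA-256 to a consent record."""

    def __init__(self):
        self._records = {}

    @staticmethod
    def fingerprint(clip_bytes: bytes) -> str:
        # Exact-content hash: a re-encoded or trimmed clip will not match.
        return hashlib.sha256(clip_bytes).hexdigest()

    def register(self, clip_bytes: bytes, record: ConsentRecord) -> None:
        self._records[self.fingerprint(clip_bytes)] = record

    def require(self, clip_bytes: bytes) -> ConsentRecord:
        """Return the consent record, or refuse if none is on file."""
        record = self._records.get(self.fingerprint(clip_bytes))
        if record is None:
            raise PermissionError("no consent on file for this reference clip")
        return record
```

An operator would call `require()` on the reference clip's bytes before passing the clip to the synthesis step, and let the `PermissionError` stop the request.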
Watch out for: the Coqui Public Model License (CPML) is non-commercial by default. Commercial use requires a separate license from Coqui, and with the company shut down it is unclear who can still grant one; the community forks maintain the code, not the license terms.
Install / use
pip install TTS
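Once the package is installed, a cloning call can be sketched roughly as below. This assumes the `TTS.api.TTS` class and its `tts_to_file` method from the Coqui library, and the model identifier `tts_models/multilingual/multi-dataset/xtts_v2`; the language codes are the 17 listed for XTTS v2 and should be checked against the model card for the version you install. `build_request` and `clone_and_speak` are hypothetical helper names, and `reference.wav` is a placeholder path.

```python
# Language codes listed for XTTS v2 (assumption: verify against the model card).
XTTS_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
}

# Assumed model identifier for XTTS v2 in the Coqui model registry.
MODEL_NAME = "tts_models/multilingual/multi-dataset/xtts_v2"


def build_request(text, speaker_wav, language, out_path="output.wav"):
    """Validate inputs and return keyword arguments for tts_to_file."""
    if language not in XTTS_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "text": text,
        "speaker_wav": speaker_wav,  # path to the short reference clip
        "language": language,
        "file_path": out_path,
    }


def clone_and_speak(text, speaker_wav, language, out_path="output.wav"):
    """Synthesize `text` in the voice of the reference clip."""
    # Imported lazily so the helpers above work without the package installed;
    # the first call also downloads the model weights.
    from TTS.api import TTS

    request = build_request(text, speaker_wav, language, out_path)
    tts = TTS(MODEL_NAME)
    return tts.tts_to_file(**request)


if __name__ == "__main__":
    clone_and_speak("Hello from a cloned voice.", "reference.wav", "en")
```

Validating the language code up front keeps a typo from surfacing only after the model has been loaded.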
Features
| Feature | Coqui XTTS |
|---|---|
| Speaker diarization | No |
| Word-level timestamps | No |
| Streaming / real-time | No |
| Languages supported | 17 |
| HIPAA eligible | No |
Coqui XTTS vs Whipscribe
| Feature | Coqui XTTS | Whipscribe |
|---|---|---|
| Category | Open source | Transcription APIs |
| Pricing | free (CPML license, non-commercial without separate license) | free beta |
| Speaker diarization | — | Yes |
| Word timestamps | — | Yes |
| Streaming | — | No |
| Languages | 17 | 99 |
| Platforms | macOS, Windows, Linux | Web, API, MCP |
Alternatives to Coqui XTTS
Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or drop a file in the form below — you'll have a transcript in seconds.