Coqui XTTS

by Coqui

Open-source multilingual TTS with zero-shot voice cloning.

TL;DR

Best for researchers and self-hosters wanting zero-shot multilingual voice cloning under an inspectable license. Pricing: free (CPML license, non-commercial without separate license).

Category: Open source
Pricing: free (CPML license, non-commercial without separate license)
Platforms: macOS, Windows, Linux

What it is

XTTS is Coqui's flagship cross-lingual zero-shot TTS model: a 6-second reference clip is enough to clone a voice and synthesize speech across 17 languages. The library remains widely used even though Coqui Inc. has shut down; community forks (idiap/coqui-ai-TTS) keep maintenance going. Consent posture: the open weights let any user clone any voice, so consent enforcement is the operator's responsibility. The vendor surface is not a gating layer.

Best for: Researchers and self-hosters wanting zero-shot multilingual voice cloning under an inspectable license.
Watch out for: The Coqui Public Model License (CPML) is non-commercial by default. Commercial use requires a separate license from Coqui, which is hard to obtain now that the company has shut down; community forks maintain the code but do not change the model license.
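
Since consent enforcement falls to the operator, one option is a small gate that runs before any cloning request reaches the model. The sketch below is illustrative only: `ConsentRecord`, `ConsentError`, `require_consent`, and the in-memory registry are hypothetical names, not part of the Coqui library, and a real deployment would back the registry with audited storage.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass(frozen=True)
class ConsentRecord:
    """Hypothetical record that a speaker agreed to have their voice cloned."""
    speaker_id: str
    granted_by: str
    expires_at: Optional[datetime] = None  # None means the consent has no expiry


class ConsentError(PermissionError):
    """Raised when a cloning request has no valid consent on file."""


def require_consent(
    speaker_id: str,
    registry: Dict[str, ConsentRecord],
    now: Optional[datetime] = None,
) -> ConsentRecord:
    """Return the consent record for speaker_id, or raise ConsentError."""
    now = now or datetime.now(timezone.utc)
    record = registry.get(speaker_id)
    if record is None:
        raise ConsentError(f"no consent on file for speaker {speaker_id!r}")
    if record.expires_at is not None and record.expires_at <= now:
        raise ConsentError(f"consent for speaker {speaker_id!r} has expired")
    return record
```

Calling `require_consent(speaker_id, registry)` before synthesis keeps the check in one place, so a request without a valid record fails before the reference clip is ever loaded.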

Install / use

pip install TTS
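
Once the package is installed, a minimal cloning call might look like the sketch below. It assumes the `TTS.api.TTS` wrapper and the `tts_models/multilingual/multi-dataset/xtts_v2` model name used by the Coqui library, plus PyTorch for device selection; the language codes are the 17 listed on the XTTS v2 model card and may change between releases. The helper names `check_request` and `clone_to_file` and all file paths are placeholders. The community fork is published on PyPI as `coqui-tts` and keeps the same `TTS` import name.

```python
from pathlib import Path

# Language codes from the XTTS v2 model card (assumed current; 17 entries).
XTTS_LANGUAGES = frozenset({
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
})

MODEL_NAME = "tts_models/multilingual/multi-dataset/xtts_v2"


def check_request(text: str, speaker_wav: str, language: str) -> None:
    """Fail fast on bad input before the large model is loaded."""
    if not text.strip():
        raise ValueError("text is empty")
    if language not in XTTS_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    if not Path(speaker_wav).is_file():
        raise FileNotFoundError(speaker_wav)


def clone_to_file(text: str, speaker_wav: str, language: str,
                  out_path: str = "output.wav") -> str:
    """Synthesize text in the voice of speaker_wav and write a WAV file."""
    check_request(text, speaker_wav, language)

    # Imported lazily so the checks above work without the package installed.
    import torch
    from TTS.api import TTS

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tts = TTS(MODEL_NAME).to(device)  # downloads the weights on first use
    tts.tts_to_file(
        text=text,
        speaker_wav=speaker_wav,  # short reference clip of the target voice
        language=language,
        file_path=out_path,
    )
    return out_path
```

For example, `clone_to_file("Hello from a cloned voice.", "reference.wav", "en")` would write `output.wav`, given a reference clip at that path. Loading the model inside the function keeps the sketch short; a long-running service would load it once and reuse it.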

Features

Speaker diarization: No
Word-level timestamps: No
Streaming / real-time: No
Languages supported: 17
HIPAA eligible: No

Coqui XTTS vs Whipscribe

Feature | Coqui XTTS | Whipscribe
Category | Open source | Transcription APIs
Pricing | free (CPML license, non-commercial without separate license) | free beta
Speaker diarization | No | Yes
Word timestamps | No | Yes
Streaming | No | No
Languages | 17 | 99
Platforms | macOS, Windows, Linux | Web, API, MCP

Alternatives to Coqui XTTS

Whipscribe is a managed faster-whisper + whisperX service. It covers speech-to-text rather than speech synthesis, so it is an option if you want transcripts without running infrastructure: paste a URL or drop a file and you'll have a transcript in seconds.