Coqui XTTS-v2

by Coqui (community fork)

Open-weights multilingual voice cloning from 6 seconds of audio — 17 languages.

TL;DR

Best for developers who want open-weight voice cloning + multilingual TTS they can host on their own GPU. Pricing: free.

Category: Open source
License:
Stars:
Last push:
Pricing: free
Platforms: Linux, macOS, Windows

What it is

XTTS-v2 was the standout multilingual voice-cloning model from Coqui before the company wound down. It lives on as the most-cited open-weight option for cloning a voice from about 6 seconds of audio and for multilingual TTS. It is listed separately from the broader coqui-tts entry already in Lanes C and D.

Best for: Developers who want open-weight voice cloning + multilingual TTS they can host on their own GPU.
Watch out for: Coqui as a company shut down in early 2024, so the codebase now lives on in community forks. Voice cloning carries the same consent requirement as with any cloning model. The 17 languages are the published list; quality varies across them.

Install / use

pip install TTS
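
Once the package is installed, cloning a voice from a short reference clip might look like the sketch below. It assumes the Python API exposed by the `TTS` package (`TTS.api.TTS` and its `tts_to_file` method) and the registry name `tts_models/multilingual/multi-dataset/xtts_v2`. The `clone_voice` helper and the `SUPPORTED_LANGUAGES` set are hypothetical names introduced here for illustration, and the language codes are recalled from the model card, so check them against the version you install.

```python
from pathlib import Path

# Model identifier used by the Coqui TTS model registry for XTTS-v2.
MODEL_NAME = "tts_models/multilingual/multi-dataset/xtts_v2"

# Language codes recalled from the XTTS-v2 model card (assumption: verify
# against the version you install before relying on this list).
SUPPORTED_LANGUAGES = {
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
}


def clone_voice(text, speaker_wav, language="en", out_path="output.wav", device="cpu"):
    """Synthesize `text` in the voice of `speaker_wav` and return the output path.

    `clone_voice` is a hypothetical helper, not part of the TTS package.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")

    # Imported lazily so input validation works even without the package installed.
    from TTS.api import TTS

    # The first call downloads the model weights; pass device="cuda" to use a GPU.
    tts = TTS(MODEL_NAME).to(device)
    tts.tts_to_file(
        text=text,
        speaker_wav=str(speaker_wav),
        language=language,
        file_path=str(out_path),
    )
    return Path(out_path)
```

For example, `clone_voice("Hello there.", "reference.wav", language="en", device="cuda")` would write `output.wav`, given a short recording of a consenting speaker at `reference.wav` (a placeholder file name).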

Features

Speaker diarization: No
Word-level timestamps: No
Streaming / real-time: Yes
Languages supported: 17
HIPAA eligible: No

Coqui XTTS-v2 vs Whipscribe

Feature | Coqui XTTS-v2 | Whipscribe
Category | Open source | Transcription APIs
Pricing | free | free beta
Speaker diarization | No | Yes
Word timestamps | No | Yes
Streaming | Yes | No
Languages | 17 | 99
Platforms | Linux, macOS, Windows | Web, API, MCP

Alternatives to Coqui XTTS-v2

Whipscribe is a managed transcription service built on faster-whisper and whisperX. If you want transcripts without running infrastructure, paste a URL or drop a file and you'll have a transcript in seconds.