Tortoise TTS
by neonbjb
Open-source neural TTS with strong prosody and voice cloning.
TL;DR
Best for researchers and self-hosters who care more about prosody quality than inference speed. Pricing: free (Apache-2.0).
Category
Open source
License
Apache-2.0
Pricing
free (Apache-2.0)
Platforms
Linux, macOS, Windows
What it is
Tortoise TTS is a single-author, Apache-2.0 text-to-speech model known for high-quality prosody and voice cloning. Generation is significantly slower than VITS or XTTS: expect minutes per sample on CPU. Consent posture: the weights are open, so the operator is responsible for enforcing consent when cloning a voice.
Best for: Researchers and self-hosters who care more about prosody quality than inference speed.
Watch out for: Slow inference even on GPU; English-only; cloning requires 3–5 reference clips.
Install / use
pip install tortoise-tts
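Once the package is installed, a cloning run might look like the minimal sketch below. It follows the usage pattern in the project's README (`TextToSpeech`, `tts_with_preset`, `load_audio`), so check it against your installed version. The helper `check_reference_clips`, the function `synthesize`, and the file names are hypothetical, introduced only for illustration. The 3 to 5 clip range is the guidance from this page, not a limit the library is assumed to enforce.

```python
from pathlib import Path

# Reference-clip range quoted on this page (assumption: guidance, not a library limit).
MIN_CLIPS, MAX_CLIPS = 3, 5


def check_reference_clips(paths):
    """Hypothetical helper: return the clips as Path objects if the count is in range."""
    clips = [Path(p) for p in paths]
    if not MIN_CLIPS <= len(clips) <= MAX_CLIPS:
        raise ValueError(
            f"expected {MIN_CLIPS} to {MAX_CLIPS} reference clips, got {len(clips)}"
        )
    return clips


def synthesize(text, clip_paths, out_path="generated.wav", preset="fast"):
    """Sketch of a cloning run; expect it to be slow, especially on CPU."""
    # Heavy imports are deferred so the helper above works without the model installed.
    import torchaudio
    from tortoise.api import TextToSpeech
    from tortoise.utils.audio import load_audio

    clips = check_reference_clips(clip_paths)
    # The README loads reference audio at 22050 Hz.
    reference = [load_audio(str(p), 22050) for p in clips]

    tts = TextToSpeech()  # fetches model weights on first use
    audio = tts.tts_with_preset(text, voice_samples=reference, preset=preset)

    # The project's own scripts save output at 24 kHz.
    torchaudio.save(out_path, audio.squeeze(0).cpu(), 24000)
    return out_path


# Example call (hypothetical file names; only clone voices you have consent to use):
# synthesize("Hello from Tortoise.", ["ref1.wav", "ref2.wav", "ref3.wav"])
```

The README also lists the `ultra_fast`, `standard`, and `high_quality` presets, which trade generation time against output quality.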
Features
| Feature | Tortoise TTS |
|---|---|
| Speaker diarization | No |
| Word-level timestamps | No |
| Streaming / real-time | No |
| Languages supported | 1 |
| HIPAA eligible | No |
Tortoise TTS vs Whipscribe
| Feature | Tortoise TTS | Whipscribe |
|---|---|---|
| Category | Open source | Transcription APIs |
| Pricing | free (Apache-2.0) | free beta |
| Speaker diarization | — | Yes |
| Word timestamps | — | Yes |
| Streaming | — | No |
| Languages | 1 | 99 |
| Platforms | Linux, macOS, Windows | Web, API, MCP |
Alternatives to Tortoise TTS
Whipscribe is a managed faster-whisper + whisperX transcription service. If you want transcripts without running infrastructure, paste a URL or drop a file in the form below and you'll have a transcript in seconds.