LJ Speech

by Keith Ito

24h single-speaker English audiobook corpus — the canonical TTS baseline.

TL;DR


Best for single-speaker neural TTS baselines (Tacotron, FastSpeech, VITS, etc.). Pricing: free.

Category: Open source
License: Public domain
Pricing: free
Platforms: Web, HuggingFace

What it is

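LJ Speech is about 24 hours of single-speaker English read speech (13,100 short clips) taken from public-domain audiobooks. It is one of the most widely used training sets and benchmarks in English TTS research. The dataset itself is in the public domain.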
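To sanity-check the "about 24 hours" figure on a local copy, you can sum the clip durations with Python's standard-library `wave` module. This is a minimal sketch that assumes the clips are extracted as individual `.wav` files in one directory; the directory path you pass in is up to you.

```python
import wave
from pathlib import Path


def total_hours(wav_dir: str) -> float:
    """Sum the duration of every .wav file directly under wav_dir, in hours."""
    total_seconds = 0.0
    for path in sorted(Path(wav_dir).glob("*.wav")):
        # The WAV header stores the frame count and sample rate,
        # so duration in seconds is nframes / framerate.
        with wave.open(str(path), "rb") as wf:
            total_seconds += wf.getnframes() / wf.getframerate()
    return total_seconds / 3600.0
```

For example, `total_hours("LJSpeech-1.1/wavs")` (a hypothetical extraction path) should come out close to 24. Only the headers are read, so this is fast even for thousands of clips.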

Best for: Single-speaker neural TTS baselines (Tacotron, FastSpeech, VITS, etc.).
Watch out for: 24h only · single female speaker, so no multi-speaker or multilingual coverage · public-domain LibriVox recordings. Cite: Ito & Johnson, 2017.
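When training a single-speaker baseline you typically hold out a few clips for validation. One reproducible way to do this, offered as a hypothetical sketch rather than a standard LJ Speech split, is to hash each clip ID (IDs look like `LJ001-0001`) into 100 buckets and send the lowest buckets to validation. The function names and the 5% default are illustrative assumptions.

```python
import hashlib


def is_validation(clip_id: str, val_percent: int = 5) -> bool:
    """Return True if clip_id hashes into the first val_percent of 100 buckets."""
    # md5 is used instead of Python's built-in hash(), which is randomized per process.
    digest = hashlib.md5(clip_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in [0, 100)
    return bucket < val_percent


def split_ids(clip_ids, val_percent: int = 5):
    """Partition clip IDs into (train, val) lists, preserving input order.

    Membership depends only on each ID, so the split is identical across
    runs, machines, and input orderings.
    """
    train, val = [], []
    for cid in clip_ids:
        (val if is_validation(cid, val_percent) else train).append(cid)
    return train, val
```

Hashing the ID, rather than shuffling with a seed, keeps a clip in the same split even if clips are added, removed, or reordered.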

Install / use

from datasets import load_dataset
ds = load_dataset('keithito/lj_speech')  # LJ Speech on the Hugging Face Hub
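If you work from the extracted LJSpeech-1.1 archive instead of the Hugging Face loader, transcripts sit in a pipe-separated `metadata.csv` with one clip per line: clip ID, raw transcript, normalized transcript. The audio for each ID is at `wavs/<clip_id>.wav`. A minimal reader, assuming that layout; the `Clip` and `read_metadata` names are hypothetical:

```python
import csv
from typing import Iterator, NamedTuple


class Clip(NamedTuple):
    clip_id: str     # e.g. "LJ001-0001"
    text: str        # transcript as written in the source text
    normalized: str  # transcript with numbers, abbreviations, etc. spelled out


def read_metadata(path: str) -> Iterator[Clip]:
    """Yield one Clip per line of an LJ Speech-style metadata.csv."""
    with open(path, encoding="utf-8", newline="") as f:
        # Fields are pipe-separated and unquoted; transcripts can contain
        # literal '"' characters, so CSV quote handling must be disabled.
        reader = csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE)
        for row in reader:
            clip_id, text = row[0], row[1]
            # Defensive: fall back to the raw text if the normalized column is missing.
            normalized = row[2] if len(row) > 2 else text
            yield Clip(clip_id, text, normalized)
```

`QUOTE_NONE` matters here: with the default quoting, a transcript that begins with a double quote would be parsed as a quoted field and could swallow the following delimiters.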

Features

Speaker diarization: No
Word-level timestamps: No
Streaming / real-time: No
Languages supported: 1 (English)
HIPAA eligible: No

LJ Speech vs Whipscribe

Feature | LJ Speech | Whipscribe
Category | Open source | Transcription APIs
Pricing | free | free beta
Speaker diarization | No | Yes
Word timestamps | No | Yes
Streaming | No | No
Languages | 1 | 99
Platforms | Web, HuggingFace | Web, API, MCP

Alternatives to LJ Speech

Whipscribe is a managed faster-whisper + whisperX transcription service. If you want transcripts without running your own infrastructure, you can submit a URL or upload a file and get a transcript back in seconds.