VCTK

by Edinburgh CSTR

44h multi-speaker English corpus — 109 speakers across global accents for TTS.

TL;DR

Best for multi-speaker TTS, voice cloning, accent modeling. Pricing: free.

Category: Open source
License: ODC-By 1.0
Pricing: free
Platforms: DataShare, HuggingFace

What it is

VCTK is 44 hours of English read speech from 109 speakers spanning a wide range of English accents. It is a widely used corpus for multi-speaker TTS and voice cloning. License: ODC-By 1.0.
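Multi-speaker TTS models commonly condition on a learned speaker embedding, which requires mapping each speaker ID to a fixed row index. A minimal sketch of that bookkeeping, assuming speaker IDs arrive as strings such as "p225"; the sample labels are hypothetical and the helper names are illustrative, not part of any VCTK tooling:

```python
# Sketch: give each speaker a stable integer index for a speaker-embedding table.
# Assumption: speaker IDs are strings such as "p225"; the demo labels are hypothetical.
from typing import Dict, Iterable, List


def build_speaker_index(speaker_ids: Iterable[str]) -> Dict[str, int]:
    """Map each distinct speaker ID to 0..n-1, sorted so the mapping is reproducible."""
    return {spk: i for i, spk in enumerate(sorted(set(speaker_ids)))}


def encode_speakers(speaker_ids: List[str], index: Dict[str, int]) -> List[int]:
    """Translate per-utterance speaker IDs into embedding-table row numbers."""
    return [index[spk] for spk in speaker_ids]


if __name__ == "__main__":
    # Hypothetical per-utterance speaker labels, not real corpus metadata.
    utterance_speakers = ["p226", "p225", "p226", "p227"]
    index = build_speaker_index(utterance_speakers)
    print(index)  # {'p225': 0, 'p226': 1, 'p227': 2}
    print(encode_speakers(utterance_speakers, index))  # [1, 0, 1, 2]
```

Sorting the IDs before numbering keeps the mapping identical across runs, so a saved embedding table stays aligned with the speakers it was trained on.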

Best for: Multi-speaker TTS, voice cloning, accent modeling.
Watch out for: the ODC-By 1.0 license requires attribution; the 109 native-English speakers cover UK, IE, US, CA, AU, NZ, and IN accents; all recordings are prompted reading rather than spontaneous speech. Cite: Yamagishi et al., 2019.
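For voice cloning, evaluation speakers should be unseen during training, and for accent modeling each accent should appear on both sides of the split. A minimal sketch of a speaker-held-out split stratified by accent, assuming each metadata record is a dict with "speaker_id" and "accent" keys; the 25% test fraction and seed are illustrative defaults, not corpus conventions, and the demo speakers are hypothetical:

```python
# Sketch: speaker-held-out split for voice-cloning evaluation, stratified by accent.
# Assumptions (not from the VCTK docs): each record is a dict with "speaker_id" and
# "accent" keys; test_fraction=0.25 and seed=0 are illustrative defaults.
import random
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def speaker_holdout_split(
    records: List[dict], test_fraction: float = 0.25, seed: int = 0
) -> Tuple[Set[str], Set[str]]:
    """Return (train_speakers, test_speakers); no speaker appears in both."""
    # One accent per speaker: the first label seen wins, so groups never overlap.
    accent_of: Dict[str, str] = {}
    for rec in records:
        accent_of.setdefault(rec["speaker_id"], rec["accent"])

    speakers_by_accent: Dict[str, List[str]] = defaultdict(list)
    for speaker, accent in accent_of.items():
        speakers_by_accent[accent].append(speaker)

    rng = random.Random(seed)
    train: Set[str] = set()
    test: Set[str] = set()
    for accent in sorted(speakers_by_accent):
        speakers = sorted(speakers_by_accent[accent])
        rng.shuffle(speakers)
        # Hold out roughly test_fraction of the group, but always leave at
        # least one speaker of each accent in training.
        n_test = min(len(speakers) - 1, round(len(speakers) * test_fraction))
        test.update(speakers[:n_test])
        train.update(speakers[n_test:])
    return train, test


if __name__ == "__main__":
    # Hypothetical metadata: 8 speakers, 4 per accent, 2 utterances each.
    demo = [
        {"speaker_id": f"spk{i}", "accent": "A" if i < 4 else "B"}
        for i in range(8)
        for _ in range(2)
    ]
    train, test = speaker_holdout_split(demo)
    print(len(train), len(test))  # 6 2
```

Splitting by whole speakers rather than by random utterances matters here: an utterance-level split would leak every test voice into training and overstate cloning quality.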

Install / use

from datasets import load_dataset
ds = load_dataset('CSTR-Edinburgh/vctk')
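Before training an accent-aware model it helps to see how utterances and speakers are distributed across accents. A minimal sketch that summarizes metadata rows, assuming each row is a dict with "speaker_id" and "accent" keys as the Hugging Face loader is expected to provide (check ds["train"].features before relying on these names); the sample rows are hypothetical:

```python
# Sketch: count utterances and distinct speakers per accent from metadata rows.
# Assumption: rows are dicts with "speaker_id" and "accent" keys; verify the real
# column names against the loaded dataset's features first.
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


def accent_summary(rows: Iterable[dict]) -> Dict[str, Tuple[int, int]]:
    """Return {accent: (utterance_count, speaker_count)}, sorted by accent name."""
    utterances: Counter = Counter()
    speakers: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        utterances[row["accent"]] += 1
        speakers[row["accent"]].add(row["speaker_id"])
    return {acc: (utterances[acc], len(speakers[acc])) for acc in sorted(utterances)}


if __name__ == "__main__":
    # Hypothetical rows; real speaker IDs and accent labels come from the corpus.
    sample = [
        {"speaker_id": "spk1", "accent": "English"},
        {"speaker_id": "spk1", "accent": "English"},
        {"speaker_id": "spk2", "accent": "Scottish"},
    ]
    print(accent_summary(sample))  # {'English': (2, 1), 'Scottish': (1, 1)}
```

Against the real corpus, something like accent_summary(ds["train"].remove_columns(["audio"])) should work if the loader exposes a single "train" split with these columns; dropping the audio column first avoids decoding every clip just to read metadata.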

Features

Speaker diarization: No
Word-level timestamps: No
Streaming / real-time: No
Languages supported: 1
HIPAA eligible: No

VCTK vs Whipscribe

Feature | VCTK | Whipscribe
Category | Open source | Transcription APIs
Pricing | free | free beta
Speaker diarization | No | Yes
Word timestamps | No | Yes
Streaming | No | No
Languages | 1 | 99
Platforms | DataShare, HuggingFace | Web, API, MCP

Alternatives to VCTK

Whipscribe is a managed faster-whisper + whisperX transcription service. If you want transcripts without running your own infrastructure, paste a URL or upload a file and you'll have a transcript in seconds.