JTubeSpeech

by Saruwatari Lab (U. Tokyo)

A Japanese speech corpus built from YouTube, aimed at scaling open ASR beyond ReazonSpeech.

TL;DR

Best for open Japanese ASR + TTS training from YouTube subtitle alignment. Pricing: research-only.

Category: Open source
License: MIT (pipeline); audio rights per video
Pricing: research-only
Platforms: GitHub

What it is

JTubeSpeech provides a pipeline and a CC-BY-filtered list of video URLs for building a Japanese ASR corpus from YouTube videos and their subtitles. Researchers download the audio themselves. License: MIT for the pipeline; audio rights are per video.
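Because only a URL list ships with the project, the first step on the researcher's side is turning that list into the set of videos to fetch. This minimal Python sketch illustrates that step under stated assumptions: the CSV layout (columns `videoid`, `auto`, `sub` holding 0/1 flags for automatic and manual subtitles) and the sample rows are hypothetical, so check the repository's data files for the real format. Downloading the audio, and complying with YouTube's Terms of Service, remain up to the researcher.

```python
import csv
import io

# Hypothetical excerpt of a JTubeSpeech-style URL list. The column names
# (videoid, auto, sub) and the 0/1 encoding are assumptions for illustration.
SAMPLE_CSV = """videoid,auto,sub
aaaaaaaaaaa,1,1
bbbbbbbbbbb,1,0
ccccccccccc,0,1
"""


def manual_subtitle_urls(csv_text: str) -> list[str]:
    """Return watch URLs for rows flagged as having manual subtitles."""
    reader = csv.DictReader(io.StringIO(csv_text))
    urls = []
    for row in reader:
        # Keep only videos whose (assumed) 'sub' flag is set, since subtitle
        # alignment needs a human-written reference transcript.
        if row["sub"].strip() == "1":
            urls.append(f"https://www.youtube.com/watch?v={row['videoid']}")
    return urls


if __name__ == "__main__":
    for url in manual_subtitle_urls(SAMPLE_CSV):
        print(url)
```

Filtering on manual rather than automatic subtitles is one reasonable choice here; an ASR corpus trained on auto-generated captions would partly learn YouTube's own recognizer errors.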

Best for: Open Japanese ASR + TTS training from YouTube subtitle alignment.
Watch out for: only the pipeline and URL list are provided · each video's audio stays under YouTube's Terms of Service · the URL list is CC-BY filtered. Cite: Takamichi et al., 2021.
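Subtitle alignment pairs each subtitle cue with an audio span, after which poorly matching cues are discarded. This Python sketch shows only that filtering side and does not perform alignment itself. The `Segment` structure, the per-segment `score` (assumed: higher means a closer match), the threshold, and the duration bounds are all hypothetical illustrations, not JTubeSpeech's actual code or settings. `parse_vtt_time` converts WebVTT-style cue timestamps to seconds.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One subtitle cue paired with an audio span (hypothetical structure)."""
    start: float  # seconds
    end: float    # seconds
    text: str
    score: float  # assumed alignment confidence: higher means a closer match


def parse_vtt_time(ts: str) -> float:
    """Convert a WebVTT-style timestamp ('HH:MM:SS.mmm' or 'MM:SS.mmm') to seconds."""
    parts = ts.split(":")
    seconds = float(parts[-1])
    minutes = int(parts[-2]) if len(parts) >= 2 else 0
    hours = int(parts[-3]) if len(parts) >= 3 else 0
    return hours * 3600 + minutes * 60 + seconds


def keep_reliable(segments: list[Segment], min_score: float,
                  min_dur: float = 0.5, max_dur: float = 20.0) -> list[Segment]:
    """Keep segments whose score clears the threshold and whose length is plausible."""
    kept = []
    for seg in segments:
        duration = seg.end - seg.start
        if seg.score >= min_score and min_dur <= duration <= max_dur:
            kept.append(seg)
    return kept


# Hypothetical aligned cues; the scores are made up for illustration.
SAMPLE_SEGMENTS = [
    Segment(parse_vtt_time("00:00:01.000"), parse_vtt_time("00:00:04.200"), "こんにちは", -0.1),
    Segment(parse_vtt_time("00:00:04.200"), parse_vtt_time("00:00:04.400"), "はい", -0.05),
    Segment(parse_vtt_time("00:00:05.000"), parse_vtt_time("00:00:09.000"), "(拍手)", -2.3),
]

if __name__ == "__main__":
    for seg in keep_reliable(SAMPLE_SEGMENTS, min_score=-0.5):
        print(f"{seg.start:.2f}-{seg.end:.2f}s {seg.text}")
```

Exposing the score threshold and duration bounds as parameters makes it easy to trade corpus size against label quality.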

Install / use

git clone https://github.com/sarulab-speech/jtubespeech

Features

Speaker diarization: No
Word-level timestamps: No
Streaming / real-time: No
Languages supported: 1
HIPAA eligible: No

JTubeSpeech vs Whipscribe

Feature | JTubeSpeech | Whipscribe
Category | Open source | Transcription APIs
Pricing | research-only | free beta
Speaker diarization | No | Yes
Word timestamps | No | Yes
Streaming | No | No
Languages | 1 | 99
Platforms | GitHub | Web, API, MCP

Alternatives to JTubeSpeech

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running your own infrastructure, you can submit a URL or upload a file instead.