Microsoft SpeechT5

by Microsoft

Unified speech-text Transformer (ASR + TTS + VC).

TL;DR

Best for researchers exploring unified speech-text pretraining. Pricing: free.

Category: Open source
License: MIT
Pricing: free
Platforms: Linux

What it is

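SpeechT5 unifies automatic speech recognition (ASR), text-to-speech (TTS), and voice conversion (VC) in one encoder-decoder Transformer. It is released under the MIT license.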

Best for: Researchers exploring unified speech-text pretraining.
Watch out for: English-only; small model zoo.

Install / use
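
This page gives no install steps, so the following is only a minimal sketch. It assumes the Hugging Face `transformers` port of SpeechT5 and its public checkpoints on the Hub, not the original research repository. The `TaskSpec`, `TASKS`, and `load` names are hypothetical helpers introduced for illustration. The sketch maps each of the three tasks listed above to a checkpoint and model class.

```python
# Assumed setup (not stated on this page):
#   pip install transformers torch sentencepiece
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    checkpoint: str   # Hugging Face Hub model id
    model_class: str  # class name inside the transformers package


# One fine-tuned checkpoint per task the page lists (ASR, TTS, VC).
TASKS = {
    "asr": TaskSpec("microsoft/speecht5_asr", "SpeechT5ForSpeechToText"),
    "tts": TaskSpec("microsoft/speecht5_tts", "SpeechT5ForTextToSpeech"),
    "vc": TaskSpec("microsoft/speecht5_vc", "SpeechT5ForSpeechToSpeech"),
}


def load(task: str):
    """Return (processor, model) for 'asr', 'tts', or 'vc'."""
    spec = TASKS[task]  # raises KeyError for an unknown task
    # Imported lazily so the table above is usable without transformers.
    import transformers

    processor = transformers.SpeechT5Processor.from_pretrained(spec.checkpoint)
    model_cls = getattr(transformers, spec.model_class)
    model = model_cls.from_pretrained(spec.checkpoint)
    return processor, model
```

Calling `load("asr")` downloads the checkpoint on first use. For TTS and VC, turning the model output into a waveform also needs a vocoder (for example `SpeechT5HifiGan` with `microsoft/speecht5_hifigan`) and a speaker embedding, which this sketch leaves out.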

Features

Speaker diarization: No
Word-level timestamps: Yes
Streaming / real-time: No
Languages supported: 1
HIPAA eligible: No

Microsoft SpeechT5 vs Whipscribe

Feature | Microsoft SpeechT5 | Whipscribe
Category | Open source | Transcription APIs
Pricing | free | free beta
Speaker diarization | No | Yes
Word timestamps | Yes | Yes
Streaming | No | No
Languages | 1 | 99
Platforms | Linux | Web, API, MCP

Alternatives to Microsoft SpeechT5

Whipscribe is a managed faster-whisper + whisperX service. It is an option if you want transcripts without running your own infrastructure.