Volcengine Speech
ByteDance's Volcano Engine speech-to-text — short, long, and streaming Mandarin ASR.
Best for short-form video, livestream, and TikTok-style content workflows that need Mandarin captions at scale. Pricing: tiered · pay-as-you-go in CNY.
What it is
Volcano Engine is ByteDance's enterprise cloud, and its speech product line carries the same engines that auto-caption Douyin and other ByteDance properties. Recognition latency is tuned for video and livestream use cases; the platform also exposes voice cloning and TTS alongside ASR. It is a pragmatic pick when serving Chinese creators and short-form video producers, and it fits best for short-form video, livestream, and TikTok-style content workflows that need Mandarin captions at scale. The caveat: coverage is primarily Mandarin and Cantonese, and non-Chinese language support is narrow. The API is developer-grade with self-serve signup; validate accuracy on representative audio for the target dialect before committing volume.
Watch out for: Primarily Mandarin and Cantonese; non-Chinese language coverage is narrow.
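One way to run the accuracy check suggested above is to score the service's transcripts against human-verified references using character error rate (CER), a common metric for Mandarin because the text has no word boundaries. The sketch below is a minimal, provider-agnostic illustration: it does not call any Volcengine API, and the function names `edit_distance`, `normalize`, and `cer` are hypothetical helpers, not part of any SDK.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalize(text: str) -> str:
    """Keep letters and digits only, so punctuation and spacing are not scored."""
    return "".join(ch for ch in text if ch.isalnum())


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty after normalization")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical reference/hypothesis pair; one wrong character out of six.
    print(cer("今天天气很好。", "今天天汽很好"))
```

Averaging `cer` over a sample of clips per dialect (for example Mandarin versus Cantonese) gives a like-for-like number to compare services on before committing volume.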
Install / use
volcengine.com/product/speech-tech
Features
| Feature | Volcengine Speech |
|---|---|
| Speaker diarization | No |
| Word-level timestamps | No |
| Streaming / real-time | Yes |
| Languages supported | Primarily Mandarin and Cantonese |
| HIPAA eligible | No |
Volcengine Speech vs Whipscribe
| Feature | Volcengine Speech | Whipscribe |
|---|---|---|
| Category | Transcription APIs | Transcription APIs |
| Pricing | tiered · pay-as-you-go in CNY | free beta |
| Speaker diarization | No | Yes |
| Word timestamps | No | Yes |
| Streaming | Yes | No |
| Languages | Primarily Mandarin and Cantonese | 99 |
| Platforms | Web, Android, iOS, Linux | Web, API, MCP |
Alternatives to Volcengine Speech
Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or drop a file in the form below — you'll have a transcript in seconds.