Volcengine Speech (ByteDance)

by Volcengine (ByteDance)

ByteDance's Volcengine speech-to-text platform powering Douyin/CapCut workflows.

TL;DR

Best for short-form video pipelines (Douyin/TikTok-style) needing fast Mandarin captioning + voice intel. Pricing: tiered per-second (RMB).

Category: Transcription APIs
Pricing: tiered per-second (RMB)
Platforms: API

What it is

Volcengine is ByteDance's enterprise cloud platform. Its speech suite covers real-time streaming ASR, batch long-audio recognition, voice cloning, and TTS. ByteDance uses these services internally for Douyin (the China-market counterpart of TikTok), CapCut captioning, and Lark's meeting transcription. Its Mandarin model is regarded as among the strongest in the Chinese market; English and several other languages are available on specific endpoints. International customers use the Volcengine International console, which has separate pricing. Feature flags from vendor docs: speaker diarization, word-level timestamps, streaming. Directory tags: commercial-api, regional-asia. Last vendor-page check: 2026-05-12.
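The streaming endpoint's wire protocol is not described here, but any real-time ASR upload starts by slicing raw audio into fixed-duration frames. A minimal sketch, assuming 16 kHz, 16-bit, mono PCM and 100 ms chunks (common choices for streaming ASR, not confirmed Volcengine requirements). The function name `pcm_chunks` is hypothetical, and how each chunk is framed and sent to the service is not shown.

```python
from typing import Iterator


def pcm_chunks(
    pcm: bytes,
    sample_rate: int = 16_000,  # assumed, not a documented Volcengine requirement
    sample_width: int = 2,      # bytes per sample (16-bit)
    channels: int = 1,
    chunk_ms: int = 100,        # assumed chunk duration
) -> Iterator[bytes]:
    """Yield fixed-duration slices of raw PCM audio, aligned to whole sample frames."""
    # Frames (one sample per channel) that fit in chunk_ms, then convert to bytes.
    frames_per_chunk = sample_rate * chunk_ms // 1000
    bytes_per_chunk = frames_per_chunk * sample_width * channels
    if bytes_per_chunk <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    for start in range(0, len(pcm), bytes_per_chunk):
        # The last chunk may be shorter if the audio length is not a multiple.
        yield pcm[start:start + bytes_per_chunk]


# One second of 16 kHz, 16-bit mono silence is 32,000 bytes.
chunks = list(pcm_chunks(b"\x00" * 32_000))
print(len(chunks), len(chunks[0]))  # 10 3200
```

Aligning on frames rather than raw bytes keeps every chunk boundary between samples, so no 16-bit sample is split across two uploads.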

Best for: Short-form video pipelines (Douyin/TikTok-style) needing fast Mandarin captioning + voice intel.
Watch out for: China-first; international tenants must use the Volcengine International console; some endpoints are gated to enterprise contracts.

Install / use

Volcengine SDK: AsrClient.submit_task(...)
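The `AsrClient.submit_task(...)` entry suggests an asynchronous batch flow: submit a job, then poll for its result. The sketch below assumes that pattern. `BatchAsrClient`, `query_task`, the `audio_url` argument, and the `"done"`/`"failed"` status strings are hypothetical stand-ins, not the Volcengine SDK's actual signatures; check the SDK reference before adapting it. `FakeClient` is an in-memory stub used only to exercise the loop.

```python
import time
from typing import Any, Callable, Dict, Protocol


class BatchAsrClient(Protocol):
    """Hypothetical interface, NOT the real Volcengine SDK signatures."""

    def submit_task(self, audio_url: str) -> str: ...

    def query_task(self, task_id: str) -> Dict[str, Any]: ...


def transcribe_batch(
    client: BatchAsrClient,
    audio_url: str,
    poll_interval: float = 1.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Submit a long-audio job, then poll until it finishes, fails, or times out."""
    task_id = client.submit_task(audio_url)
    deadline = time.monotonic() + timeout
    while True:
        result = client.query_task(task_id)
        # Assumed statuses: "done", "failed"; anything else means still running.
        status = result.get("status")
        if status == "done":
            return result
        if status == "failed":
            raise RuntimeError(f"task {task_id} failed: {result}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status!r} after {timeout}s")
        sleep(poll_interval)


class FakeClient:
    """In-memory stand-in that reports "running" twice, then "done"."""

    def __init__(self) -> None:
        self.polls = 0

    def submit_task(self, audio_url: str) -> str:
        return "task-1"

    def query_task(self, task_id: str) -> Dict[str, Any]:
        self.polls += 1
        if self.polls < 3:
            return {"status": "running"}
        return {"status": "done", "text": "hello"}


result = transcribe_batch(FakeClient(), "https://example.com/a.wav", sleep=lambda s: None)
print(result)  # {'status': 'done', 'text': 'hello'}
```

Injecting `sleep` keeps the polling loop testable without real delays; the deadline uses `time.monotonic()` so wall-clock adjustments cannot shorten or extend it.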

Features

Speaker diarization: Yes
Word-level timestamps: Yes
Streaming / real-time: Yes
Languages supported: Not listed (Mandarin, English, and several others on specific endpoints)
HIPAA eligible: No
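Word-level timestamps plus diarization are what a captioning pipeline needs to cut subtitles. A minimal sketch, assuming each recognized word arrives as a dict with hypothetical `text`, `start`, `end` (seconds) and `speaker` keys; this is not Volcengine's documented response schema. It merges consecutive words into cues that break on a speaker change or when a cue would exceed `max_chars` (an illustrative default), then renders SubRip (SRT). `joiner=""` suits Mandarin, which is written without spaces; pass `" "` for English.

```python
from typing import Dict, List


def to_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def words_to_cues(words: List[Dict], max_chars: int = 16, joiner: str = "") -> List[Dict]:
    """Merge consecutive words into cues; start a new cue on speaker change or length overflow."""
    cues: List[Dict] = []
    for w in words:
        cur = cues[-1] if cues else None
        if cur and cur["speaker"] == w["speaker"]:
            candidate = cur["text"] + joiner + w["text"]
            if len(candidate) <= max_chars:
                # Same speaker and still short enough: extend the current cue.
                cur["text"] = candidate
                cur["end"] = w["end"]
                continue
        cues.append({"speaker": w["speaker"], "start": w["start"],
                     "end": w["end"], "text": w["text"]})
    return cues


def to_srt(cues: List[Dict]) -> str:
    """Render cues as numbered SRT blocks, prefixing each text line with its speaker label."""
    blocks = [
        f"{i}\n{to_srt_time(c['start'])} --> {to_srt_time(c['end'])}\n[{c['speaker']}] {c['text']}\n"
        for i, c in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)


# Hypothetical word-level output for a two-speaker Mandarin clip.
example_words = [
    {"text": "大家", "start": 0.0, "end": 0.4, "speaker": "S1"},
    {"text": "好", "start": 0.4, "end": 0.6, "speaker": "S1"},
    {"text": "你好", "start": 1.0, "end": 1.5, "speaker": "S2"},
]
print(to_srt(words_to_cues(example_words)))
```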

Volcengine Speech (ByteDance) vs Whipscribe

Feature | Volcengine Speech (ByteDance) | Whipscribe
Category | Transcription APIs | Transcription APIs
Pricing | tiered per-second (RMB) | free beta
Speaker diarization | Yes | Yes
Word timestamps | Yes | Yes
Streaming | Yes | No
Languages | Not listed | 99
Platforms | API | Web, API, MCP

Alternatives to Volcengine Speech (ByteDance)

Whipscribe is a managed faster-whisper + whisperX service for getting transcripts without running your own infrastructure; it accepts either a media URL or an uploaded file.