Alibaba Cloud Intelligent Speech Interaction

by Alibaba Cloud

Alibaba's managed Chinese-first ASR with batch + real-time and customizable hotwords.

TL;DR

Best for China-region products that need first-class Mandarin and major Chinese-dialect support. Pricing: tiered, RMB per second.

Category: Transcription APIs
License: not listed
Stars: not listed
Last push: not listed
Pricing: tiered RMB-per-second pricing
Platforms: API

What it is

Alibaba Cloud Intelligent Speech Interaction (ISI) is Alibaba's family of speech services covering real-time streaming ASR, batch file recognition, real-time captioning, hotword customization, and a separate model-customization workflow. The service is the strongest mainstream pick for Mandarin and major Chinese dialects (Cantonese, Shanghainese, Sichuanese), and is most commonly used inside Alibaba Cloud's China regions. International account access exists, but feature parity is region-dependent. Pricing is RMB-denominated with tiered per-second rates.

Feature flags from vendor docs: word-level timestamps, streaming. Directory tags: commercial-api, hyperscaler, regional-asia. Last vendor-page check: 2026-05-12.

Best for: China-region products that need first-class Mandarin and major Chinese-dialect support.
Watch out for: Console + docs are Chinese-first; international account access varies by region; export-control considerations.
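
To make the tiered per-second pricing concrete, here is a minimal sketch of how a monthly bill could be estimated. It assumes marginal tiers (each second is billed at the rate of the tier it falls into), which may not match how the vendor actually applies its tiers. The tier boundaries and rates in `EXAMPLE_TIERS` are hypothetical placeholders, not Alibaba Cloud's published prices, and `estimate_cost_rmb` is an illustrative helper name.

```python
from decimal import Decimal

# HYPOTHETICAL tiers for illustration only: (upper bound in seconds, RMB per second).
# These numbers are placeholders, NOT Alibaba Cloud's actual rates.
EXAMPLE_TIERS = [
    (100_000, Decimal("0.0010")),
    (1_000_000, Decimal("0.0008")),
    (None, Decimal("0.0006")),  # None means "no upper bound"
]


def estimate_cost_rmb(seconds, tiers=EXAMPLE_TIERS):
    """Estimate cost in RMB, billing each second at the rate of its tier."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    cost = Decimal("0")
    lower = 0  # first second not yet billed
    for upper, rate in tiers:
        if upper is None or seconds <= upper:
            # Remaining usage ends inside this tier.
            return cost + (seconds - lower) * rate
        # Usage fills this tier completely; bill it and move on.
        cost += (upper - lower) * rate
        lower = upper
    raise ValueError("tiers must end with an unbounded (None) tier")
```

Swapping in the real boundaries and rates from the vendor's pricing page is the only change needed to use the estimate; `Decimal` is used so that small per-second rates do not accumulate floating-point error.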

Install / use

Alibaba SDK: create an NlsClient, configured with your project appkey and your Alibaba Cloud AccessKey.
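
Because the SDK needs both an appkey and an AccessKey, it helps to keep them out of source code. The sketch below loads them from environment variables and fails early if any are missing. It does not call the Alibaba SDK itself: `IsiCredentials` is a hypothetical helper, and the environment variable names are assumptions rather than names the SDK requires. An Alibaba Cloud AccessKey is a pair (ID and secret), so both are read.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class IsiCredentials:
    """Hypothetical credential holder; not part of the Alibaba SDK."""

    appkey: str
    access_key_id: str
    access_key_secret: str

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "IsiCredentials":
        """Read credentials from the environment (or a mapping passed in for tests)."""
        env = os.environ if env is None else env
        # Assumed variable names; rename to match your deployment.
        names = {
            "appkey": "ISI_APPKEY",
            "access_key_id": "ALIBABA_CLOUD_ACCESS_KEY_ID",
            "access_key_secret": "ALIBABA_CLOUD_ACCESS_KEY_SECRET",
        }
        missing = [var for var in names.values() if not env.get(var)]
        if missing:
            raise RuntimeError("missing environment variables: " + ", ".join(missing))
        return cls(**{field: env[var] for field, var in names.items()})
```

The resulting values would then be passed to whatever client constructor or token call your SDK version documents; check the vendor's SDK reference for the exact parameters.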

Features

Speaker diarization: No
Word-level timestamps: Yes
Streaming / real-time: Yes
Languages supported: not listed
HIPAA eligible: No

Alibaba Cloud Intelligent Speech Interaction vs Whipscribe

Feature | Alibaba Cloud Intelligent Speech Interaction | Whipscribe
Category | Transcription APIs | Transcription APIs
Pricing | tiered RMB-per-second pricing | free beta
Speaker diarization | No | Yes
Word timestamps | Yes | Yes
Streaming | Yes | No
Languages | not listed | 99
Platforms | API | Web, API, MCP

Alternatives to Alibaba Cloud Intelligent Speech Interaction

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or upload a file and you'll have a transcript in seconds.