Transcription APIs

Hosted transcription endpoints you call with an API key — no infrastructure to manage.

111 tools · updated 2026-05-15
OpenAI Whisper API
OpenAI

Hosted Whisper from OpenAI (the whisper-1 model, based on Whisper large-v2), at $0.006 per minute.

$0.006/min
AssemblyAI
AssemblyAI

Universal-2 model + diarization, PII redaction, topic detection, summarization.

from $0.37/hr
Deepgram
Deepgram

Nova-2 model, excellent streaming, strong at conversational audio.

from $0.0043/min
Rev.ai
Rev.ai

The API spin-off of Rev — strong English accuracy, topic detection, custom vocab.

from $0.02/min
Gladia
Gladia

Whisper-based API with diarization, 99-language coverage, pay-per-minute.

from $0.0102/min
Speechmatics
Speechmatics

Enterprise ASR with strong accent handling and on-prem deployment options.

contact sales
Whipscribe
Neugence

Hosted faster-whisper + whisperX with paste-a-URL, batch, and MCP access.

This is us
Amazon Transcribe
Amazon Web Services

AWS managed speech-to-text with batch + streaming, custom vocabulary, and medical/call-analytics variants.

from $0.0240/min (standard batch, tiered)
Amazon Transcribe Medical
Amazon Web Services

HIPAA-eligible medical-specialty ASR from AWS for clinical conversations and dictation.

from $0.075/min
Azure AI Speech (Speech-to-Text)
Microsoft Azure

Microsoft Azure's managed STT with batch, real-time, custom speech, and conversation transcription.

from $1/hr (standard) and $0.30/hr (batch transcription)
Google Cloud Speech-to-Text
Google Cloud

GCP Speech v2 with Chirp 2 foundation model, batch + streaming, 125+ language variants.

from $0.016/min (v2 standard) / Chirp tiered
Google Chirp / Chirp 2
Google Cloud

Google's universal speech foundation model exposed via Speech-to-Text v2.

per Speech-to-Text v2 pricing (region-tiered)
IBM Watson Speech to Text
IBM

IBM Cloud's managed ASR with on-prem option, custom acoustic + language models.

Lite (free, capped) + Plus tier (~$0.01/min, tiered)
Oracle Cloud AI Speech
Oracle Cloud Infrastructure

OCI managed speech-to-text with batch + real-time and Whisper-based models.

tiered per-minute (see OCI pricing page)
Alibaba Cloud Intelligent Speech Interaction
Alibaba Cloud

Alibaba's managed Chinese-first ASR with batch + real-time and customizable hotwords.

tiered RMB-per-second pricing
Tencent Cloud ASR
Tencent Cloud

Tencent's managed Chinese-first ASR with one-sentence, real-time, and recording-file modes.

tiered RMB-per-second pricing
Baidu Speech
Baidu AI Cloud

Baidu AI Cloud's Chinese-first speech recognition family.

free tier + tiered RMB-per-call
Yandex SpeechKit
Yandex Cloud

Yandex Cloud's managed Russian-first STT + TTS with batch and streaming.

tiered per-second (RUB-denominated)
Sber SaluteSpeech
Sber (Salute)

Sber's Russian-language speech recognition + synthesis platform.

tiered RUB-per-second (see SmartMarket pricing)
Huawei Cloud Speech Interaction Service
Huawei Cloud

Huawei Cloud's managed ASR + TTS with one-sentence, real-time, and long-audio modes.

tiered per-call (China + international regions)
iFlyTek Open Platform Speech
iFlyTek

iFlyTek's market-leading Mandarin ASR family for enterprise and education.

tiered per-day call quotas (RMB)
Volcengine Speech (ByteDance)
Volcengine (ByteDance)

ByteDance's Volcengine speech-to-text platform powering Douyin/CapCut workflows.

tiered per-second (RMB)
Naver Clova Speech
Naver Cloud Platform

Naver Cloud's Korean-first ASR with batch + real-time and speaker diarization.

tiered KRW-per-second
Kakao Speech (Kakao i)
Kakao Enterprise

Kakao Enterprise's Korean speech recognition + synthesis platform.

contact sales
NTT Communications COTOHA Voice
NTT Communications

NTT Com's Japanese-first STT under the COTOHA AI platform.

tiered JPY-denominated
AISpeech (Sipeed/iflyOS-class)
AISpeech

Chinese embedded ASR specialist for IoT devices and on-device speech.

contact sales
Soniox
Soniox

Real-time multilingual ASR API with low-latency streaming and code-switching support.

per-minute (see Soniox pricing page)
ElevenLabs Scribe
ElevenLabs

ElevenLabs' speech-to-text API, the counterpart to its TTS: multilingual, with word-level timestamps.

from $0.40/hr (see ElevenLabs pricing page)
Sieve
Sieve

Video-AI workflow platform with Whisper-based transcription endpoints.

per-second compute (see Sieve pricing)
Replicate (Whisper hosts)
Replicate

Replicate's catalog of community-hosted Whisper variants behind one API.

per-second GPU compute
Modal (ASR endpoints)
Modal Labs

Modal's serverless GPU platform commonly used to host Whisper / faster-whisper as an API.

per-second GPU compute (see Modal pricing)
RunPod (Whisper endpoints)
RunPod

RunPod's GPU cloud commonly used to deploy Whisper / faster-whisper as a serverless endpoint.

per-second GPU compute
fal.ai (Whisper / wizper endpoints)
fal.ai

fal.ai's hosted Whisper-family endpoints — low-latency, pay-per-second.

per-second compute (see fal.ai pricing)
Groq (Whisper endpoints)
Groq

Groq's LPU-based Whisper-large-v3 endpoint — exceptionally low-latency transcription.

from $0.111/hr (whisper-large-v3, batch)
OpenAI Realtime API (STT)
OpenAI

Streaming speech input through OpenAI's Realtime API (whisper-1 / gpt-4o-transcribe family).

per-minute audio in (model-dependent)
Vatis Tech
Vatis Tech

Romanian-headquartered transcription API with strong CEE language coverage.

Free tier + paid hours (see Vatis Tech pricing)
Wit.ai (Meta)
Meta

Meta's free natural-language and speech understanding platform.

free
Vonage Voice API (ASR Connector)
Vonage

Vonage's CPaaS speech-to-text via the ASR connector (typically Deepgram-powered).

Vonage Voice price + ASR per-minute
Plivo Voice (Speech Recognition)
Plivo

Plivo's CPaaS speech recognition for IVR + call-recording workflows.

Plivo Voice + per-minute ASR
Bandwidth Voice (Transcription)
Bandwidth

Bandwidth's voice CPaaS with optional transcription on recordings and IVR.

per-minute (see Bandwidth pricing)
Play.HT (STT endpoints)
Play.HT

Play.HT's transcription endpoint as a counterpart to its TTS family.

per-minute (see Play.HT pricing)
Lemonfox.ai
Lemonfox.ai

Hosted Whisper API at low per-hour pricing for developers.

from $0.17/hr (Whisper)
Speakbot (Whisper API alt)
Speakbot

Hosted Whisper API with file-based and URL ingestion.

per-minute (see Speakbot pricing)
Voicegain
Voicegain

Deep-learning ASR you can deploy in your own cloud or use as managed SaaS.

per-minute SaaS + Edge license
Amazon Transcribe Streaming
Amazon Web Services

Real-time streaming variant of Amazon Transcribe over HTTP/2 + WebSocket.

from $0.024/min
Azure Fast Transcription
Microsoft Azure

Azure Speech's batch-fast mode for short-turnaround transcription with predictable latency.

per-Azure-Speech pricing (Fast variant)
Google Cloud Speaker Diarization
Google Cloud

Diarization layer for Google Cloud Speech-to-Text v2.

per-Speech v2 pricing
Deepgram Nova-3
Deepgram

Deepgram's current-generation streaming + batch ASR model.

from $0.0043/min (Nova-3, batch)
AssemblyAI Realtime / Streaming
AssemblyAI

AssemblyAI's WebSocket streaming endpoint for live captions and agents.

from $0.15/hr (Streaming)
Gladia Realtime
Gladia

Gladia's real-time streaming ASR API with multilingual code-switching.

per-hour streaming (see Gladia pricing)
OpenAI /audio/transcriptions (whisper-1, gpt-4o-transcribe)
OpenAI

OpenAI's hosted Whisper + gpt-4o-transcribe models, batch endpoint.

from $0.006/min (whisper-1)
OpenAI /audio/translations
OpenAI

OpenAI's translate-to-English audio endpoint.

from $0.006/min (whisper-1)
SambaNova (Whisper endpoints)
SambaNova

SambaNova's hosted Whisper-large-v3 endpoint on its RDU accelerator.

see SambaNova pricing
Together AI (Whisper)
Together AI

Together AI's hosted Whisper models, offered as part of its open-model catalog.

per-Together-AI pricing
DeepInfra (Whisper)
DeepInfra

DeepInfra's hosted Whisper endpoint with per-second GPU pricing.

per-DeepInfra pricing
OVHcloud AI Speech-to-Text
OVHcloud

OVHcloud's managed speech-to-text inside its sovereign EU cloud.

per-OVHcloud pricing
Scaleway AI Inference
Scaleway

Scaleway's GPU inference platform commonly used for hosted Whisper.

per-Scaleway pricing (compute-based)
Alibaba Tongyi Audio (Qwen-Audio)
Alibaba Cloud

Alibaba's Tongyi multimodal model exposed for transcription + audio understanding.

tiered RMB-per-token / per-second
Baidu ERNIE Speech
Baidu

Baidu's ERNIE-aligned speech models inside ERNIE Bot Cloud.

tiered RMB-per-call
Huawei Pangu Speech
Huawei Cloud

Huawei's Pangu foundation models extended to speech for enterprise scenarios.

tiered RMB-per-call
Tencent Hunyuan (audio modality)
Tencent Cloud

Tencent's Hunyuan multimodal model with audio understanding endpoints.

tiered RMB-per-token
Naver HyperCLOVA X (audio)
Naver Cloud Platform

Naver's HyperCLOVA X foundation model with audio understanding.

Naver Cloud HyperCLOVA pricing
Kakao Kanana Speech
Kakao Enterprise

Kakao's Kanana foundation-model family with audio understanding.

contact sales
Rev.ai Streaming
Rev

Rev.ai's WebSocket streaming endpoint for live transcripts.

from $0.035/min (Streaming)
Speechmatics (batch / language packs)
Speechmatics

Speechmatics batch ASR with broad language pack catalog.

from $1.04/hr (Standard)
Hume EVI
Hume AI Inc.

Empathic voice interface with emotional-tone awareness.

pay-as-you-go
Otter.ai API
Otter.ai Inc.

Developer API access to Otter.ai's transcription engine.

contact sales
Rasa Pro
Rasa Technologies GmbH

Open-source-anchored conversational AI for enterprise.

free / contact sales
Google Cloud Dialogflow
Google Cloud

Google's conversational-AI platform for voice and chat agents.

usage-based
Amazon Lex
Amazon Web Services

AWS conversational-AI platform for voice and text bots.

usage-based
Microsoft Bot Framework
Microsoft Corporation

Microsoft's open-source SDK and platform for conversational bots.

free SDK + Azure costs
Rev VoiceHub
Rev

Rev's enterprise transcription and recording API platform.

paid
Trint API
Trint

Trint's transcription and translation API for newsrooms and media teams.

paid
iFlyTek Open Platform
iFlyTek

China's largest speech AI vendor — Mandarin, dialects, and 60+ languages via developer APIs.

tiered · free quota + pay-as-you-go in CNY
Tencent Cloud ASR
Tencent Cloud

Tencent's cloud speech-to-text with one-sentence, sentence, and real-time APIs.

tiered · per-second pricing in CNY
Alibaba DAMO ASR
Alibaba Cloud

Alibaba Cloud / DAMO Academy speech recognition with Paraformer non-autoregressive models.

tiered · per-hour pricing in CNY/USD
Volcengine Speech
ByteDance Volcano Engine

ByteDance's Volcano Engine speech-to-text — short, long, and streaming Mandarin ASR.

tiered · pay-as-you-go in CNY
Mobvoi Speech
Mobvoi

Mobvoi (Chumen Wenwen) speech APIs — Mandarin recognition behind TicWatch and Volkswagen voice.

enterprise · contact sales
NetEase Youdao ASR
NetEase Youdao

Youdao Cloud speech-to-text — Mandarin recognition behind Youdao Translator and dictionary pen.

tiered · per-character pricing in CNY
Sogou ASR
Sogou (Tencent)

Sogou (Tencent-owned) speech-to-text — input-method-grade Mandarin recognition.

enterprise · contact sales
Reverie Language Tech
Reverie Language Technologies

Reverie's Indic speech recognition — 11 Indian languages from one of Reliance Jio's group companies.

enterprise · contact sales
Bhashini
Government of India (Digital India Bhashini Division)

Government of India's national language platform — public ASR APIs for 22 official languages.

free for non-commercial; commercial tiers TBD
Sarvam AI
Sarvam AI

Sarvam AI — full-stack Indian foundation models including Saaras / Saaransh speech APIs.

tiered · per-minute pricing with free tier
Tinkoff VoiceKit
Tinkoff (T-Bank)

Tinkoff VoiceKit — Russian-language ASR + TTS used inside Tinkoff Bank's contact centre.

tiered · per-second pricing in RUB
SoundHound
SoundHound AI

SoundHound Houndify — multilingual voice AI platform with embedded and cloud ASR.

tiered · contact sales for enterprise pricing
Lelapa AI
Lelapa AI

Lelapa AI — South African startup building Vulavula speech and language tools for African languages.

tiered · pay-as-you-go with free tier
Intella
Intella

Intella — Arabic speech-to-text API focused on MSA and major Arabic dialects.

tiered · per-hour pricing in USD with free tier
Alvenir Danish ASR
Alvenir

Alvenir — Danish-language speech-to-text product from a Copenhagen startup.

tiered · pay-as-you-go in EUR/DKK
AI-Loop
AI-Loop

AI-Loop — multilingual African-language speech and NLP infrastructure.

tiered · pay-as-you-go in USD
Hume EVI
Hume AI

Empathic Voice Interface — voice AI that reads and responds to emotion in speech.

see vendor pricing
Dialogflow CX
Google Cloud

Google Cloud's enterprise conversational AI platform with voice and chat channels.

see vendor pricing
Microsoft Bot Framework (Voice)
Microsoft

Microsoft's bot orchestration SDK with voice channels via Direct Line Speech.

see vendor pricing
IBM watsonx Assistant (Voice)
IBM

IBM's enterprise conversational AI platform with voice and contact-center integrations.

see vendor pricing
Vertex AI Conversation
Google Cloud

Google Cloud's LLM-native conversational AI builder with voice support.

see vendor pricing
Twilio Voice Intelligence + Agents
Twilio

Twilio's ASR, voice intelligence, and ConversationRelay primitives for voice agents.

see vendor pricing
Plivo AI
Plivo

Voice AI agent capability layered on Plivo's CPaaS voice network.

see vendor pricing
Bandwidth Voice AI
Bandwidth

AI voice tooling layered on Bandwidth's tier-1 U.S. carrier network.

see vendor pricing
Telnyx Voice AI
Telnyx

AI inference and voice agents on Telnyx's own carrier and GPU stack.

see vendor pricing
Voximplant
Voximplant

CPaaS with serverless VoxEngine scenarios and AI voice integrations.

see vendor pricing
Daily.co Voice
Daily.co

WebRTC infrastructure for realtime voice and video AI agents.

see vendor pricing
Deepgram Voice Agent API
Deepgram

Single API for low-latency voice agents bundling Deepgram ASR + LLM + TTS.

see vendor pricing
AssemblyAI LeMUR Voice
AssemblyAI

AssemblyAI's LLM framework over its ASR for voice intelligence and agents.

see vendor pricing
ElevenLabs Conversational AI
ElevenLabs

ElevenLabs' end-to-end voice agent API with ASR, LLM, and premium TTS.

see vendor pricing
Azure AI Speech Voice Agent
Microsoft

Microsoft Azure's bundle of Speech SDK + Bot Framework for voice agents.

see vendor pricing
Ultravox Agents
Fixie.ai

Speech-native LLM and hosted agent runtime by Fixie.ai.

see vendor pricing
OpenAI Realtime Agents SDK
OpenAI

OpenAI's Agents SDK pattern over the Realtime API for voice-native assistants.

see vendor pricing
Anthropic Voice Agent Patterns
Anthropic

Reference patterns for building voice agents with Anthropic Claude models.

see vendor pricing
Cartesia Voice Agent stack
Cartesia

Cartesia's Sonic TTS plus partner ASR/LLM for low-latency voice agents.

see vendor pricing
Pollyo
Pollyo

AI-dubbing API for video platforms — backend OEM rather than a creator-facing app.

paid
Camb.ai TTS
Camb.ai

Camb.ai's standalone text-to-speech surface — same MARS model that powers their dubbing.

paid
iSpeech
iSpeech

TTS + STT API with consumer text-reader apps.

freemium — API per-request, consumer apps free
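
The prices in this listing mix per-minute and per-hour units, so comparing entries means converting to a common unit first. The following is a minimal Python sketch of that conversion, assuming only USD figures copied from the entries above. The helper name `usd_per_hour` and the regular expression are illustrative, not part of any vendor's tooling, and "from" prices are entry-level tiers rather than guaranteed rates.

```python
import re

# Matches USD prices such as "$0.006/min" or "from $0.37/hr".
# Entries without a USD per-minute or per-hour figure do not match.
PRICE_RE = re.compile(r"\$(\d+(?:\.\d+)?)/(min|hr)")


def usd_per_hour(price_text):
    """Return the first USD price in price_text as dollars per audio hour, or None."""
    match = PRICE_RE.search(price_text)
    if match is None:
        return None
    amount = float(match.group(1))
    unit = match.group(2)
    # Per-minute prices are scaled by 60; per-hour prices pass through unchanged.
    return amount * 60 if unit == "min" else amount


# Price strings copied from the listing above.
listed = {
    "OpenAI Whisper API": "$0.006/min",
    "AssemblyAI": "from $0.37/hr",
    "Deepgram": "from $0.0043/min",
    "Lemonfox.ai": "from $0.17/hr (Whisper)",
    "Groq (Whisper endpoints)": "from $0.111/hr (whisper-large-v3, batch)",
    "Speechmatics": "contact sales",
}

for name, text in listed.items():
    rate = usd_per_hour(text)
    label = "n/a" if rate is None else f"${rate:.3f}/hr"
    print(f"{name}: {label}")
```

Only the first price in each string is read, so an entry that lists two tiers (for example "from $1/hr (standard) and $0.30/hr (batch transcription)") is reported at its first figure. Entries priced in other currencies, by compute time, or by quote come back as `None`.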