Speechmatics
Speechmatics is a UK-based enterprise speech-to-text API with strong accent and dialect coverage across 55+ languages, a managed cloud, and a fully supported on-prem container deployment for regulated workloads.
Speechmatics offers batch and real-time speech-to-text across 55+ languages, with multilingual packs (e.g. Mandarin-English, Spanish-English), Standard and Enhanced operating points, plus a separate medical model for clinical audio. The API surface is one HTTPS endpoint for batch (POST /jobs) and a WebSocket for streaming, with a managed cloud in EU / US / AU regions and a supported on-prem container path for environments that cannot send audio to a third-party cloud.
Best for regulated enterprise buyers — broadcasters, contact centers, legal / public-sector, and healthcare — that need strong recognition across heavy accents, a sovereign or self-hosted deployment option, and the procurement paperwork a major vendor provides. The Free tier gives 480 minutes/month of speech-to-text and 2 concurrent real-time sessions, no card. Pro is usage-based from $0.24/hr with the same 480 min/month included, 50 concurrent real-time sessions, and 10 file jobs/sec. Enterprise is custom-priced with volume discounts above 500 hours/month, unlimited concurrency, and on-prem options. Last price check: 2026-05-10.
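As a rough worked example of the Pro-tier arithmetic above, the sketch below estimates a monthly bill. It assumes a flat $0.24 per audio hour and that the 480 included minutes (8 hours) are used up before anything is billed. Both are assumptions: the published figure is a "from" price, so Enhanced or add-on features may cost more. Check the pricing page before budgeting.

```python
# Rough Pro-tier estimate. Assumptions, not confirmed pricing rules:
# a flat $0.24 per audio hour, with the 480 included minutes used up first.
RATE_PER_HOUR = 0.24
INCLUDED_MINUTES = 480


def estimate_monthly_cost(audio_minutes: float,
                          rate_per_hour: float = RATE_PER_HOUR,
                          included_minutes: float = INCLUDED_MINUTES) -> float:
    """Estimated USD for one month of transcribed audio."""
    billable_minutes = max(0.0, audio_minutes - included_minutes)
    return round(billable_minutes / 60 * rate_per_hour, 2)


if __name__ == "__main__":
    for minutes in (300, 6_000, 30_000):
        print(f"{minutes:>6} min -> ${estimate_monthly_cost(minutes):.2f}")
```

Take 30,000 minutes (500 hours) a month, which is roughly where the Enterprise volume discounts described above begin. Under these assumptions the estimate comes to $118.08 before any discount.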
What it is
Speechmatics is the enterprise incumbent — strong on heavily-accented English, full on-prem deployment, and the compliance paperwork big buyers require. Not price-competitive for indie projects, but often the only viable option for a regulated enterprise buyer. Last price check: 2026-04-20.
Watch out for: Enterprise pricing (on-prem, unlimited concurrency, volume discounts) is quote-based and typically higher than self-service APIs.
Install / use
Where Speechmatics fits · 6 segments
Speechmatics is the enterprise incumbent; pick the segment closest to your workload.
Transcribe call recordings or live conversations for QA scoring, agent assist, and compliance review. The Enhanced operating point handles heavy phone-line accents and noisy audio better than Standard, and diarization with word timestamps lets you label agent vs caller turns for downstream analytics.
Batch and real-time both supported
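Here is a minimal sketch of the agent/caller labelling. It assumes the batch transcript was fetched in Speechmatics' json-v2 format. In that format, each entry in `results` has a `type`, `start_time`, `end_time`, and an `alternatives` list whose first item carries `content` and a `speaker` label such as `S1`. Mapping `S1` to the agent is a hypothetical convention (`ROLE_BY_SPEAKER`), not something diarization guarantees.

```python
# Hypothetical convention: assume the agent is diarized as S1 and the caller as S2.
# Diarization does not guarantee this ordering, so verify it per recording.
ROLE_BY_SPEAKER = {"S1": "agent", "S2": "caller"}


def speaker_turns(transcript: dict) -> list[dict]:
    """Collapse json-v2 style `results` into consecutive speaker turns."""
    turns: list[dict] = []
    for item in transcript.get("results", []):
        best = item["alternatives"][0]  # highest-ranked alternative
        speaker = best.get("speaker", "unknown")
        if turns and turns[-1]["speaker"] == speaker:
            turn = turns[-1]
            # Punctuation attaches to the preceding word without a space.
            separator = "" if item["type"] == "punctuation" else " "
            turn["text"] += separator + best["content"]
            turn["end_time"] = item["end_time"]
        else:
            turns.append({
                "speaker": speaker,
                "role": ROLE_BY_SPEAKER.get(speaker, "unknown"),
                "start_time": item["start_time"],
                "end_time": item["end_time"],
                "text": best["content"],
            })
    return turns


if __name__ == "__main__":
    sample = {"results": [
        {"type": "word", "start_time": 0.0, "end_time": 0.4,
         "alternatives": [{"content": "Hello", "speaker": "S1"}]},
        {"type": "punctuation", "start_time": 0.4, "end_time": 0.4,
         "alternatives": [{"content": ",", "speaker": "S1"}]},
        {"type": "word", "start_time": 0.5, "end_time": 0.9,
         "alternatives": [{"content": "thanks", "speaker": "S1"}]},
        {"type": "word", "start_time": 1.2, "end_time": 1.5,
         "alternatives": [{"content": "Hi", "speaker": "S2"}]},
    ]}
    for turn in speaker_turns(sample):
        print(f'{turn["role"]:>7} [{turn["start_time"]:.1f}-{turn["end_time"]:.1f}s]: {turn["text"]}')
```

Downstream QA scoring can then run per turn, for example counting caller turns or measuring agent talk time from `start_time` and `end_time`.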
Used by broadcasters and media-monitoring vendors for live captioning, post-production transcripts, and back-catalogue indexing. Wide accent and dialect coverage in a single English pack means you do not have to pre-route audio to a regional model.
Single English model covers global accents
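For captioning and back-catalogue indexing, word-level timestamps are usually regrouped into caption cues. Below is a minimal sketch. It assumes the transcript has already been flattened into `(text, start_seconds, end_seconds)` tuples with punctuation merged into the words. The 42-character and 6-second limits are illustrative defaults, not Speechmatics parameters, and `words_to_srt` is a hypothetical helper.

```python
# (text, start_seconds, end_seconds): an assumed, flattened word shape.
Word = tuple[str, float, float]


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3661.5 -> 01:01:01,500."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1_000)
    return f"{hours:02}:{minutes:02}:{secs:02},{millis:03}"


def words_to_srt(words: list[Word], max_chars: int = 42,
                 max_duration: float = 6.0) -> str:
    """Group words into numbered SRT cues, starting a new cue when the text
    would exceed max_chars or the cue would span more than max_duration."""
    cues: list[list[Word]] = []
    current: list[Word] = []
    for word in words:
        if current:
            candidate = " ".join(w[0] for w in current) + " " + word[0]
            over_chars = len(candidate) > max_chars
            over_time = word[2] - current[0][1] > max_duration
            if over_chars or over_time:
                cues.append(current)
                current = []
        current.append(word)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        timing = f"{srt_timestamp(cue[0][1])} --> {srt_timestamp(cue[-1][2])}"
        text = " ".join(w[0] for w in cue)
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [("Good", 0.0, 0.3), ("evening", 0.3, 0.8),
            ("and", 0.9, 1.0), ("welcome", 1.0, 1.5)]
    print(words_to_srt(demo, max_chars=12))
```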
Hearings, depositions, and discovery audio batched through the jobs API with diarization and word timestamps for downstream verbatim editing. Procurement-friendly contract terms and a documented retention policy make this a common pick for law firms and court-reporting vendors.
Word-level timestamps for verbatim editing
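For verbatim editing, word timestamps let an editor jump straight to a passage in the audio. Here is a minimal sketch, assuming the same flattened `(text, start_seconds, end_seconds)` word tuples. Matching is case-insensitive and ignores leading and trailing punctuation. `find_phrase` is a hypothetical helper, not part of any Speechmatics SDK.

```python
import re

# (text, start_seconds, end_seconds): an assumed, flattened word shape.
Word = tuple[str, float, float]


def _normalize(token: str) -> str:
    """Lower-case and strip leading/trailing punctuation ("Objection," -> "objection")."""
    return re.sub(r"^\W+|\W+$", "", token.lower())


def find_phrase(words: list[Word], phrase: str) -> list[tuple[float, float]]:
    """Return (start, end) times for every occurrence of `phrase` in `words`."""
    target = [_normalize(token) for token in phrase.split()]
    if not target:
        return []
    tokens = [_normalize(text) for text, _, _ in words]
    spans = []
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            # Span runs from the first matched word's start to the last one's end.
            spans.append((words[i][1], words[i + len(target) - 1][2]))
    return spans


if __name__ == "__main__":
    hearing = [("Objection,", 12.0, 12.6), ("your", 12.7, 12.9),
               ("Honor.", 12.9, 13.3), ("Sustained.", 14.0, 14.6)]
    print(find_phrase(hearing, "objection your honor"))
```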
Government and public-sector buyers with data-residency rules can pick a regional SaaS endpoint (EU / US / AU), and those that cannot send audio to any third-party cloud can use the on-prem container path. The same model sits behind both, with the same accent and language coverage.
Regional data residency
Speechmatics claims strong accent coverage inside a single English pack: Indian, African, Caribbean, and regional UK and US accents are all handled by the same language code (`en`), so you do not need to pre-classify the speaker. This is useful for global call centers, ed-tech, and consumer apps with international users.
One English model for all accents
Speechmatics ships CPU and GPU containers for batch and real-time, plus Kubernetes manifests and a separate language-identification container. This gives maximum control over data and deployment for HIPAA, financial, and defense workloads where audio cannot leave the customer environment.
Same model behind SaaS and on-prem
Quickstart · pick a runtime
Three working ways to talk to Speechmatics. Export your key as SPEECHMATICS_API_KEY first — grab one from the Speechmatics portal after signup (free 480 min/month, no card).
Official speechmatics-batch package · transcribe a local audio file end-to-end.
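```python
# pip install speechmatics-batch python-dotenv
import asyncio
import os

from speechmatics.batch import AsyncClient


async def main():
    client = AsyncClient(api_key=os.environ["SPEECHMATICS_API_KEY"])
    try:
        # Submit the local file as a batch job and wait for the transcript.
        result = await client.transcribe("audio.wav")
        print(result.transcript_text)
    finally:
        # Close the client's HTTP session even if transcription fails.
        await client.close()


asyncio.run(main())
```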
Plain HTTPS POST to /jobs · useful for shell pipelines and CI runners.
```bash
# submit a batch job that fetches the audio by URL
# (the v2 jobs endpoint takes multipart/form-data with a JSON "config" field)
curl -X POST 'https://asr.api.speechmatics.com/v2/jobs/' \
  -H "Authorization: Bearer $SPEECHMATICS_API_KEY" \
  -F config='{
    "type": "transcription",
    "transcription_config": {
      "language": "en",
      "operating_point": "enhanced",
      "diarization": "speaker"
    },
    "fetch_data": {
      "url": "https://example.com/audio.wav"
    }
  }'
```
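For CI runners that prefer Python over shell, the sketch below sends a batch job as a multipart request using only the standard library, then polls until the job finishes. Several details reflect the v2 batch API as generally documented, and you should confirm them against the API reference and your region's host before relying on them:

- the `https://asr.api.speechmatics.com/v2` base URL
- the `fetch_data` config key
- the `job.status` field and its `running` / `done` values
- the `transcript?format=txt` path

The helpers `build_job_config`, `encode_multipart`, `submit_job`, and `wait_for_text` are hypothetical names.

```python
import json
import os
import time
import urllib.request
import uuid

# Assumed default host for the v2 batch API; regional SaaS hosts differ.
BASE_URL = "https://asr.api.speechmatics.com/v2"


def build_job_config(audio_url: str, language: str = "en",
                     operating_point: str = "enhanced",
                     diarization: str = "speaker") -> dict:
    """Transcription job config that fetches the audio by URL."""
    return {
        "type": "transcription",
        "transcription_config": {
            "language": language,
            "operating_point": operating_point,
            "diarization": diarization,
        },
        "fetch_data": {"url": audio_url},
    }


def encode_multipart(fields: dict[str, str]) -> tuple[bytes, str]:
    """Encode text-only form fields as multipart/form-data (no file parts)."""
    boundary = uuid.uuid4().hex
    parts = [
        f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'
        for name, value in fields.items()
    ]
    parts.append(f"--{boundary}--\r\n")
    return "".join(parts).encode("utf-8"), f"multipart/form-data; boundary={boundary}"


def _get(url: str, api_key: str) -> bytes:
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def submit_job(config: dict, api_key: str) -> str:
    """POST the job and return its id."""
    body, content_type = encode_multipart({"config": json.dumps(config)})
    req = urllib.request.Request(
        f"{BASE_URL}/jobs/", data=body, method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["id"]


def wait_for_text(job_id: str, api_key: str, poll_seconds: float = 5.0) -> str:
    """Poll until the job is done, then download a plain-text transcript."""
    while True:
        status = json.loads(_get(f"{BASE_URL}/jobs/{job_id}", api_key))["job"]["status"]
        if status == "done":
            break
        if status != "running":  # e.g. "rejected"; treated as terminal here
            raise RuntimeError(f"job {job_id} ended with status {status!r}")
        time.sleep(poll_seconds)
    return _get(f"{BASE_URL}/jobs/{job_id}/transcript?format=txt", api_key).decode("utf-8")


if __name__ == "__main__" and "SPEECHMATICS_API_KEY" in os.environ:
    key = os.environ["SPEECHMATICS_API_KEY"]
    job = submit_job(build_job_config("https://example.com/audio.wav"), key)
    print(wait_for_text(job, key))
```

Polling keeps the sketch dependency-free. For production pipelines, a notification callback or the official SDK avoids holding a runner open while the job is queued.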
Streaming session via the SDK · the AsyncClient manages the WebSocket connection for you.
```python
# pip install speechmatics-rt python-dotenv
import asyncio
import os

from speechmatics.rt import AsyncClient, TranscriptionConfig, AudioFormat


async def main():
    config = TranscriptionConfig(language="en", operating_point="enhanced")
    audio_format = AudioFormat(type="raw", encoding="pcm_s16le", sample_rate=16000)

    async with AsyncClient(api_key=os.environ["SPEECHMATICS_API_KEY"]) as client:
        await client.start_session(
            transcription_config=config,
            audio_format=audio_format,
        )
        # ... feed PCM chunks via client.send_audio(...) and read transcripts ...


asyncio.run(main())
```
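The streaming example leaves the audio feed as a placeholder comment. This minimal sketch covers that piece: it slices raw (headerless) 16 kHz, 16-bit little-endian PCM into fixed-duration chunks. Mono audio and a 100 ms chunk size are assumptions, and `chunk_bytes` / `pcm_chunks` are hypothetical helpers, not SDK functions.

```python
import io
from typing import BinaryIO, Iterator

SAMPLE_RATE = 16_000   # matches sample_rate=16000 in the AudioFormat above
BYTES_PER_SAMPLE = 2   # pcm_s16le: 16-bit little-endian samples


def chunk_bytes(chunk_ms: int, sample_rate: int = SAMPLE_RATE,
                channels: int = 1) -> int:
    """Number of bytes in `chunk_ms` milliseconds of raw PCM audio."""
    return sample_rate * BYTES_PER_SAMPLE * channels * chunk_ms // 1000


def pcm_chunks(stream: BinaryIO, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield fixed-duration chunks from a raw mono PCM stream;
    the final chunk may be shorter."""
    size = chunk_bytes(chunk_ms)
    while chunk := stream.read(size):
        yield chunk


if __name__ == "__main__":
    silence = io.BytesIO(b"\x00" * 8_000)  # 0.25 s of 16 kHz mono silence
    print([len(chunk) for chunk in pcm_chunks(silence)])  # [3200, 3200, 1600]
```

Inside the session, each chunk would then go to `client.send_audio(...)`, as the placeholder comment suggests. When replaying a file rather than a live microphone, pausing for about `chunk_ms` between sends keeps the feed close to real time. A WAV file's header should be skipped first, since the format above declares raw PCM.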
Features
| Feature | Speechmatics |
|---|---|
| Speaker diarization | Yes |
| Word-level timestamps | Yes |
| Streaming / real-time | Yes |
| Languages supported | 55+ |
| HIPAA eligible | Yes |
Links
- speechmatics.com · Product homepage: speech-to-text, voice agents, text-to-speech, and the 55+ language pitch.
- speechmatics.com/pricing · Current plan tiers: Free (480 min/month), Pro from $0.24/hr, Enterprise with volume discounts above 500 hrs/month.
- docs.speechmatics.com · Documentation root: quickstarts, batch and real-time guides, deployments, and API reference.
- API reference: create a batch job · POST /jobs reference with config schema, diarization, operating points, and translation options.
- Supported languages · Full language table: 55+ languages with Standard and Enhanced operating points, plus bilingual packs.
- Deployments: SaaS and on-prem · Comparison of SaaS regions (EU / US / AU) vs on-prem containers (CPU, GPU, Kubernetes, language-id).
- github.com/speechmatics · Official GitHub organization: Python SDK, JS / TS SDK, CLI, examples, and community repos.
- speechmatics/speechmatics-python-sdk · Official Python SDK: async clients for batch and real-time transcription.
- speechmatics/speechmatics-js-sdk · Official JavaScript / TypeScript SDK: browser, Node 18+, and edge runtimes.
- status.speechmatics.com · Live status for Batch SaaS (EU / US / AU), Realtime SaaS (EU / US), portal, docs, and on-prem registry.
- portal.speechmatics.com · Self-service portal: create API keys, view usage, and manage billing; signup gives 480 min/month free, no card.
- speechmatics.com/blog · Product and engineering blog: recent posts on alphanumeric recognition, on-device deployments, and STT comparisons.
Speechmatics vs Whipscribe
| Feature | Speechmatics | Whipscribe |
|---|---|---|
| Category | Transcription APIs | Transcription APIs |
| Pricing | Free tier; Pro from $0.24/hr; Enterprise via sales | free beta |
| Speaker diarization | Yes | Yes |
| Word timestamps | Yes | Yes |
| Streaming | Yes | No |
| Languages | 55+ | 99 |
| Platforms | API, On-prem | Web, API, MCP |
Sources & dates for the comparison above
- diarization: “Speaker diarization identifies and labels different speakers in the audio.” — source (checked 2026-04-23)
- word timestamps: “Each word includes start_time and end_time in the response.” — source (checked 2026-04-23)
- streaming: “Speechmatics offers a Real-Time transcription API over WebSockets.” — source (checked 2026-04-23)
- pricing: “Speechmatics published pricing is enterprise contact-sales; no public price tier on their site.” — source (checked 2026-05-07)
Alternatives to Speechmatics
Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or upload a file and you'll have a transcript in seconds.