Speechmatics

by Speechmatics

Speechmatics is a UK-based enterprise speech-to-text API with strong accent and dialect coverage across 55+ languages, a managed cloud, and a fully supported on-prem container deployment for regulated workloads.

TL;DR

Speechmatics offers batch and real-time speech-to-text across 55+ languages, with multilingual packs (e.g. Mandarin-English, Spanish-English), Standard and Enhanced operating points, plus a separate medical model for clinical audio. The API surface is one HTTPS endpoint for batch (POST /jobs) and a WebSocket for streaming, with a managed cloud in EU / US / AU regions and a supported on-prem container path for environments that cannot send audio to a third-party cloud.

Best for regulated enterprise buyers — broadcasters, contact centers, legal / public-sector, and healthcare — that need strong recognition across heavy accents, a sovereign or self-hosted deployment option, and the procurement paperwork a major vendor provides. The Free tier gives 480 minutes/month of speech-to-text and 2 concurrent real-time sessions, no card. Pro is usage-based from $0.24/hr with the same 480 min/month included, 50 concurrent real-time sessions, and 10 file jobs/sec. Enterprise is custom-priced with volume discounts above 500 hours/month, unlimited concurrency, and on-prem options. Last price check: 2026-05-10.
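
To make the Pro arithmetic concrete, here is a minimal sketch. It assumes the 480 included minutes are deducted before any billing and that the "from $0.24/hr" figure applies as a flat rate; neither is confirmed by the pricing text above, and `estimate_pro_cost` is a hypothetical helper, so actual invoices may differ.

```python
INCLUDED_MINUTES = 480      # monthly allowance quoted above
RATE_PER_HOUR_USD = 0.24    # "from" price, treated as a flat rate (assumption)


def estimate_pro_cost(minutes_used: int) -> float:
    """Estimate a monthly Pro bill in USD for the given audio minutes."""
    # Only minutes beyond the included allowance are billed (assumption).
    billable_minutes = max(0, minutes_used - INCLUDED_MINUTES)
    return round(billable_minutes / 60 * RATE_PER_HOUR_USD, 2)


if __name__ == "__main__":
    for hours in (5, 8, 100, 500):
        cost = estimate_pro_cost(hours * 60)
        print(f"{hours:>4} h/month -> ${cost:.2f}")
```

Under these assumptions, anything up to 8 hours a month stays inside the allowance and costs nothing.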

Category: Transcription APIs
Pricing: contact sales
Platforms: API, On-prem

What it is

Speechmatics is the enterprise incumbent — strong on heavily-accented English, full on-prem deployment, and the compliance paperwork big buyers require. Not price-competitive for indie projects, but often the only viable option for a regulated enterprise buyer. Last price check: 2026-04-20.

Best for: Regulated enterprise (banks, broadcasters, public sector) needing on-prem or sovereign deployment.
Watch out for: Enterprise and on-prem pricing is quote-based and typically higher than self-service APIs.

Where Speechmatics fits · 6 segments

Speechmatics is the enterprise incumbent — pick the card closest to your workload. Each links to the canonical product or docs page.

Contact centers
Call analytics · batch + real-time

Transcribe call recordings or live conversations for QA scoring, agent assist, and compliance review. The Enhanced operating point handles heavy phone-line accents and noisy audio better than Standard, and diarization with word timestamps lets you label agent vs caller turns for downstream analytics.

Enhanced + diarization
Batch and real-time both supported
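
As a sketch of the agent-vs-caller labelling described above, the snippet below groups diarized words into speaker turns. The input shape (a list of results carrying `start_time`, `end_time`, and an `alternatives` entry with `content` and `speaker`) is an assumption modelled on Speechmatics' JSON transcript output; the sample data and the `group_turns` helper are hypothetical.

```python
from itertools import groupby

# Hypothetical sample in the assumed shape; real transcripts carry more fields.
SAMPLE_RESULTS = [
    {"type": "word", "start_time": 0.0, "end_time": 0.4,
     "alternatives": [{"content": "Hello", "speaker": "S1"}]},
    {"type": "word", "start_time": 0.5, "end_time": 0.9,
     "alternatives": [{"content": "there", "speaker": "S1"}]},
    {"type": "punctuation", "start_time": 0.9, "end_time": 0.9,
     "alternatives": [{"content": ".", "speaker": "S1"}]},
    {"type": "word", "start_time": 1.2, "end_time": 1.5,
     "alternatives": [{"content": "Hi", "speaker": "S2"}]},
    {"type": "word", "start_time": 1.6, "end_time": 1.9,
     "alternatives": [{"content": "team", "speaker": "S2"}]},
]


def speaker_of(result: dict) -> str:
    """Return the speaker label of a result's top alternative."""
    return result["alternatives"][0]["speaker"]


def group_turns(results: list) -> list:
    """Collapse consecutive words from the same speaker into one turn."""
    words = [r for r in results if r.get("type") == "word"]
    turns = []
    for speaker, group in groupby(words, key=speaker_of):
        items = list(group)
        turns.append({
            "speaker": speaker,
            "start": items[0]["start_time"],
            "end": items[-1]["end_time"],
            "text": " ".join(i["alternatives"][0]["content"] for i in items),
        })
    return turns


if __name__ == "__main__":
    for turn in group_turns(SAMPLE_RESULTS):
        print(f'{turn["speaker"]} [{turn["start"]}-{turn["end"]}]: {turn["text"]}')
```

Mapping `S1` and `S2` to "agent" and "caller" is left to downstream logic, since diarization labels alone do not say which speaker is which.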

Media monitoring
Broadcast · live captions + archives

Used by broadcasters and media-monitoring vendors for live captioning, post-production transcripts, and back-catalogue indexing. Wide accent and dialect coverage in a single English pack means you do not have to pre-route audio to a regional model.

Real-time WebSocket · Enhanced
Single English model covers global accents

Legal and court reporting
Long-form · diarized + verbatim

Hearings, depositions, and discovery audio batched through the jobs API with diarization and word timestamps for downstream verbatim editing. Procurement-friendly contract terms and a documented retention policy make this a common pick for law firms and court-reporting vendors.

Batch /jobs · Enhanced + diarization
Word-level timestamps for verbatim editing
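
For verbatim editing, word timestamps are usually rendered as timecodes. A minimal sketch, assuming timestamps arrive as seconds; `format_timecode` is a hypothetical helper, not part of any Speechmatics SDK.

```python
def format_timecode(seconds: float) -> str:
    """Render a timestamp in seconds as HH:MM:SS.mmm."""
    # Work in whole milliseconds so rounding carries cleanly into seconds.
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


if __name__ == "__main__":
    # Hypothetical (word, start_time) pairs standing in for transcript output.
    for word, start in [("The", 0.0), ("witness", 3725.5)]:
        print(f"[{format_timecode(start)}] {word}")
```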

Public sector and government
Sovereign · on-prem or regional cloud

Government and public-sector buyers that cannot send audio to a US-based cloud can pick a regional SaaS endpoint (EU / US / AU) or the on-prem container path. Same model behind both, with the same accent and language coverage.

EU / US / AU SaaS or on-prem
Regional data residency
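
Region selection usually comes down to which base URL a client talks to. The sketch below shows one way to pin that choice in configuration. The hostnames and the `/v2/jobs` path are assumptions that should be confirmed against the Speechmatics documentation before use, and `jobs_url` is a hypothetical helper.

```python
# Assumed regional hostnames; confirm against the Speechmatics docs before use.
REGION_HOSTS = {
    "eu": "eu1.asr.api.speechmatics.com",
    "us": "us1.asr.api.speechmatics.com",
    "au": "au1.asr.api.speechmatics.com",
}


def jobs_url(region: str) -> str:
    """Return the batch jobs URL for a region, failing loudly on unknown ones."""
    try:
        host = REGION_HOSTS[region.lower()]
    except KeyError:
        known = ", ".join(sorted(REGION_HOSTS))
        raise ValueError(f"unknown region {region!r}; expected one of: {known}") from None
    return f"https://{host}/v2/jobs"


if __name__ == "__main__":
    print(jobs_url("eu"))
```

Raising on an unknown region, rather than falling back to a default, keeps audio from silently leaving the intended jurisdiction.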

Accent-heavy English
Global English · no accent flag

Speechmatics markets its English pack as covering accents in a single model: Indian, African, Caribbean, and regional UK and US accents all go through the same model id, so you do not pre-classify the speaker. Useful for global call centers, ed-tech, and consumer apps with international users.

language=en · global accent coverage
One English model for all accents

On-prem and regulated
Self-hosted container · CPU or GPU

Speechmatics ships CPU and GPU containers for batch and real-time, plus Kubernetes manifests and a separate language-identification container. Maximum control over data and deployment for HIPAA, financial, and defense workloads where audio cannot leave the customer environment.

Docker / Kubernetes · CPU or GPU
Same model behind SaaS and on-prem

Pattern: Speechmatics is the right call when accent coverage, on-prem deployment, or enterprise procurement is the decisive factor. If you want the lowest English WER on a self-service per-minute API, OpenAI Whisper API is a closer comparison; for managed transcripts with no API plumbing at all, drop a URL into Whipscribe below.

Quickstart · pick a runtime

Three working ways to talk to Speechmatics. Export your key as SPEECHMATICS_API_KEY first — grab one from the Speechmatics portal after signup (free 480 min/month, no card).

1. Python SDK · batch transcription

Official speechmatics-batch package · transcribe a local audio file end-to-end.

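# pip install speechmatics-batch
import asyncio
import os

from speechmatics.batch import AsyncClient


async def main():
    # Read the key from the environment rather than hard-coding it.
    client = AsyncClient(api_key=os.environ["SPEECHMATICS_API_KEY"])
    try:
        # Submit the local file and wait for the finished transcript.
        result = await client.transcribe("audio.wav")
        print(result.transcript_text)
    finally:
        # Close the client even if transcription raises.
        await client.close()


asyncio.run(main())

2. cURL · submit a batch job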

Plain HTTPS POST to /jobs · useful for shell pipelines and CI runners.

# submit a batch job that fetches the audio from a URL
# (host, /v2 path, multipart "config" field, and "fetch_data" follow the
# Speechmatics v2 batch API; confirm against the current docs for your region)
curl -L -X POST 'https://asr.api.speechmatics.com/v2/jobs' \
  -H "Authorization: Bearer $SPEECHMATICS_API_KEY" \
  -F 'config={
    "type": "transcription",
    "transcription_config": {
      "language": "en",
      "operating_point": "enhanced",
      "diarization": "speaker"
    },
    "fetch_data": { "url": "https://example.com/audio.wav" }
  }'
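
When jobs are submitted from code rather than a shell, the same settings can be assembled as a dictionary and serialized. A minimal sketch: the `type` and `transcription_config` keys mirror the curl example above, while `build_transcription_config` and its validation rules are hypothetical.

```python
import json

# Operating points named in this article; anything else is rejected early.
OPERATING_POINTS = {"standard", "enhanced"}


def build_transcription_config(
    language: str = "en",
    operating_point: str = "enhanced",
    diarize: bool = True,
) -> dict:
    """Build the job config body used in the curl example."""
    if operating_point not in OPERATING_POINTS:
        raise ValueError(f"operating_point must be one of {sorted(OPERATING_POINTS)}")
    transcription_config = {
        "language": language,
        "operating_point": operating_point,
    }
    if diarize:
        # Speaker diarization labels each word with a speaker id.
        transcription_config["diarization"] = "speaker"
    return {"type": "transcription", "transcription_config": transcription_config}


if __name__ == "__main__":
    print(json.dumps(build_transcription_config(), indent=2))
```

Validating the operating point locally catches typos before a job is submitted and billed.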

3. Python · real-time WebSocket

Streaming session via the SDK · the AsyncClient manages the WebSocket connection for you.

# pip install speechmatics-rt
import asyncio
import os
from speechmatics.rt import AsyncClient, TranscriptionConfig, AudioFormat

async def main():
    config = TranscriptionConfig(language="en", operating_point="enhanced")
    audio_format = AudioFormat(type="raw", encoding="pcm_s16le", sample_rate=16000)
    async with AsyncClient(api_key=os.environ["SPEECHMATICS_API_KEY"]) as client:
        await client.start_session(
            transcription_config=config,
            audio_format=audio_format,
        )
        # ... feed PCM chunks via client.send_audio(...) and read transcripts ...

asyncio.run(main())
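
The placeholder comment above leaves chunking to the reader. One way to slice raw PCM into fixed-duration chunks, assuming 16-bit mono audio at the 16 kHz rate configured above; `iter_pcm_chunks` is a hypothetical helper, and the 100 ms chunk length is an arbitrary choice rather than an SDK requirement.

```python
from typing import Iterator

BYTES_PER_SAMPLE = 2  # pcm_s16le means 16-bit samples; mono is assumed


def iter_pcm_chunks(
    pcm: bytes, sample_rate: int = 16000, chunk_ms: int = 100
) -> Iterator[bytes]:
    """Yield consecutive chunks of raw PCM, each covering about chunk_ms of audio."""
    chunk_bytes = sample_rate * chunk_ms // 1000 * BYTES_PER_SAMPLE
    if chunk_bytes <= 0:
        raise ValueError("sample_rate and chunk_ms must give a positive chunk size")
    for offset in range(0, len(pcm), chunk_bytes):
        # The final chunk may be shorter than the rest.
        yield pcm[offset:offset + chunk_bytes]


if __name__ == "__main__":
    one_second_of_silence = bytes(16000 * BYTES_PER_SAMPLE)
    sizes = [len(chunk) for chunk in iter_pcm_chunks(one_second_of_silence)]
    print(len(sizes), "chunks of", sizes[0], "bytes")
```

Inside the session, each chunk would then be passed to `client.send_audio(...)` as the comment in the example describes.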

Features

Speaker diarization: Yes
Word-level timestamps: Yes
Streaming / real-time: Yes
Languages supported: 55+
HIPAA eligible: Yes

Speechmatics vs Whipscribe

Feature             | Speechmatics       | Whipscribe
Category            | Transcription APIs | Transcription APIs
Pricing             | contact sales      | free beta
Speaker diarization | Yes                | Yes
Word timestamps     | Yes                | Yes
Streaming           | Yes                | No
Languages           | 55+                | 99
Platforms           | API, On-prem       | Web, API, MCP

Sources & dates for the comparison above
  1. diarization: “Speaker diarization identifies and labels different speakers in the audio.” (source checked 2026-04-23)
  2. word timestamps: “Each word includes start_time and end_time in the response.” (source checked 2026-04-23)
  3. streaming: “Speechmatics offers a Real-Time transcription API over WebSockets.” (source checked 2026-04-23)
  4. pricing: “Speechmatics published pricing is enterprise contact-sales; no public price tier on their site.” (source checked 2026-05-07)

Alternatives to Speechmatics

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or drop a file in the form below — you'll have a transcript in seconds.