Speechmatics

Name: Speechmatics
Author: Speechmatics

Speechmatics is a UK-based enterprise speech-to-text API with strong accent and dialect coverage across 55+ languages, a managed cloud, and a fully supported on-prem container deployment for regulated workloads.

TL;DR

Speechmatics offers batch and real-time speech-to-text across 55+ languages, with multilingual packs (e.g. Mandarin-English, Spanish-English), Standard and Enhanced operating points, plus a separate medical model for clinical audio. The API surface is one HTTPS endpoint for batch (POST /jobs) and a WebSocket for streaming, with a managed cloud in EU / US / AU regions and a supported on-prem container path for environments that cannot send audio to a third-party cloud.

Best for regulated enterprise buyers — broadcasters, contact centers, legal / public-sector, and healthcare — that need strong recognition across heavy accents, a sovereign or self-hosted deployment option, and the procurement paperwork a major vendor provides. The Free tier gives 480 minutes/month of speech-to-text and 2 concurrent real-time sessions, no card. Pro is usage-based from $0.24/hr with the same 480 min/month included, 50 concurrent real-time sessions, and 10 file jobs/sec. Enterprise is custom-priced with volume discounts above 500 hours/month, unlimited concurrency, and on-prem options. Last price check: 2026-05-10.
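
As a rough sanity check on the tiers above, the sketch below estimates a monthly Pro bill. It assumes the 480 included minutes are deducted before billing and that every billable hour is charged at the $0.24/hr starting rate; real rates may differ by operating point and volume. `estimate_pro_cost` is a hypothetical helper, not part of any Speechmatics SDK.

```python
FREE_MINUTES_PER_MONTH = 480  # included allowance quoted above
PRO_RATE_PER_HOUR = 0.24      # "from" price; the actual rate may be higher


def estimate_pro_cost(minutes_used: float,
                      rate_per_hour: float = PRO_RATE_PER_HOUR,
                      free_minutes: float = FREE_MINUTES_PER_MONTH) -> float:
    """Return an estimated monthly bill in USD, rounded to cents."""
    if minutes_used < 0:
        raise ValueError("minutes_used must be non-negative")
    # Only usage beyond the included allowance is assumed to be billable.
    billable_minutes = max(0.0, minutes_used - free_minutes)
    return round(billable_minutes / 60 * rate_per_hour, 2)


if __name__ == "__main__":
    for minutes in (300, 480, 6_000):
        print(f"{minutes} min -> ${estimate_pro_cost(minutes):.2f}")
```

Under these assumptions, 6,000 minutes leaves 5,520 billable minutes, which is 92 hours at $0.24, or $22.08.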

What it is

Speechmatics is the enterprise incumbent — strong on heavily-accented English, full on-prem deployment, and the compliance paperwork big buyers require. Not price-competitive for indie projects, but often the only viable option for a regulated enterprise buyer. Last price check: 2026-04-20.

Best for: Regulated enterprise (banks, broadcasters, public sector) needing on-prem or sovereign deployment.
Watch out for: Enterprise and on-prem pricing is quote-based and can run higher than self-service APIs; only the Free and Pro tiers have published rates.

Install / use

POST https://asr.api.speechmatics.com/v2/jobs

View Speechmatics API docs ↗

Where Speechmatics fits · 6 segments

Speechmatics is the enterprise incumbent — pick the card closest to your workload. Each links to the canonical product or docs page.

Contact centers

Call analytics · batch + real-time

Transcribe call recordings or live conversations for QA scoring, agent assist, and compliance review. The Enhanced operating point handles heavy phone-line accents and noisy audio better than Standard, and diarization with word timestamps lets you label agent vs caller turns for downstream analytics.

Enhanced + diarization
Batch and real-time both supported
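
To illustrate the agent vs caller labelling described above, here is a minimal sketch that merges diarized words into speaker turns. It assumes a json-v2 style transcript in which each entry of `results` has `type`, `start_time`, `end_time`, and an `alternatives` list whose first item carries `content` and a `speaker` label such as `S1`; check these field names against the current API reference. `speaker_turns` is a hypothetical helper, and mapping `S1` or `S2` to "agent" or "caller" is left to the caller.

```python
from typing import Any


def speaker_turns(transcript: dict[str, Any]) -> list[dict[str, Any]]:
    """Merge consecutive items from the same speaker into turns."""
    turns: list[dict[str, Any]] = []
    for item in transcript.get("results", []):
        alternatives = item.get("alternatives") or []
        if not alternatives:
            continue
        best = alternatives[0]
        speaker = best.get("speaker", "UU")  # "UU" is assumed to mean unknown
        is_punct = item.get("type") == "punctuation"
        if turns and (is_punct or turns[-1]["speaker"] == speaker):
            # Punctuation attaches to the previous word without a space.
            turn = turns[-1]
            turn["text"] += best["content"] if is_punct else " " + best["content"]
            turn["end"] = item["end_time"]
        else:
            turns.append({"speaker": speaker, "start": item["start_time"],
                          "end": item["end_time"], "text": best["content"]})
    return turns
```

Each returned turn has `speaker`, `start`, `end`, and `text`, which is usually enough to compute talk time per speaker or to feed a QA scoring step.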

Media monitoring

Broadcast · live captions + archives

Used by broadcasters and media-monitoring vendors for live captioning, post-production transcripts, and back-catalogue indexing. Wide accent and dialect coverage in a single English pack means you do not have to pre-route audio to a regional model.

Real-time WebSocket · Enhanced
Single English model covers global accents

Legal and court reporting

Long-form · diarized + verbatim

Hearings, depositions, and discovery audio batched through the jobs API with diarization and word timestamps for downstream verbatim editing. Procurement-friendly contract terms and a documented retention policy make this a common pick for law firms and court-reporting vendors.

Batch /jobs · Enhanced + diarization
Word-level timestamps for verbatim editing
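
For verbatim editing, word timings in seconds usually need to be rendered as readable timecodes. The sketch below shows one way to do that; the `(start, end, text)` tuple shape and both helper names are assumptions made for illustration, not part of the Speechmatics API.

```python
from typing import Iterable


def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS.mmm using integer milliseconds."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def verbatim_lines(words: Iterable[tuple[float, float, str]]) -> list[str]:
    """Turn (start, end, text) tuples into one timestamped line per word."""
    return [
        f"[{format_timestamp(start)} - {format_timestamp(end)}] {text}"
        for start, end, text in words
    ]
```

Working in integer milliseconds avoids the drift that repeated floating-point arithmetic can introduce over a multi-hour hearing.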

Public sector and government

Sovereign · on-prem or regional cloud

Government and public-sector buyers that cannot send audio to a US-based cloud can pick a regional SaaS endpoint (EU / US / AU) or the on-prem container path. Same model behind both, with the same accent and language coverage.

EU / US / AU SaaS or on-prem
Regional data residency

Accent-heavy English

Global English · no accent flag

Speechmatics positions strong accent coverage inside a single English pack — Indian, African, Caribbean, regional UK and US accents are all handled by the same model id, so you do not pre-classify the speaker. Useful for global call centers, ed-tech, and consumer apps with international users.

language=en · global accent coverage
One English model for all accents

On-prem and regulated

Self-hosted container · CPU or GPU

Speechmatics ships CPU and GPU containers for batch and real-time, plus Kubernetes manifests and a separate language-identification container. Maximum control over data and deployment for HIPAA, financial, and defense workloads where audio cannot leave the customer environment.

Docker / Kubernetes · CPU or GPU
Same model behind SaaS and on-prem

Pattern: Speechmatics is the right call when accent coverage, on-prem deployment, or enterprise procurement is the decisive factor. If you want the lowest English WER on a self-service per-minute API, OpenAI Whisper API is a closer comparison; for managed transcripts with no API plumbing at all, drop a URL into Whipscribe below.

Quickstart · pick a runtime

Three working ways to talk to Speechmatics. Export your key as SPEECHMATICS_API_KEY first — grab one from the Speechmatics portal after signup (free 480 min/month, no card).

1. Python SDK · batch transcription

Official speechmatics-batch package · transcribe a local audio file end-to-end.

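# pip install speechmatics-batch
import asyncio
import os
from speechmatics.batch import AsyncClient

async def main():
    # Read the key from the environment rather than hard-coding it.
    client = AsyncClient(api_key=os.environ["SPEECHMATICS_API_KEY"])
    try:
        # Submits the file as a batch job and waits for the finished transcript.
        result = await client.transcribe("audio.wav")
        print(result.transcript_text)
    finally:
        # Close the client even if transcription fails.
        await client.close()

asyncio.run(main())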

Source: speechmatics/speechmatics-python-sdk ↗ · batch quickstart at docs.speechmatics.com/speech-to-text/batch/quickstart ↗

2. cURL · submit a batch job

Plain HTTPS POST to /jobs · useful for shell pipelines and CI runners.

# submit a batch job by URL (the job config is sent as a multipart form field)
curl -L -X POST 'https://asr.api.speechmatics.com/v2/jobs' \
  -H "Authorization: Bearer $SPEECHMATICS_API_KEY" \
  -F config='{
    "type": "transcription",
    "transcription_config": {
      "language": "en",
      "operating_point": "enhanced",
      "diarization": "speaker"
    },
    "fetch_data": { "url": "https://example.com/audio.wav" }
  }'

Source: docs.speechmatics.com/api-ref/batch/create-a-new-job ↗
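
Batch jobs are asynchronous, so after submitting one you typically poll until it finishes. The sketch below keeps the polling logic separate from any HTTP client by taking a `fetch_status` callable; that callable, the helper name, and the status strings (`running`, `done`, `rejected`, `deleted`, `expired`) are assumptions to verify against the batch API reference.

```python
import time
from typing import Callable

TERMINAL_FAILURES = {"rejected", "deleted", "expired"}  # assumed status names


def wait_for_job(fetch_status: Callable[[], str],
                 interval_s: float = 5.0,
                 timeout_s: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll fetch_status() until the job is done, fails, or times out."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status == "done":
            return status
        if status in TERMINAL_FAILURES:
            raise RuntimeError(f"job ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval_s)  # fixed interval; add backoff for long jobs
```

In practice `fetch_status` would wrap a GET on the job resource returned by the submit call and return its status field. Injecting `sleep` keeps the helper easy to test without real delays.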

3. Python · real-time WebSocket

Streaming session via the SDK · the AsyncClient manages the WebSocket connection for you.

# pip install speechmatics-rt python-dotenv
import asyncio
import os
from speechmatics.rt import AsyncClient, TranscriptionConfig, AudioFormat

async def main():
    config = TranscriptionConfig(language="en", operating_point="enhanced")
    audio_format = AudioFormat(type="raw", encoding="pcm_s16le", sample_rate=16000)
    async with AsyncClient(api_key=os.environ["SPEECHMATICS_API_KEY"]) as client:
        await client.start_session(
            transcription_config=config,
            audio_format=audio_format,
        )
        # ... feed PCM chunks via client.send_audio(...) and read transcripts ...

asyncio.run(main())

Source: docs.speechmatics.com/speech-to-text/realtime/quickstart ↗
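
The placeholder comment in the streaming example leaves out how the PCM audio is split before sending. A minimal sketch of that step follows, assuming mono `pcm_s16le` audio at 16 kHz and an arbitrary 100 ms chunk duration; both helpers are hypothetical and independent of the SDK.

```python
from typing import Iterator

BYTES_PER_SAMPLE = 2  # pcm_s16le is 16-bit signed little-endian; mono assumed


def chunk_size_bytes(sample_rate: int, chunk_ms: int) -> int:
    """Number of bytes that hold chunk_ms milliseconds of mono 16-bit PCM."""
    return sample_rate * chunk_ms // 1000 * BYTES_PER_SAMPLE


def iter_pcm_chunks(pcm: bytes, sample_rate: int = 16000,
                    chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield consecutive chunks of raw PCM; the last one may be shorter."""
    size = chunk_size_bytes(sample_rate, chunk_ms)
    if size <= 0:
        raise ValueError("sample_rate and chunk_ms must give a positive chunk size")
    for offset in range(0, len(pcm), size):
        yield pcm[offset:offset + size]
```

At 16 kHz, 100 ms is 1,600 samples or 3,200 bytes, so each yielded chunk could be passed to the send call inside the session loop. Smaller chunks lower latency at the cost of more messages.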

Features

Speaker diarization	Yes
Word-level timestamps	Yes
Streaming / real-time	Yes
Languages supported	55+
HIPAA eligible	Yes

Speechmatics vs Whipscribe

Feature	Speechmatics	Whipscribe
Category	Transcription APIs	Transcription APIs
Pricing	contact sales	free beta
Speaker diarization	Yes	Yes
Word timestamps	Yes	Yes
Streaming	Yes	No
Languages	55+	99
Platforms	API, On-prem	Web, API, MCP

Sources & dates for the comparison above

diarization: “Speaker diarization identifies and labels different speakers in the audio.” — source (checked 2026-04-23)
word timestamps: “Each word includes start_time and end_time in the response.” — source (checked 2026-04-23)
streaming: “Speechmatics offers a Real-Time transcription API over WebSockets.” — source (checked 2026-04-23)
pricing: “Speechmatics published pricing is enterprise contact-sales; no public price tier on their site.” — source (checked 2026-05-07)

Alternatives to Speechmatics

OpenAI Whisper API

OpenAI

Hosted Whisper (the whisper-1 model, based on large-v2) from OpenAI at $0.006 per minute.

$0.006/min

AssemblyAI

Universal-2 model + diarization, PII redaction, topic detection, summarization.

from $0.37/hr

Deepgram

Nova-2 model, excellent streaming, strong at conversational audio.

from $0.0043/min
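
The prices on this page mix per-minute and per-hour units, so a quick normalization helps when comparing them. The sketch below converts the listed "from" prices to dollars per hour; the figures are copied from this page, ignore free allowances and feature differences, and `per_hour` and `rank_by_hourly_price` are hypothetical helpers.

```python
LISTED_PRICES = {
    "Speechmatics Pro": (0.24, "hr"),
    "OpenAI Whisper API": (0.006, "min"),
    "AssemblyAI": (0.37, "hr"),
    "Deepgram": (0.0043, "min"),
}


def per_hour(price: float, unit: str) -> float:
    """Convert a price quoted per 'min' or per 'hr' to dollars per hour."""
    if unit == "hr":
        return round(price, 4)
    if unit == "min":
        return round(price * 60, 4)
    raise ValueError(f"unknown unit: {unit!r}")


def rank_by_hourly_price(prices: dict[str, tuple[float, str]]) -> list[tuple[str, float]]:
    """Return (vendor, dollars per hour) pairs, cheapest first."""
    hourly = {name: per_hour(price, unit) for name, (price, unit) in prices.items()}
    return sorted(hourly.items(), key=lambda pair: pair[1])


if __name__ == "__main__":
    for name, rate in rank_by_hourly_price(LISTED_PRICES):
        print(f"{name}: ${rate:.3f}/hr")
```

With the listed figures, $0.006/min works out to $0.36/hr and $0.0043/min to $0.258/hr, so the starting prices are closer together than the mixed units suggest.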

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or drop a file in the form below — you'll have a transcript in seconds.