
Gladia

by Gladia

Gladia is an EU-based speech-to-text API built around the Solaria-1 model — sub-300 ms real-time streaming plus pre-recorded transcription, with 100+ languages, mid-utterance code-switching, and contractual EU data residency.

TL;DR

Solaria-1 is Gladia's universal STT model: one model id covers 100+ languages with automatic detection and mid-utterance code-switching, delivers partial transcripts in roughly 100 ms, and includes diarization, named-entity recognition, sentiment, and summarization. The platform exposes two surfaces, both authenticated with the x-gladia-key header: POST /v2/pre-recorded for async batch jobs, and POST /v2/live for streaming, which returns a session URL of the form wss://api.gladia.io/v2/live?token=…

Best for voice agents, contact-center analytics, multilingual meeting bots, and any workload where EU data residency or unusual language coverage is the deciding factor. Pay-as-you-go on the Starter plan runs $0.61/hr async (~$0.0102/min) and $0.75/hr real-time (~$0.0125/min) with 10 free hours per month; the Growth tier discounts those to ~$0.20/hr async and ~$0.25/hr real-time with an upfront commit; Enterprise adds unlimited concurrency, zero retention, SLAs, and custom hosting. Last price check: 2026-05-10.
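The per-minute figures above are the hourly rates divided by 60. A minimal sketch of that arithmetic follows. It assumes, which the pricing text does not state, that the 10 free hours are deducted from usage before the Starter rate applies. The function names are illustrative only.

```python
# Starter rates quoted above, in USD per hour of audio.
STARTER_RATES_PER_HOUR = {"async": 0.61, "realtime": 0.75}
FREE_HOURS_PER_MONTH = 10  # Starter allowance quoted above


def per_minute(rate_per_hour: float) -> float:
    """Convert an hourly rate to a per-minute rate."""
    return rate_per_hour / 60


def starter_monthly_cost(hours: float, mode: str) -> float:
    """Estimate one month's Starter bill for a single mode.

    Assumption: the free hours are deducted first; check the pricing
    page for how the allowance is actually applied.
    """
    billable_hours = max(0.0, hours - FREE_HOURS_PER_MONTH)
    return billable_hours * STARTER_RATES_PER_HOUR[mode]


if __name__ == "__main__":
    print(round(per_minute(0.61), 4))
    print(round(starter_monthly_cost(110, "async"), 2))
```

Under that assumption, usage at or below the free allowance costs nothing, and everything above it is billed at the flat hourly rate.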

What it is

Gladia wraps Whisper-class models in a developer-friendly API with diarization, 100+ languages, and competitive per-minute pricing. A reasonable alternative to self-hosting faster-whisper when you want someone else to operate the GPUs. Last price check: 2026-04-20.

Best for: Teams who like the Whisper model family but don't want to run GPUs.
Watch out for: Smaller ecosystem than AssemblyAI/Deepgram; HIPAA on enterprise tiers only.

Install / use

POST https://api.gladia.io/v2/pre-recorded

View Gladia API docs ↗

Where Gladia fits · 6 use-cases

Gladia's strengths cluster around multilingual accuracy, low-latency partials, and an EU compliance posture that most US-headquartered APIs can't match. Pick the card closest to your build — each links to the canonical docs section.

Voice agents

Real-time · Solaria-1 + audio-to-LLM

Initiate a live session with POST /v2/live, then stream PCM over the returned WebSocket. Partials arrive in roughly 100 ms, which is the latency budget conversational agents need before turn-taking feels broken. Audio-to-LLM lets a single request return both the transcript and a structured LLM response.

POST /v2/live · solaria-1
Sub-300 ms final · ~100 ms partials
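Streaming PCM means slicing raw audio into small frames before sending them over the socket. Here is a minimal sketch of the frame-size arithmetic. It assumes 16 kHz, 16-bit, mono audio (the format used in the Node quickstart on this page) and a 100 ms frame, which is an illustrative choice rather than a documented requirement.

```python
from typing import Iterator

SAMPLE_RATE = 16_000  # samples per second
BIT_DEPTH = 16        # bits per sample
CHANNELS = 1          # mono


def bytes_per_frame(frame_ms: int) -> int:
    """Number of PCM bytes covering frame_ms milliseconds of audio."""
    bytes_per_second = SAMPLE_RATE * (BIT_DEPTH // 8) * CHANNELS
    return bytes_per_second * frame_ms // 1000


def pcm_frames(pcm: bytes, frame_ms: int = 100) -> Iterator[bytes]:
    """Yield consecutive frames; the last one may be shorter."""
    size = bytes_per_frame(frame_ms)
    for offset in range(0, len(pcm), size):
        yield pcm[offset:offset + size]
```

At these settings a 100 ms frame is 3,200 bytes. Smaller frames reduce buffering delay at the cost of more socket writes.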

Contact centers

Pre-recorded · diarization + redaction

Send agent and customer call recordings through POST /v2/pre-recorded as batch jobs, with diarization on, an optional speaker count, PII redaction, sentiment, and summarization in one call. SOC 2 Type II, HIPAA, and ISO 27001 are in scope on paid plans.

POST /v2/pre-recorded · diarization=true
Single request returns transcript + speakers + summary
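Once a diarized transcript comes back, a common contact-center metric is talk time per speaker. The sketch below assumes each utterance is a dict with `speaker`, `start`, and `end` (in seconds). That shape is an assumption made for illustration, so confirm the actual response fields in the API reference before relying on it.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable


def talk_time_by_speaker(utterances: Iterable[dict]) -> Dict[Hashable, float]:
    """Sum utterance durations per speaker label.

    Assumed utterance shape (hypothetical):
        {"speaker": 0, "start": 0.0, "end": 4.5, "text": "..."}
    """
    totals = defaultdict(float)
    for utterance in utterances:
        totals[utterance["speaker"]] += utterance["end"] - utterance["start"]
    return dict(totals)


if __name__ == "__main__":
    sample = [
        {"speaker": 0, "start": 0.0, "end": 4.5, "text": "Thanks for calling."},
        {"speaker": 1, "start": 4.5, "end": 6.0, "text": "Hi."},
        {"speaker": 0, "start": 6.0, "end": 8.0, "text": "How can I help?"},
    ]
    print(talk_time_by_speaker(sample))
```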

Multilingual content

100+ languages · code-switching

Solaria-1 covers 100+ languages under a single model id and handles mid-utterance switches without re-routing. Gladia's own benchmark calls out 42 languages that competing API vendors don't publish coverage for at all — useful when your inputs aren't English-first.

model=solaria-1 · language_detection=true
Detect + transcribe + code-switch in one pass
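To see where a recording switches language, one option is to scan the returned utterances for changes in the detected language. This sketch assumes each utterance carries a `language` code and a `start` time. Both field names are assumptions for illustration. It also only catches switches between utterances, so a switch inside a single utterance would need word-level language data.

```python
from typing import Iterable, List, Tuple


def code_switch_points(utterances: Iterable[dict]) -> List[Tuple[float, str, str]]:
    """Return (start_time, from_language, to_language) for each switch.

    Assumed utterance shape (hypothetical):
        {"language": "en", "start": 0.0, "text": "..."}
    """
    switches = []
    previous = None
    for utterance in utterances:
        language = utterance["language"]
        if previous is not None and language != previous:
            switches.append((utterance["start"], previous, language))
        previous = language
    return switches
```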

Podcast and media

Pre-recorded · subtitles + summary

Send the episode URL or upload bytes, ask for diarization, paragraphs, SRT/VTT subtitles, and a summary in one POST. The audio-to-LLM feature can replace a separate summarization vendor for show-notes and chapter generation.

POST /v2/pre-recorded · subtitles + audio_to_llm
Publishable transcript + chapters in one call
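The API can return SRT directly. If you need to post-process cues yourself (for example, to merge or re-time them), here is a minimal sketch of SRT formatting. It assumes cues are available as `(start_seconds, end_seconds, text)` tuples, and the helper names are illustrative.

```python
from typing import Iterable, Tuple


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues: Iterable[Tuple[float, float, str]]) -> str:
    """Render numbered SRT cue blocks separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)
```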

EU data residency

Enterprise · 100% EU hosting

Enterprise plans offer contractual EU-only data residency, zero retention, GDPR plus HIPAA plus SOC 2 Type II plus ISO 27001, and a no-training-on-customer-audio clause. This is the differentiator most US-headquartered APIs cannot match without a separate enterprise paper trail.

Enterprise plan · custom hosting
Zero retention available on request

Real-time live captions

Streaming · partials under ~100 ms

Open the WebSocket returned from /v2/live and push PCM chunks; interim transcripts arrive with word-level timestamps for caption overlays, live-event accessibility, and broadcast workflows. A single live session is capped at three hours per the docs.

wss /v2/live · partials=true
Word-level timestamps inline
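For caption overlays, the usual pattern is to show the latest interim text and replace it once the final version of that utterance arrives. The sketch below uses the event shape from the Node quickstart on this page (`type`, `data.is_final`, `data.utterance.text`). It assumes interim events have the same shape with `is_final` set to false.

```python
from typing import List


class CaptionBuffer:
    """Keep committed (final) lines plus the current interim line."""

    def __init__(self) -> None:
        self.committed: List[str] = []
        self.pending: str = ""

    def handle(self, event: dict) -> None:
        """Apply one decoded WebSocket message to the buffer."""
        if event.get("type") != "transcript":
            return  # ignore non-transcript events
        data = event.get("data") or {}
        text = (data.get("utterance") or {}).get("text", "")
        if data.get("is_final"):
            self.committed.append(text)
            self.pending = ""
        else:
            self.pending = text  # overwrite the previous interim text

    def render(self) -> str:
        """Text to display: final lines followed by the interim one."""
        parts = self.committed + ([self.pending] if self.pending else [])
        return " ".join(parts)
```

Overwriting the interim text rather than appending it keeps the overlay from showing the same words twice as the transcript is refined.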

Pattern: Gladia is the right call when EU data residency, unusual-language coverage, or ~100 ms partials is the deciding factor. If you want the lowest English WER on streaming, Deepgram is the closest comparison; for managed transcripts with no infrastructure at all, drop a URL into Whipscribe below.

Quickstart · pick a runtime

Three working ways to call Gladia. Export your key as GLADIA_API_KEY first — grab one from the Gladia console (10 free hours per month on Starter, no card required).

1. Python · pre-recorded URL

Two-step POST + poll against /v2/pre-recorded · transcribe any HTTPS audio URL with Solaria-1.

# pip install requests
import os
import time

import requests

API = "https://api.gladia.io/v2"
KEY = os.environ["GLADIA_API_KEY"]
HDR = {"x-gladia-key": KEY}  # requests sets Content-Type when json= is used

# 1. submit the job
body = {
    "audio_url": "https://files.gladia.io/example/audio-transcription/split_infinity.wav",
    "diarization": True,
    "subtitles": True,
    "subtitles_config": {"formats": ["srt"]},
}
r = requests.post(f"{API}/pre-recorded", json=body, headers=HDR, timeout=30)
r.raise_for_status()  # fail early on auth or validation errors
result_url = r.json()["result_url"]

# 2. poll until the job finishes
while True:
    res = requests.get(result_url, headers=HDR, timeout=30)
    res.raise_for_status()
    job = res.json()
    if job["status"] == "done":
        print(job["result"]["transcription"]["full_transcript"])
        break
    if job["status"] == "error":
        raise RuntimeError(job)
    time.sleep(2)  # still in progress; wait before the next poll

Reference: docs.gladia.io/api-reference/v2/pre-recorded/init ↗ · full options at docs.gladia.io/chapters/pre-recorded-stt/getting-started ↗
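The loop above polls forever if a job never finishes. Here is a minimal sketch of the same polling logic with a deadline. The fetch step is passed in as a callable so the helper can be tested without network access. The default interval and timeout are illustrative, and any status other than "done" or "error" is treated as still in progress.

```python
import time
from typing import Callable


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call fetch_status until the job is done, fails, or times out."""
    deadline = clock() + timeout_s
    while True:
        job = fetch_status()
        if job["status"] == "done":
            return job
        if job["status"] == "error":
            raise RuntimeError(job)
        if clock() >= deadline:
            raise TimeoutError(f"job not finished after {timeout_s} s")
        sleep(interval_s)
```

With the quickstart's variables, a call could look like `poll_until_done(lambda: requests.get(result_url, headers=HDR, timeout=30).json())`.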

2. cURL · no SDK

Plain HTTPS POST to /v2/pre-recorded · useful for shell pipelines and edge runtimes.

# submit a job
curl --request POST \
  --url 'https://api.gladia.io/v2/pre-recorded' \
  --header "x-gladia-key: $GLADIA_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "audio_url": "https://files.gladia.io/example/audio-transcription/split_infinity.wav",
    "diarization": true,
    "subtitles": true,
    "subtitles_config": {"formats": ["srt"]}
  }'
# response includes { id, result_url }

# poll until status=done
# (set RESULT_URL to the result_url value from the response above)
curl --request GET \
  --url "$RESULT_URL" \
  --header "x-gladia-key: $GLADIA_API_KEY"

Reference: docs.gladia.io/api-reference/v2/pre-recorded/init ↗

3. Node · real-time WebSocket

Init the live session with POST /v2/live, then stream PCM to the returned wss:// URL.

// npm i ws node-fetch
// Run as an ES module (.mjs or "type": "module") so top-level await works.
import fetch from "node-fetch";
import WebSocket from "ws";

const KEY = process.env.GLADIA_API_KEY;

// 1. init session
const res = await fetch("https://api.gladia.io/v2/live", {
  method: "POST",
  headers: {
    "x-gladia-key": KEY,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    encoding: "wav/pcm",
    sample_rate: 16000,
    bit_depth: 16,
    channels: 1,
  }),
});
if (!res.ok) {
  throw new Error(`live init failed: ${res.status} ${await res.text()}`);
}
const init = await res.json();

// 2. open the WebSocket
const ws = new WebSocket(init.url);

ws.on("open", () => {
  // send 16 kHz / 16-bit mono PCM frames here with ws.send(buffer)
});

ws.on("message", (msg) => {
  const evt = JSON.parse(msg.toString());
  // print only finalized utterances; interim results have is_final false
  if (evt.type === "transcript" && evt.data?.is_final) {
    console.log(evt.data.utterance.text);
  }
});

ws.on("error", (err) => console.error("WebSocket error:", err));

Reference: docs.gladia.io/api-reference/v2/live/init ↗ · guide at docs.gladia.io/chapters/live-stt/getting-started ↗

Features

Speaker diarization	Yes
Word-level timestamps	Yes
Streaming / real-time	Yes
Languages supported	100+
HIPAA eligible	Enterprise plans only

Gladia vs Whipscribe

Feature	Gladia	Whipscribe
Category	Transcription APIs	Transcription APIs
Pricing	from $0.0102/min	free beta
Speaker diarization	Yes	Yes
Word timestamps	Yes	Yes
Streaming	Yes	No
Languages	100+	99
Platforms	API	Web, API, MCP

Sources & dates for the comparison above

diarization: “Gladia's diarization feature labels each utterance with a speaker identifier.” — source (checked 2026-04-23)
word timestamps: “Per-word timestamps are included with start and end seconds.” — source (checked 2026-04-23)
streaming: “Gladia provides a WebSocket streaming endpoint for live audio.” — source (checked 2026-04-23)
pricing: “Pay-as-you-go pricing from $0.612 per hour (~$0.0102/min).” — source (checked 2026-04-23)

Alternatives to Gladia

OpenAI Whisper API

OpenAI

Hosted Whisper from OpenAI (the whisper-1 model, based on large-v2) at $0.006 per minute.

$0.006/min

AssemblyAI

Universal-2 model + diarization, PII redaction, topic detection, summarization.

from $0.37/hr

Deepgram

Nova-2 model, excellent streaming, strong at conversational audio.

from $0.0043/min
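The list prices above mix per-minute and per-hour units. A minimal sketch that normalizes them to USD per minute for a rough comparison follows. The figures are the "from" prices quoted on this page, which cover different models and tiers, so treat the ranking as indicative rather than like-for-like.

```python
from typing import Dict, List, Tuple

# (amount in USD, unit) as quoted on this page.
LISTED_PRICES: Dict[str, Tuple[float, str]] = {
    "Gladia": (0.0102, "min"),
    "OpenAI Whisper API": (0.006, "min"),
    "AssemblyAI": (0.37, "hr"),
    "Deepgram": (0.0043, "min"),
}


def usd_per_minute(amount: float, unit: str) -> float:
    """Normalize a listed price to USD per minute of audio."""
    if unit == "min":
        return amount
    if unit == "hr":
        return amount / 60
    raise ValueError(f"unknown unit: {unit!r}")


def ranked_by_price(prices: Dict[str, Tuple[float, str]]) -> List[Tuple[str, float]]:
    """Return (vendor, USD per minute) pairs, cheapest first."""
    normalized = [(name, usd_per_minute(*price)) for name, price in prices.items()]
    return sorted(normalized, key=lambda item: item[1])


if __name__ == "__main__":
    for name, rate in ranked_by_price(LISTED_PRICES):
        print(f"{name}: ${rate:.4f}/min")
```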

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or drop a file in the form below — you'll have a transcript in seconds.