Gladia
Gladia is an EU-based speech-to-text API built around the Solaria-1 model — sub-300 ms real-time streaming plus pre-recorded transcription, with 100+ languages, mid-utterance code-switching, and contractual EU data residency.
Solaria-1 is Gladia's universal STT model: one model id covers 100+ languages with automatic detection and mid-utterance code-switching, partial transcripts arrive in roughly 100 ms, and diarization, named-entity recognition, sentiment, and summarization are built in. The platform exposes two surfaces: POST /v2/pre-recorded for async batch jobs, and POST /v2/live, which returns a wss://api.gladia.io/v2/live?token=… URL for streaming. Both POST endpoints authenticate with the x-gladia-key header.
Best for voice agents, contact-center analytics, multilingual meeting bots, and any workload where EU data residency or unusual language coverage is the deciding factor. Pay-as-you-go on the Starter plan runs $0.61/hr async (~$0.0102/min) and $0.75/hr real-time (~$0.0125/min) with 10 free hours per month; the Growth tier discounts those to ~$0.20/hr async and ~$0.25/hr real-time with an upfront commit; Enterprise adds unlimited concurrency, zero retention, SLAs, and custom hosting. Last price check: 2026-05-10.
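The per-minute figures above are the hourly rates divided by 60. The sketch below reproduces that arithmetic and estimates a monthly bill from the Starter rates quoted here. The function names are illustrative, and subtracting the free hours before billing is an assumption, not a documented billing rule.

```python
# Starter rates as quoted in the paragraph above (USD per audio hour).
STARTER_ASYNC_PER_HOUR = 0.61
STARTER_REALTIME_PER_HOUR = 0.75
FREE_HOURS_PER_MONTH = 10  # ASSUMPTION: deducted before any billing starts


def per_minute(rate_per_hour: float) -> float:
    """Convert an hourly rate to a per-minute rate."""
    return rate_per_hour / 60


def monthly_cost(hours: float, rate_per_hour: float,
                 free_hours: float = FREE_HOURS_PER_MONTH) -> float:
    """Estimate a monthly bill in USD, rounded to cents."""
    billable = max(0.0, hours - free_hours)
    return round(billable * rate_per_hour, 2)


print(f"async:     ${per_minute(STARTER_ASYNC_PER_HOUR):.4f}/min")
print(f"real-time: ${per_minute(STARTER_REALTIME_PER_HOUR):.4f}/min")
print(f"110 h async: ${monthly_cost(110, STARTER_ASYNC_PER_HOUR):.2f}")
```

Check the estimate against the pricing page before budgeting; the rates here are only as fresh as the price check date above.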
What it is
Gladia wraps Whisper-class models in a developer-friendly API with diarization, 99 languages, and competitive per-minute pricing. It is a reasonable alternative to self-hosting faster-whisper when you want someone else to operate the GPUs. Last price check: 2026-04-20.
Watch out for: Smaller ecosystem than AssemblyAI/Deepgram; HIPAA on enterprise tiers only.
Install / use
Where Gladia fits · 6 use-cases
Gladia's strengths cluster around multilingual accuracy, low-latency partials, and an EU compliance posture that most US-headquartered APIs can't match. Pick the card closest to your build — each links to the canonical docs section.
Initiate a live session with POST /v2/live, then stream PCM over the returned WebSocket. Partials arrive in roughly 100 ms, which is the latency budget conversational agents need before turn-taking feels broken. Audio-to-LLM lets a single request return both the transcript and a structured LLM response.
Sub-300 ms final · ~100 ms partials
Batch agent + customer recordings through POST /v2/pre-recorded with diarization on, optional speaker count, PII redaction, sentiment, and summarization in one call. SOC 2 Type II, HIPAA, and ISO 27001 are in scope on paid plans.
Single request returns transcript + speakers + summary
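A contact-center request of this kind could be assembled as in the sketch below. Only `audio_url` and `diarization` appear in the quickstart on this page. The other field names (`diarization_config`, `number_of_speakers`, `summarization`, `sentiment_analysis`) are assumptions to verify against the pre-recorded API reference, and `build_prerecorded_body` is a hypothetical helper.

```python
import json


def build_prerecorded_body(audio_url, speakers=None,
                           summarize=False, sentiment=False):
    """Build a JSON body for POST /v2/pre-recorded (hypothetical helper)."""
    body = {"audio_url": audio_url, "diarization": True}
    if speakers is not None:
        # ASSUMPTION: the speaker-count hint lives under this key.
        body["diarization_config"] = {"number_of_speakers": speakers}
    if summarize:
        body["summarization"] = True       # ASSUMPTION: flag name
    if sentiment:
        body["sentiment_analysis"] = True  # ASSUMPTION: flag name
    return body


# Placeholder URL for illustration only.
body = build_prerecorded_body("https://example.com/call.wav",
                              speakers=2, summarize=True, sentiment=True)
print(json.dumps(body, indent=2))
```

Building the body in one place keeps optional features out of the request unless they are asked for, which makes it easier to see what a given job is billed for.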
Solaria-1 covers 100+ languages under a single model id and handles mid-utterance switches without re-routing. Gladia's own benchmark calls out 42 languages that competing API vendors don't publish coverage for at all — useful when your inputs aren't English-first.
Detect + transcribe + code-switch in one pass
Send the episode URL or upload bytes, ask for diarization, paragraphs, SRT/VTT subtitles, and a summary in one POST. The audio-to-LLM feature can replace a separate summarization vendor for show-notes and chapter generation.
Publishable transcript + chapters in one call
Enterprise plans offer contractual EU-only data residency, zero retention, GDPR plus HIPAA plus SOC 2 Type II plus ISO 27001, and a no-training-on-customer-audio clause. This is the differentiator most US-headquartered APIs cannot match without a separate enterprise paper trail.
Zero retention available on request
Open the WebSocket returned from /v2/live and push PCM chunks; interim transcripts arrive with word-level timestamps for caption overlays, live-event accessibility, and broadcast workflows. A single live session is capped at three hours per the docs.
Word-level timestamps inline
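For caption overlays, word-level timestamps usually need to be grouped into short lines. The sketch below shows one way to do that. It assumes each word is a dict with `word`, `start`, and `end` (in seconds); that shape, the 42-character limit, and the 0.8 s pause threshold are illustrative choices, not Gladia's documented schema.

```python
def group_words(words, max_chars=42, max_gap=0.8):
    """Group timed words into caption lines.

    A new caption starts when adding the next word would exceed
    max_chars, or when the pause before that word exceeds max_gap.
    """
    captions = []
    current = None
    for w in words:
        text = w["word"].strip()
        if current is not None:
            candidate = current["text"] + " " + text
            gap = w["start"] - current["end"]
            if len(candidate) <= max_chars and gap <= max_gap:
                # The word fits: extend the current caption.
                current["text"] = candidate
                current["end"] = w["end"]
                continue
            captions.append(current)
        current = {"start": w["start"], "end": w["end"], "text": text}
    if current is not None:
        captions.append(current)
    return captions


words = [
    {"word": "hello", "start": 0.0, "end": 0.4},
    {"word": "world", "start": 0.5, "end": 0.9},
    {"word": "again", "start": 3.0, "end": 3.4},
]
for c in group_words(words):
    print(c["start"], c["end"], c["text"])
```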
Quickstart · pick a runtime
Three working ways to call Gladia. Export your key as GLADIA_API_KEY first — grab one from the Gladia console (10 free hours per month on Starter, no card required).
Two-step POST + poll against /v2/pre-recorded · transcribe any HTTPS audio URL with Solaria-1.
```python
# pip install requests
import os
import time

import requests

API = "https://api.gladia.io/v2"
KEY = os.environ["GLADIA_API_KEY"]
HDR = {"x-gladia-key": KEY, "Content-Type": "application/json"}

# 1. submit the job
body = {
    "audio_url": "https://files.gladia.io/example/audio-transcription/split_infinity.wav",
    "diarization": True,
    "subtitles": True,
    "subtitles_config": {"formats": ["srt"]},
}
resp = requests.post(f"{API}/pre-recorded", json=body, headers=HDR, timeout=30)
resp.raise_for_status()  # fail on 4xx/5xx instead of a KeyError below
result_url = resp.json()["result_url"]

# 2. poll until the job finishes
while True:
    res = requests.get(result_url, headers={"x-gladia-key": KEY}, timeout=30).json()
    if res["status"] == "done":
        print(res["result"]["transcription"]["full_transcript"])
        break
    if res["status"] == "error":
        raise RuntimeError(res)
    time.sleep(2)
```
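The polling loop above runs until the job reports `done` or `error`. For production use, a deadline and a growing delay are usually worth adding. The sketch below is one possible shape: `poll_until_done` is a hypothetical helper, not part of any Gladia SDK, and it treats every status other than `done` and `error` as still pending.

```python
import time


def poll_until_done(fetch, timeout_s=600.0, first_delay_s=1.0,
                    max_delay_s=15.0, sleep=time.sleep,
                    clock=time.monotonic):
    """Call fetch() until it returns a dict with status 'done'.

    fetch is any zero-argument callable returning the parsed JSON of
    the result URL. The delay doubles after each attempt, capped at
    max_delay_s; TimeoutError is raised once the next wait would pass
    the deadline.
    """
    deadline = clock() + timeout_s
    delay = first_delay_s
    while True:
        res = fetch()
        status = res.get("status")
        if status == "done":
            return res
        if status == "error":
            raise RuntimeError(res)
        if clock() + delay > deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_s} s")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)
```

With the variables from the quickstart, the call would look like `poll_until_done(lambda: requests.get(result_url, headers={"x-gladia-key": KEY}, timeout=30).json())`. Passing `fetch` in as a callable keeps the helper testable without network access.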
Plain HTTPS POST to /v2/pre-recorded · useful for shell pipelines and edge runtimes.
```sh
# submit a job
curl --request POST \
  --url 'https://api.gladia.io/v2/pre-recorded' \
  --header "x-gladia-key: $GLADIA_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "audio_url": "https://files.gladia.io/example/audio-transcription/split_infinity.wav",
    "diarization": true,
    "subtitles": true,
    "subtitles_config": {"formats": ["srt"]}
  }'
# response includes { id, result_url }

# set RESULT_URL to the result_url value from the response,
# then poll until status=done
curl --request GET \
  --url "$RESULT_URL" \
  --header "x-gladia-key: $GLADIA_API_KEY"
```
Init the live session with POST /v2/live, then stream PCM to the returned wss:// URL.
```js
// npm i ws node-fetch
// Run as an ES module (.mjs or "type": "module"): it uses top-level await.
import fetch from "node-fetch";
import WebSocket from "ws";

const KEY = process.env.GLADIA_API_KEY;

// 1. init session
const init = await fetch("https://api.gladia.io/v2/live", {
  method: "POST",
  headers: {
    "x-gladia-key": KEY,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    encoding: "wav/pcm",
    sample_rate: 16000,
    bit_depth: 16,
    channels: 1,
  }),
}).then((r) => r.json());

// 2. open the WebSocket
const ws = new WebSocket(init.url);

ws.on("open", () => {
  // send 16 kHz / 16-bit PCM frames here
});

ws.on("message", (msg) => {
  const evt = JSON.parse(msg.toString());
  // print final utterances only; partials are skipped
  if (evt.type === "transcript" && evt.data?.is_final) {
    console.log(evt.data.utterance.text);
  }
});
```
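The example above leaves the "send PCM frames" step open. Before sending, the raw audio has to be cut into fixed-size chunks, which can be sketched as below for the 16 kHz, 16-bit, mono format declared in the init payload. The 100 ms frame length and both function names are illustrative assumptions; the page does not state a required chunk size.

```python
def frame_size_bytes(sample_rate=16000, bit_depth=16, channels=1,
                     frame_ms=100):
    """Bytes of raw PCM covering frame_ms milliseconds of audio."""
    bytes_per_sample = bit_depth // 8
    return sample_rate * bytes_per_sample * channels * frame_ms // 1000


def iter_frames(pcm: bytes, size: int):
    """Yield consecutive chunks of size bytes; the last may be shorter."""
    for offset in range(0, len(pcm), size):
        yield pcm[offset:offset + size]


size = frame_size_bytes()            # 16000 * 2 * 1 * 100 // 1000
silence = bytes(8000)                # 250 ms of 16-bit mono silence
for frame in iter_frames(silence, size):
    print(len(frame))                # each frame would go out as one binary message
```

Smaller frames reduce the delay before audio reaches the server at the cost of more messages; which size works best is something to confirm against the live API reference.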
Features
| Feature | Gladia |
|---|---|
| Speaker diarization | Yes |
| Word-level timestamps | Yes |
| Streaming / real-time | Yes |
| Languages supported | 99 |
| HIPAA eligible | No |
Links
- gladia.io ↗ Product homepage: EU-based STT API positioning, Solaria-1 model, voice-agent and contact-center use cases.
- docs.gladia.io ↗ Documentation root: quickstarts for pre-recorded, real-time, and audio-intelligence feature surfaces.
- API reference · pre-recorded ↗ POST /v2/pre-recorded: request schema for diarization, subtitles, summarization, audio-to-LLM, plus the result-poll endpoint.
- API reference · live ↗ POST /v2/live: init payload (encoding, sample rate, channels) and the wss URL pattern returned for the audio stream.
- Solaria-1 model page ↗ Model card: 100+ language coverage, ~100 ms partials, accuracy claims on EN/ES/FR/IT benchmarks, code-switching.
- gladia.io/pricing ↗ Current Starter, Growth, and Enterprise tiers: async, real-time, free-hour allowance, and zero-retention options.
- status.gladia.io ↗ Live uptime for the Application, Pre-Recorded, and Real-Time components; subscribe via email or RSS.
- gladia.io/blog ↗ Product blog: recent posts cover audio-to-LLM in a single POST, summarization, and Solaria-1 release notes.
Gladia vs Whipscribe
| Feature | Gladia | Whipscribe |
|---|---|---|
| Category | Transcription APIs | Transcription APIs |
| Pricing | from $0.0102/min | free beta |
| Speaker diarization | Yes | Yes |
| Word timestamps | Yes | Yes |
| Streaming | Yes | No |
| Languages | 99 | 99 |
| Platforms | API | Web, API, MCP |
Sources & dates for the comparison above
- diarization: “Gladia's diarization feature labels each utterance with a speaker identifier.” — source (checked 2026-04-23)
- word timestamps: “Per-word timestamps are included with start and end seconds.” — source (checked 2026-04-23)
- streaming: “Gladia provides a WebSocket streaming endpoint for live audio.” — source (checked 2026-04-23)
- pricing: “Pay-as-you-go pricing from $0.612 per hour (~$0.0102/min).” — source (checked 2026-04-23)
Alternatives to Gladia
Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or drop a file in the form below — you'll have a transcript in seconds.