Deepgram

by Deepgram

Deepgram is a real-time speech API built for voice agents and call analytics — Nova-3 ships low-latency streaming plus pre-recorded transcription, with diarization, PII redaction, and summarization wired into a single REST + WebSocket surface.

TL;DR

Nova-3 is Deepgram's current flagship model, available in nova-3-general (multilingual) and nova-3-medical. The platform exposes one REST endpoint for pre-recorded audio (/v1/listen) and a WebSocket for live streaming, with diarization, word-level timestamps, PII redaction, and summarization toggled by query parameters.

Best for real-time voice agents, call-center analytics, live captioning, and meeting tools where p50 streaming latency is the product. New accounts get $200 in free credit with no card; pay-as-you-go pricing runs from $0.0048/min streaming and $0.0077/min pre-recorded on Nova-3 monolingual, with a separate per-minute rate for the Voice Agent API.
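As a rough worked example of the pricing arithmetic, the sketch below uses only the per-minute rates and the $200 credit quoted above. The function names are made up for illustration, and real billing (rounding rules, add-on features, plan discounts) may differ.

```python
# Rates quoted above for Nova-3 monolingual, pay-as-you-go (USD per audio minute).
STREAMING_RATE = 0.0048
PRERECORDED_RATE = 0.0077
FREE_CREDIT = 200.00  # new-account credit in USD


def estimate_cost(minutes: float, rate: float) -> float:
    """Return the estimated charge in USD for `minutes` of audio at `rate`."""
    return round(minutes * rate, 2)


def minutes_covered(credit: float, rate: float) -> int:
    """Return how many whole audio minutes `credit` dollars cover at `rate`."""
    return int(credit / rate)


# Illustrative monthly workload: 10,000 minutes of call recordings.
monthly_batch_cost = estimate_cost(10_000, PRERECORDED_RATE)
free_streaming_minutes = minutes_covered(FREE_CREDIT, STREAMING_RATE)
```

At these rates the free credit covers tens of thousands of minutes, which is usually enough to evaluate accuracy on your own audio before committing.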

Category: Transcription APIs
Pricing: from $0.0043/min
Platforms: API

What it is

Deepgram's Nova models are among the strongest streaming ASR models on the market, with very low latency and good accuracy on conversational audio. The service is HIPAA-eligible, and its per-minute pricing is competitive with self-hosting at modest volume. Last price check: 2026-04-20.

Best for: Real-time voice apps (agents, meeting tools) where streaming latency is the product.
Watch out for: Lower language coverage than Whisper variants; proprietary.

Install / use

View Deepgram API docs ↗

Where Deepgram fits · 6 use-cases

Deepgram's strengths cluster around streaming latency, conversational accuracy, and an API surface that bundles agents + ASR + redaction. Pick the card closest to your build — each links to the canonical docs section.

Voice agents
Agent API · ASR + LLM + TTS

The Agent API wraps Nova-3 STT, an LLM step, and Deepgram TTS behind one WebSocket so you ship a conversational agent without stitching three providers. Drop-in for phone bots, IVR replacements, and in-product voice copilots.

nova-3 + agent WebSocket
Built-in turn-taking + barge-in

Call-center analytics
Pre-recorded · diarization + redaction

Batch call recordings through /v1/listen with diarize=true and redact=pii to get speaker-labeled, PII-scrubbed transcripts ready for QA scoring and topic mining. HIPAA-eligible on paid plans.

POST /v1/listen · nova-3-general
diarize + redact + summarize=v2
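For QA scoring you usually want speaker turns rather than a flat transcript. The sketch below collapses consecutive words from the same speaker into turns. It assumes the diarized response exposes a `words` list under `results.channels[0].alternatives[0]`, with `word`, `punctuated_word`, `start`, `end`, and `speaker` fields; verify that shape against the current API reference. `SAMPLE` is hand-written for illustration, not real API output.

```python
from typing import Any


def speaker_turns(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Collapse consecutive words from the same speaker into turns."""
    words = response["results"]["channels"][0]["alternatives"][0]["words"]
    turns: list[dict[str, Any]] = []
    for w in words:
        text = w.get("punctuated_word", w["word"])
        speaker = w.get("speaker", 0)  # assumed field, present when diarize=true
        if turns and turns[-1]["speaker"] == speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["text"] += " " + text
            turns[-1]["end"] = w["end"]
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append(
                {"speaker": speaker, "start": w["start"], "end": w["end"], "text": text}
            )
    return turns


# Hypothetical sample in the assumed response shape.
SAMPLE = {
    "results": {"channels": [{"alternatives": [{"words": [
        {"word": "hello", "punctuated_word": "Hello,", "start": 0.0, "end": 0.4, "speaker": 0},
        {"word": "thanks", "punctuated_word": "thanks", "start": 0.5, "end": 0.9, "speaker": 0},
        {"word": "hi", "punctuated_word": "Hi.", "start": 1.2, "end": 1.5, "speaker": 1},
    ]}]}]}
}

for turn in speaker_turns(SAMPLE):
    print(f"[{turn['start']:.1f}-{turn['end']:.1f}] speaker {turn['speaker']}: {turn['text']}")
```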

Live captioning
Streaming · sub-300 ms partials

Open a WebSocket to wss://api.deepgram.com/v1/listen and stream PCM; interim transcripts arrive word-by-word with timestamps. Common stack for webinar captions, live-event accessibility, and broadcast workflows.

wss /v1/listen · interim_results=true
Word-level timestamps inline
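A caption display has to overwrite interim text and only commit finalized segments. The sketch below shows that bookkeeping without any network code. It assumes each transcript message is JSON with `type: "Results"`, an `is_final` flag, and the text at `channel.alternatives[0].transcript`; check that against the streaming reference before relying on it. `CaptionBuffer` and `listen_url` are illustrative names, and `encoding` / `sample_rate` are the parameters assumed for raw PCM.

```python
import json
from urllib.parse import urlencode


def listen_url(**params: str) -> str:
    """Build the streaming URL with query parameters in the order given."""
    return "wss://api.deepgram.com/v1/listen?" + urlencode(params)


class CaptionBuffer:
    """Tracks committed (final) caption text plus the current interim guess."""

    def __init__(self) -> None:
        self.final: list[str] = []
        self.interim = ""

    def feed(self, raw: str) -> str:
        """Consume one JSON message from the socket and return the caption line."""
        msg = json.loads(raw)
        if msg.get("type") != "Results":
            return self.text()  # ignore metadata and other event types
        transcript = msg["channel"]["alternatives"][0]["transcript"]
        if msg.get("is_final"):
            if transcript:
                self.final.append(transcript)
            self.interim = ""
        else:
            self.interim = transcript  # overwrite, never append, interim text
        return self.text()

    def text(self) -> str:
        parts = self.final + ([self.interim] if self.interim else [])
        return " ".join(parts)


url = listen_url(model="nova-3", interim_results="true",
                 encoding="linear16", sample_rate="16000")
```

In a real client you would open `url` with a WebSocket library, send audio chunks as binary frames, and pass each text frame to `CaptionBuffer.feed`.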

Podcast · long-form
Pre-recorded · speakers + summary

Send the episode URL or upload bytes; ask for diarize=true, punctuate=true, paragraphs=true, and summarize=v2 in one call to get a publishable transcript plus a model-generated recap.

POST /v1/listen · multi-feature
Single request returns all artifacts

Multilingual content
Nova-3 multilingual model

nova-3-general handles 10 base languages plus regional variants under one model id — useful when you can't predict the input language, or when you need code-switching inside a single utterance.

model=nova-3-general
Detect + transcribe in one pass

Regulated / PII workloads
Redaction · HIPAA-eligible

Set redact=pii (or fine-grained tags like numbers, ssn) and the transcript ships with sensitive spans replaced by typed placeholders like [PHONE_NUMBER_1] — raw audio is not retained when the no-store option is enabled on enterprise plans.

redact=pii · model=nova-3-medical
Healthcare model for clinical audio
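When auditing redacted output it helps to count what was scrubbed. The sketch below tallies distinct placeholders per entity type. The pattern is inferred solely from the `[PHONE_NUMBER_1]` example above (uppercase type, underscore, numeric index), so treat it as an assumption; the sample transcript is invented.

```python
import re
from collections import Counter

# Assumed placeholder format: [TYPE_N], e.g. [PHONE_NUMBER_1] or [SSN_2].
PLACEHOLDER = re.compile(r"\[([A-Z][A-Z_]*)_(\d+)\]")


def redaction_counts(transcript: str) -> Counter:
    """Count distinct redacted entities per type in a redacted transcript."""
    distinct = {(m.group(1), int(m.group(2))) for m in PLACEHOLDER.finditer(transcript)}
    return Counter(entity_type for entity_type, _ in distinct)


# Invented sample; the repeated [PHONE_NUMBER_1] is counted once.
sample = (
    "Call me at [PHONE_NUMBER_1] or [PHONE_NUMBER_2]. "
    "My social is [SSN_1]. Again, that's [PHONE_NUMBER_1]."
)
counts = redaction_counts(sample)
```

Counting distinct indices rather than raw matches avoids inflating the tally when the same entity is mentioned repeatedly.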

Pattern: Deepgram is the right call when streaming latency, an integrated agent loop, or a single per-minute SKU across batch + live is the decisive factor. If you want the lowest English WER and don't need a streaming endpoint, OpenAI Whisper API is the closest comparison; for managed transcripts with no infra at all, drop a URL into Whipscribe below.

Quickstart · pick a language

Three working ways to transcribe a remote audio URL with Nova-3. Export your key as DEEPGRAM_API_KEY first — grab one from the Deepgram console (free $200 credit, no card).

1. Python SDK · pre-recorded URL

Official deepgram-sdk v7+ · transcribe any HTTPS audio URL with Nova-3.

# pip install deepgram-sdk
import os
from deepgram import DeepgramClient, PrerecordedOptions

dg = DeepgramClient(os.environ["DEEPGRAM_API_KEY"])

source  = {"url": "https://dpgr.am/spacewalk.wav"}
options = PrerecordedOptions(
    model="nova-3",
    smart_format=True,
    diarize=True,
    punctuate=True,
    summarize="v2",
)

resp = dg.listen.rest.v("1").transcribe_url(source, options)
print(resp.results.channels[0].alternatives[0].transcript)

2. Node / JavaScript SDK

Official @deepgram/sdk · same pre-recorded call from Node 18+.

// npm install @deepgram/sdk
import { createClient } from "@deepgram/sdk";

const dg = createClient(process.env.DEEPGRAM_API_KEY);

const { result, error } = await dg.listen.prerecorded.transcribeUrl(
  { url: "https://dpgr.am/spacewalk.wav" },
  {
    model: "nova-3",
    smart_format: true,
    diarize: true,
    punctuate: true,
    summarize: "v2",
  }
);

if (error) throw error;
console.log(result.results.channels[0].alternatives[0].transcript);

3. cURL · no SDK

Plain HTTPS POST to /v1/listen · useful for shell pipelines and edge runtimes.

# pre-recorded URL
curl --request POST \
  --url 'https://api.deepgram.com/v1/listen?model=nova-3&smart_format=true&diarize=true&punctuate=true&summarize=v2' \
  --header "Authorization: Token $DEEPGRAM_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{"url":"https://dpgr.am/spacewalk.wav"}'

# or a local file
curl --request POST \
  --url 'https://api.deepgram.com/v1/listen?model=nova-3&smart_format=true' \
  --header "Authorization: Token $DEEPGRAM_API_KEY" \
  --header 'Content-Type: audio/wav' \
  --data-binary @call.wav
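
If you want the same no-SDK call from Python, the first cURL request above maps onto the standard library roughly as follows. This is a minimal sketch: `build_listen_request` and `transcribe` are illustrative helper names, the response path mirrors the one used in the SDK examples, and error handling and retries are left out.

```python
import json
import urllib.request
from urllib.parse import urlencode

API_BASE = "https://api.deepgram.com/v1/listen"


def _encode(value) -> str:
    # The cURL example uses lowercase true/false in the query string.
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def build_listen_request(api_key: str, audio_url: str, **features) -> urllib.request.Request:
    """Build (but do not send) the POST request for a remote audio URL."""
    query = urlencode({name: _encode(value) for name, value in features.items()})
    return urllib.request.Request(
        f"{API_BASE}?{query}",
        data=json.dumps({"url": audio_url}).encode("utf-8"),
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def transcribe(api_key: str, audio_url: str, **features) -> str:
    """Send the request and return the first channel's transcript."""
    request = build_listen_request(api_key, audio_url, **features)
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    return body["results"]["channels"][0]["alternatives"][0]["transcript"]
```

Call it as `transcribe(os.environ["DEEPGRAM_API_KEY"], "https://dpgr.am/spacewalk.wav", model="nova-3", smart_format=True)` after importing `os`. Keeping request construction separate from sending makes the URL and headers easy to inspect or unit-test offline.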

Features

Speaker diarization: Yes
Word-level timestamps: Yes
Streaming / real-time: Yes
Languages supported: 36
HIPAA eligible: Yes

Deepgram vs Whipscribe

Feature | Deepgram | Whipscribe
Category | Transcription APIs | Transcription APIs
Pricing | from $0.0043/min | free beta
Speaker diarization | Yes | Yes
Word timestamps | Yes | Yes
Streaming | Yes | No
Languages | 36 | 99
Platforms | API | Web, API, MCP

Sources & dates for the comparison above
  1. diarization: “Diarization recognizes speaker changes and attributes speech to speakers.” (source, checked 2026-04-23)
  2. word timestamps: “Each word returned includes start and end times in seconds.” (source, checked 2026-04-23)
  3. streaming: “Deepgram's streaming API transcribes live audio in real time over WebSockets.” (source, checked 2026-04-23)
  4. pricing: “Nova model pre-recorded transcription from $0.0043 per minute (pay-as-you-go).” (source, checked 2026-04-23)

Alternatives to Deepgram

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or drop a file in the form below — you'll have a transcript in seconds.