ChatGPT Realtime Audio

by OpenAI · launched 2026-05-08
OpenAI shipped three GPT-Realtime audio models on 2026-05-08: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper.
All three are developer APIs requiring an OpenAI account and API key.
This page is the honest comparison vs Whipscribe — last price-check 2026-05-08.
TL;DR

If you're a developer building a voice agent and you want GPT-5 reasoning on the audio path with a single endpoint — Realtime-2 is the right tool, and Whipscribe doesn't compete on that.

If you want a transcript (paste a URL, drop a file, or hit an MCP/REST endpoint), Whipscribe is $2/hr PAYG, roughly double Realtime-Whisper's ~$1.02/hr. Pro at $12/month covers up to 100 hours, which works out to an effective $0.12/hr at full use, about 8.5× cheaper than Realtime-Whisper at that volume. 30 minutes/day free without signup. 99 languages, diarization included, no audio sent to OpenAI servers.

If you care about privacy — Whipscribe runs self-hosted faster-whisper / whisperX with no third-party AI calls on the audio path; ChatGPT Realtime sends every audio stream to OpenAI's US servers, retained per OpenAI's data-usage policy.

The three new models

GPT-Realtime-2

$32 / 1M audio input tokens · $64 / 1M audio output tokens ($0.40 / 1M cached input tokens)

Voice agent with GPT-5-class reasoning. Carries multi-turn dialogue, calls tools, executes actions while the user is still talking. The flagship — and the most expensive of the three.

GPT-Realtime-Translate

$0.034 / minute (~$2.04/hr)

Live speech-to-speech translation. 70 input languages → 13 output languages, low-latency streaming. Aimed at meeting bots, dubbing, and customer-support overlay use cases.

GPT-Realtime-Whisper

$0.017 / minute (~$1.02/hr)

Streaming speech-to-text. Same Whisper family OpenAI has shipped before, now exposed as a real-time stream so partial transcripts appear as the speaker talks. The closest direct competitor to Whipscribe's transcription API.

Pricing pulled from OpenAI's launch announcement on 2026-05-08. Audio billing is rounded to the nearest second; Realtime-2 token math depends on input length.
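
As a quick check on the per-hour figures, the sketch below converts the per-minute rates quoted above into hourly and per-clip costs. It assumes a clip's duration is rounded to the nearest whole second and then billed pro rata at one sixtieth of the per-minute rate, which is only one reading of the rounding note. The function names are illustrative and not part of any OpenAI SDK.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-minute rates quoted on this page (USD, price-checked 2026-05-08).
RATE_PER_MIN = {
    "realtime-whisper": Decimal("0.017"),
    "realtime-translate": Decimal("0.034"),
}


def hourly_rate(model: str) -> Decimal:
    """Per-hour cost implied by the per-minute rate."""
    return RATE_PER_MIN[model] * 60


def clip_cost(model: str, seconds: float) -> Decimal:
    """Cost of one clip.

    Assumption: the duration is rounded to the nearest whole second,
    then billed pro rata at (per-minute rate / 60) per second.
    """
    billed_seconds = Decimal(seconds).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return RATE_PER_MIN[model] * billed_seconds / 60


if __name__ == "__main__":
    for model in RATE_PER_MIN:
        print(model, "per hour:", hourly_rate(model))
    print("90.4 s on realtime-whisper:", clip_cost("realtime-whisper", 90.4))
```

Under these assumptions the hourly rates come out to $1.02 and $2.04, matching the figures listed for Realtime-Whisper and Realtime-Translate.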

At a glance

  • Category: Voice AI · streaming STT
  • Launched: 2026-05-08
  • Cheapest tier: $0.017/min (Whisper)
  • Languages: 99 (Whisper) · 70→13 (Translate)
  • Streaming: Yes (real-time)
  • Free tier: None (pay from token zero)
  • Account required: Yes (OpenAI + API key)
  • Diarization: Not built-in

ChatGPT Realtime Audio vs Whipscribe

Feature by feature: ChatGPT Realtime Audio (OpenAI · 2026-05-08) vs Whipscribe (Neugence · privacy-first)

Product category
  • ChatGPT Realtime Audio: Voice AI · streaming STT API; developer-only
  • Whipscribe: Transcription utility; web app · API · MCP · ChatGPT GPT · Mac desktop

Cheapest transcription rate
  • ChatGPT Realtime Audio: Realtime-Whisper · $0.017/min (~$1.02/hr); Realtime-Translate · $0.034/min (~$2.04/hr)
  • Whipscribe: $2 / hour PAYG or $12 / month Pro (up to 100 hrs · effective $0.12/hr); credits never expire

Voice-agent / GPT-5 reasoning
  • ChatGPT Realtime Audio: Yes (Realtime-2); $32 / $64 per 1M tokens
  • Whipscribe: Not offered; we ship transcripts; bring your own model (Claude, GPT, local)

Try without signing up
  • ChatGPT Realtime Audio: No; OpenAI account + API key required
  • Whipscribe: Yes; 30 min/day free; no account, no card

Free tier
  • ChatGPT Realtime Audio: None; billed from minute zero
  • Whipscribe: Anonymous · 30 min/day + 2 hours free on signup

Privacy / data residency
  • ChatGPT Realtime Audio: Audio sent to OpenAI servers (US); retention per OpenAI data policy; not HIPAA-eligible by default
  • Whipscribe: Self-hosted Whisper (our own GPU cluster); audio never sent to OpenAI; no training on uploads; see /security + /privacy

Speaker diarization
  • ChatGPT Realtime Audio: Not built-in
  • Whipscribe: Included; whisperX + pyannote; no extra fee

Word-level timestamps
  • ChatGPT Realtime Audio: Streaming token deltas; no aligned word timings
  • Whipscribe: SRT · VTT · JSON · DOCX · TXT; word-level alignment

Languages
  • ChatGPT Realtime Audio: Whisper-realtime · 99; Translate · 70 → 13
  • Whipscribe: 99 (full Whisper coverage, all tiers)

URL input (YouTube, Vimeo, podcast feeds)
  • ChatGPT Realtime Audio: No; raw audio stream only
  • Whipscribe: Yes; paste a link, we pull the audio

Bulk upload / batch
  • ChatGPT Realtime Audio: No; one stream per request
  • Whipscribe: Yes; drag many files; parallel jobs

Editing / library / sharing
  • ChatGPT Realtime Audio: None; developer API only
  • Whipscribe: Web library · folders; share links · trash · search

Live transcription
  • ChatGPT Realtime Audio: Yes; real-time streaming model
  • Whipscribe: Live Meeting Notes (beta); streaming Whisper on web

Native integrations
  • ChatGPT Realtime Audio: OpenAI SDK; Realtime API endpoints
  • Whipscribe: REST API; MCP server (Claude/Cursor); ChatGPT Custom GPT; Obsidian · Mac desktop; Chrome extension

Subscription option
  • ChatGPT Realtime Audio: Pure usage metering; no monthly cap
  • Whipscribe: $2/hr PAYG; Pro · $12/mo (up to 100 hrs); Team · $29/mo (up to 500 hrs); predictable monthly spend

Privacy in plain English. Whipscribe is privacy-first by design: your audio hits our self-hosted Whisper / whisperX cluster and never leaves it for an OpenAI, AssemblyAI, or Deepgram round-trip. We don’t train on uploads. Anonymous transcripts are auto-deleted on a short clock; signed-in transcripts stay in your library, deletable any time. Read the full posture at /security and /privacy. With ChatGPT Realtime Audio, every audio frame is sent to OpenAI’s US servers and held under OpenAI’s data-usage policy — that may be fine for some teams, and a hard blocker for legal, medical, journalism-source, and EU-residency workflows.

Pricing — head to head

1 hour of transcription / month
  • ChatGPT Realtime Audio: ~$1.02 / hour (60 min × $0.017, Realtime-Whisper)
  • Whipscribe: $2.00 PAYG, or $0 if under the daily 30-min free tier

10 hours / month
  • ChatGPT Realtime Audio: ~$10.20
  • Whipscribe: $20.00 PAYG, or $12 Pro flat (up to 100 hrs included)

40 hours / month (active podcaster / journalist)
  • ChatGPT Realtime Audio: ~$40.80 (40 × $1.02/hr)
  • Whipscribe: $12 Pro flat (effective $0.30/hr · ~3.4× cheaper)

100 hours / month (podcast network · research lab)
  • ChatGPT Realtime Audio: ~$102.00 (100 × $1.02/hr)
  • Whipscribe: $12 Pro flat (effective $0.12/hr · ~8.5× cheaper)

1 hour live translation
  • ChatGPT Realtime Audio: ~$2.04 / hour (60 min × $0.034, Realtime-Translate)
  • Whipscribe: $2.00 transcript + paste into Claude / DeepL

Voice-agent app (50K interactions / month)
  • ChatGPT Realtime Audio: Token-metered; Realtime-2 · $32 / $64 per 1M tokens
  • Whipscribe: Out of scope; we’re a transcription utility, not a voice agent

Try without paying
  • ChatGPT Realtime Audio: No free tier
  • Whipscribe: 30 min/day anonymous + 2 hours free on signup

All numbers from public price pages on 2026-05-08. Realtime-2 voice-agent math depends on conversation length; the line above is illustrative only.
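
The monthly comparison above is simple enough to recompute. The sketch below uses only the prices quoted on this page and assumes the cheaper of PAYG and Pro is chosen on the Whipscribe side. It ignores the free tier and the Team plan, and it also derives the volume at which the $12 Pro plan starts to undercut Realtime-Whisper, a figure the page does not state directly. All names are illustrative.

```python
from decimal import Decimal

REALTIME_WHISPER_PER_HOUR = Decimal("1.02")  # 60 min x $0.017
WHIPSCRIBE_PAYG_PER_HOUR = Decimal("2.00")
WHIPSCRIBE_PRO_FLAT = Decimal("12.00")       # per month
WHIPSCRIBE_PRO_HOURS = 100                   # monthly cap on Pro


def realtime_whisper_cost(hours: int) -> Decimal:
    """Pure usage metering: hours times the hourly rate."""
    return REALTIME_WHISPER_PER_HOUR * hours


def whipscribe_cost(hours: int) -> Decimal:
    """Cheaper of PAYG and the Pro flat fee (free tier and Team plan ignored)."""
    if hours > WHIPSCRIBE_PRO_HOURS:
        raise ValueError("this sketch only covers workloads up to the Pro cap")
    return min(WHIPSCRIBE_PAYG_PER_HOUR * hours, WHIPSCRIBE_PRO_FLAT)


def pro_break_even_hours() -> Decimal:
    """Monthly hours at which Pro costs the same as Realtime-Whisper."""
    return WHIPSCRIBE_PRO_FLAT / REALTIME_WHISPER_PER_HOUR


if __name__ == "__main__":
    for hours in (1, 10, 40, 100):
        rt = realtime_whisper_cost(hours)
        ws = whipscribe_cost(hours)
        print(f"{hours:>3} h/month  Realtime-Whisper ${rt:.2f}  Whipscribe ${ws:.2f}")
    print(f"Pro break-even: {pro_break_even_hours():.2f} h/month")
```

On these numbers the break-even sits a little under 12 hours a month: below it, Realtime-Whisper's metered rate is cheaper, and above it the Pro flat fee is.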

When ChatGPT Realtime Audio is the right call

When Whipscribe is the better fit

FAQ

Which one should I pick for my use case?

If you need a transcript file (TXT, SRT, DOCX) from a recording, an interview, a meeting, a podcast, or a YouTube link — Whipscribe is the right tool. Drop the file or paste the URL and read the transcript.

If you’re a developer building a live voice agent that needs GPT-5 reasoning, tool calls, and barge-in — ChatGPT Realtime-2 is the right tool. It’s an API, not a transcription product.

If you need live speech-to-speech translation across 70 → 13 languages with low latency — ChatGPT Realtime-Translate is built for that. Whipscribe transcribes; translation is a separate step.

Do I need to write code to use ChatGPT Realtime Audio?

Yes. All three models are developer APIs — you’ll need an OpenAI account, an API key, and code that opens an audio stream to api.openai.com. There is no web app and no upload form.

Whipscribe has a web app at whipscribe.com — paste a URL or drop a file and you get a transcript in seconds, no code, no account required for the first 30 minutes a day.

Can I transcribe a YouTube video or podcast URL with ChatGPT Realtime?

Not directly. ChatGPT Realtime accepts a raw audio stream — you’d need to download or capture the audio yourself and pipe it in. Whipscribe accepts a URL: paste a YouTube, Vimeo, podcast, or direct media link and the audio is fetched for you.
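
To make "pipe it in" concrete, here is a minimal sketch of the client-side half only: slicing an already-downloaded WAV file into fixed-duration PCM chunks with Python's standard library. The 100 ms chunk size and the 16 kHz mono 16-bit demo clip are assumptions for illustration. This page does not specify what audio format the Realtime endpoints expect, and the code that actually sends each chunk is deliberately left out.

```python
import io
import wave
from typing import BinaryIO, Iterator, Union


def pcm_chunks(source: Union[str, BinaryIO], chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield raw PCM bytes from a WAV file in chunks of about chunk_ms milliseconds."""
    with wave.open(source, "rb") as wav:
        frames_per_chunk = wav.getframerate() * chunk_ms // 1000
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:
                break
            yield data


def make_silent_wav(seconds: int = 1, rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence (demo input only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * rate * seconds)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    chunks = list(pcm_chunks(make_silent_wav()))
    # Each chunk would be handed to whatever streaming client you use.
    print(len(chunks), "chunks of", len(chunks[0]), "bytes")
```

For the one-second demo clip this yields 10 chunks of 3200 bytes each (1600 frames at 2 bytes per frame).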

How much will I actually pay for typical workloads?

Realtime-Whisper is billed at $0.017 per minute, rounded to the nearest second.

  • 1 hour / month — ChatGPT Realtime ≈ $1.02 · Whipscribe $2.00 PAYG (or $0 if under the daily 30-min free tier)
  • 10 hours / month — ChatGPT Realtime ≈ $10.20 · Whipscribe $20 PAYG or $12 Pro flat
  • 40 hours / month — ChatGPT Realtime ≈ $40.80 · Whipscribe $12 Pro flat (≈ 3.4× cheaper)
  • 100 hours / month — ChatGPT Realtime ≈ $102 · Whipscribe $12 Pro flat (≈ 8.5× cheaper)

Realtime-Translate at $0.034/min (≈ $2.04/hr) and Realtime-2 voice agent ($32 / $64 per 1M tokens) bill the same way — usage-metered, no monthly cap, no free tier.
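
Because Realtime-2 is billed per token rather than per minute, its cost can only be estimated once you know your token counts. The sketch below applies the three listed rates to hypothetical counts. It assumes cached input tokens are counted separately and billed at the cached rate instead of the full input rate. This page does not say how many tokens a minute of audio produces, so the counts in the example are placeholders.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)

# Realtime-2 rates quoted on this page (USD per 1M tokens).
AUDIO_INPUT_PER_M = Decimal("32")
AUDIO_OUTPUT_PER_M = Decimal("64")
CACHED_INPUT_PER_M = Decimal("0.40")


def realtime2_cost(input_tokens: int, output_tokens: int, cached_input_tokens: int = 0) -> Decimal:
    """Estimated cost for one session or one month of Realtime-2 usage.

    Assumption: cached_input_tokens are excluded from input_tokens and
    billed at the cached rate.
    """
    total = (
        AUDIO_INPUT_PER_M * input_tokens
        + AUDIO_OUTPUT_PER_M * output_tokens
        + CACHED_INPUT_PER_M * cached_input_tokens
    )
    return total / TOKENS_PER_MILLION


if __name__ == "__main__":
    # Placeholder counts: 10,000 audio input tokens and 5,000 audio output tokens.
    print(realtime2_cost(10_000, 5_000))
```

With those placeholder counts, input and output each contribute $0.32, for $0.64 in total.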

Is my audio private with each tool?

ChatGPT Realtime Audio: every audio stream is sent to OpenAI’s servers in the United States and retained per OpenAI’s data-usage policy. Default API access is not HIPAA-eligible.

Whipscribe: audio is processed on Whipscribe’s own GPU cluster using self-hosted Whisper / whisperX. Audio is never sent to OpenAI or any third-party AI provider. Recordings are not used for training. The full posture is on /security and /privacy.

Does ChatGPT Realtime work for Zoom / Google Meet / Teams transcripts?

There’s no built-in meeting bot — you’d capture the meeting audio yourself, then stream it in. Whipscribe accepts uploaded recordings (mp4, m4a, mp3, wav and many more) and offers Live Meeting Notes in beta for browser-tab capture.

Which is more accurate?

Realtime-Whisper is the same Whisper family Whipscribe runs (faster-whisper / whisperX). On clean speech the two are very close. Differences in the final transcript come from features layered on top: speaker diarization, word-level alignment, punctuation restoration, and language detection, all of which are included on Whipscribe and not on Realtime-Whisper.

HIPAA / SOC 2 / EU data residency — what are my options?

OpenAI offers HIPAA via their Enterprise tier; default API access is not HIPAA-eligible. EU residency requires an OpenAI Enterprise contract.

Whipscribe runs on Neugence-owned infrastructure with self-hosted models and no third-party AI on the audio path. See /security for the current posture, certifications status, and how to request a DPA.

Can I use both together?

Yes — they solve different problems. A common pattern: use Whipscribe for the transcript (with diarization, timestamps, exports), then feed the text to GPT-5 / Realtime-2 to build a voice agent on top. The Whipscribe MCP server and the ChatGPT Custom GPT make that handoff one click.
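
The handoff can be as simple as flattening a diarized transcript into speaker-labelled text before passing it to a model. The sketch below shows one way to do that. The Segment fields are hypothetical and do not reflect Whipscribe's actual JSON export schema, so map them to whatever your export contains.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    """One diarized transcript segment (hypothetical field names)."""
    speaker: str
    start: float  # seconds from the start of the recording
    end: float
    text: str


def to_prompt_text(segments: Iterable[Segment]) -> str:
    """Render segments as '[MM:SS] SPEAKER: text' lines, one per segment."""
    lines = []
    for seg in segments:
        minutes, seconds = divmod(int(seg.start), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {seg.speaker}: {seg.text.strip()}")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        Segment("SPEAKER_00", 0.0, 2.5, " Hello. "),
        Segment("SPEAKER_01", 65.2, 70.0, "Hi there"),
    ]
    print(to_prompt_text(demo))
```

Keeping speaker labels and coarse timestamps in the text lets the downstream model attribute statements and refer back to a moment in the recording.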

Is there a free way to try either one?

ChatGPT Realtime Audio has no free tier — you pay from the first second.

Whipscribe gives every visitor 30 minutes of transcription per day with no signup, plus 2 hours free on signup. Credits don’t expire.

Whipscribe is a managed faster-whisper + whisperX service — privacy-first, $2/hr PAYG or $12/month Pro (up to 100 hrs), no API key to try, 99 languages, diarization included.


Cross-references: OpenAI Whisper API (older, $0.006/min) · Deepgram · AssemblyAI · all 27 tools · our security posture · our pricing.