OpenAI Realtime API (STT)

Name: OpenAI Realtime API (STT)
Author: OpenAI

OpenAI's Realtime API streaming speech-in (whisper-1 / gpt-4o-transcribe family).

TL;DR

Best for voice-agent products needing streaming STT tightly coupled with GPT-4o reasoning. Pricing: per-minute audio in (model-dependent).

What it is

OpenAI's Realtime API provides bidirectional streaming (voice in, voice out) against the gpt-4o-realtime models, alongside dedicated transcription models (gpt-4o-transcribe, gpt-4o-mini-transcribe). It is the canonical pick for voice-agent applications that need an extremely tight loop between STT, LLM, and TTS. Pricing is per minute of audio input and output and depends on the model. For batch transcription, see the separate /audio/transcriptions endpoint.

Feature flags from vendor docs: streaming. Directory tags: commercial-api, voice-agent. Last vendor-page check: 2026-05-12.

Best for: Voice-agent products needing streaming STT tightly coupled with GPT-4o reasoning.
Watch out for: No native speaker diarization; cost rises rapidly with bidirectional audio.

Install / use

WebSocket: wss://api.openai.com/v1/realtime?model=gpt-4o-realtime
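A minimal sketch of what a client prepares before opening that WebSocket, using only the Python standard library. The URL and model name are taken from the line above; the `OpenAI-Beta` header and the `input_audio_buffer.append` event shape reflect the Realtime API as commonly documented, but treat them as assumptions and check the current vendor reference. The helper names (`build_realtime_url`, `build_headers`, `audio_append_event`) are hypothetical.

```python
import base64
import json
from urllib.parse import urlencode

# Endpoint as given in this listing.
REALTIME_BASE = "wss://api.openai.com/v1/realtime"


def build_realtime_url(model: str = "gpt-4o-realtime") -> str:
    """Return the WebSocket URL with the model passed as a query parameter."""
    return f"{REALTIME_BASE}?{urlencode({'model': model})}"


def build_headers(api_key: str) -> dict[str, str]:
    """Return handshake headers; the beta header is an assumption and may not be required."""
    return {
        "Authorization": f"Bearer {api_key}",
        "OpenAI-Beta": "realtime=v1",
    }


def audio_append_event(pcm_chunk: bytes) -> str:
    """Wrap a raw audio chunk as a JSON client event with base64-encoded audio."""
    return json.dumps(
        {
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(pcm_chunk).decode("ascii"),
        }
    )
```

A WebSocket client of your choice would connect to `build_realtime_url()` with `build_headers(...)` and send one `audio_append_event(...)` message per captured audio chunk.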

Features

Speaker diarization	No
Word-level timestamps	No
Streaming / real-time	Yes
Languages supported	99
HIPAA eligible	No

OpenAI Realtime API (STT) vs Whipscribe

Feature	OpenAI Realtime API (STT)	Whipscribe
Category	Transcription APIs	Transcription APIs
Pricing	per-minute audio in (model-dependent)	free beta
Speaker diarization	No	Yes
Word timestamps	No	Yes
Streaming	Yes	No
Languages	99	99
Platforms	API	Web, API, MCP

Alternatives to OpenAI Realtime API (STT)

OpenAI Whisper API

OpenAI

Hosted Whisper (whisper-1, based on the open-source large-v2 model) from OpenAI, $0.006 per minute.

$0.006/min

AssemblyAI

Universal-2 model + diarization, PII redaction, topic detection, summarization.

from $0.37/hr

Deepgram

Nova-2 model, excellent streaming, strong at conversational audio.

from $0.0043/min
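The three alternatives quote prices in different units, so a rough comparison needs a common one. The sketch below converts the listed figures to USD per audio hour. It uses only the numbers shown in this listing; the AssemblyAI and Deepgram figures are "from" prices, so actual rates may be higher, and the function name `usd_per_hour` is hypothetical.

```python
# Listed prices from this page: (amount in USD, unit).
LISTED_PRICES = {
    "OpenAI Whisper API": (0.006, "min"),
    "AssemblyAI": (0.37, "hr"),   # "from" price
    "Deepgram": (0.0043, "min"),  # "from" price
}


def usd_per_hour(price: float, unit: str) -> float:
    """Normalize a per-minute or per-hour price to USD per audio hour."""
    if unit == "min":
        return round(price * 60, 4)
    if unit == "hr":
        return round(price, 4)
    raise ValueError(f"unknown unit: {unit}")


per_hour = {name: usd_per_hour(*quote) for name, quote in LISTED_PRICES.items()}

for name, cost in sorted(per_hour.items(), key=lambda item: item[1]):
    print(f"{name}: ${cost:.3f}/hr")
```

On these listed figures the three services land within roughly ten cents per audio hour of each other, so features such as diarization and streaming are likely to matter more than the base rate.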

Whipscribe is a managed faster-whisper + whisperX service for getting transcripts from a URL or an uploaded file without running your own infrastructure.