OpenAI Realtime API (STT)

by OpenAI

Streaming speech-to-text input via OpenAI's Realtime API (whisper-1 / gpt-4o-transcribe family).

TL;DR

Best for voice-agent products needing streaming STT tightly coupled with GPT-4o reasoning. Pricing: per-minute audio in (model-dependent).

Category: Transcription APIs
Pricing: per-minute audio in (model-dependent)
Platforms: API

What it is

OpenAI's Realtime API provides bidirectional streaming (voice in, voice out) against the gpt-4o-realtime models, alongside dedicated transcription models (gpt-4o-transcribe, gpt-4o-mini-transcribe). It is the canonical pick for voice-agent applications that need a tight loop between STT, LLM, and TTS. Pricing is per minute of audio input and output and depends on the model. For batch transcription, see the separate /audio/transcriptions endpoint.

Feature flags from vendor docs: streaming. Directory tags: commercial-api, voice-agent. Last vendor-page check: 2026-05-12.

Best for: Voice-agent products needing streaming STT tightly coupled with GPT-4o reasoning.
Watch out for: No native speaker diarization; cost rises rapidly with bidirectional audio.

Install / use

WebSocket: wss://api.openai.com/v1/realtime?model=gpt-4o-realtime
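As a rough illustration of what a client prepares before opening that socket, the sketch below builds the connection URL, an authorization header, and one audio frame using only the Python standard library. The helper names are hypothetical. The bearer-token header and the `input_audio_buffer.append` event with base64-encoded audio are assumptions based on the vendor's published event format and should be checked against the current API reference. No network connection is opened here.

```python
import base64
import json
from urllib.parse import urlencode

# Endpoint as listed in this directory entry.
REALTIME_ENDPOINT = "wss://api.openai.com/v1/realtime"


def build_realtime_url(model: str = "gpt-4o-realtime") -> str:
    """Return the WebSocket URL with the model passed as a query parameter."""
    return f"{REALTIME_ENDPOINT}?{urlencode({'model': model})}"


def build_headers(api_key: str) -> dict[str, str]:
    """Bearer-token header; assumed to authenticate the WebSocket handshake."""
    return {"Authorization": f"Bearer {api_key}"}


def audio_append_event(pcm_chunk: bytes) -> str:
    """Wrap one chunk of raw audio as a JSON text frame (assumed event shape)."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm_chunk).decode("ascii"),
    })


if __name__ == "__main__":
    print(build_realtime_url())
    print(audio_append_event(b"\x00\x01" * 4))
```

The URL and headers would be handed to whichever WebSocket client you use, and each captured audio chunk would be sent as one text frame produced by `audio_append_event`.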

Features

Speaker diarization: No
Word-level timestamps: No
Streaming / real-time: Yes
Languages supported: 99
HIPAA eligible: No

OpenAI Realtime API (STT) vs Whipscribe

Feature               OpenAI Realtime API (STT)               Whipscribe
Category              Transcription APIs                      Transcription APIs
Pricing               per-minute audio in (model-dependent)   free beta
Speaker diarization   No                                      Yes
Word timestamps       No                                      Yes
Streaming             Yes                                     No
Languages             99                                      99
Platforms             API                                     Web, API, MCP

Alternatives to OpenAI Realtime API (STT)

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, you can paste a URL or upload a file and receive a transcript in seconds.