Rev.ai
Rev.ai is Rev's developer API arm: Reverb ASR plus real-time streaming, with the option to escalate the same job to human transcribers when machine-grade accuracy isn't enough.
Rev.ai exposes async transcription via POST /speechtotext/v1/jobs and real-time via a WebSocket streaming endpoint, both powered by the Reverb model family — Rev's in-house English ASR that was open-sourced in October 2024 (Apache-2.0 code, models on HuggingFace) and benchmarks competitively against Whisper Large-v3. Word-level timestamps, diarization, custom vocabulary, language identification, sentiment, topic extraction, and summarization are all wired into the same job submission.
Best for long-form English media, podcasts, and call recordings where accuracy matters more than headline price, plus any workflow that occasionally needs to escalate the same audio to human transcribers under one vendor. Current pricing: Reverb at $0.20/hr (~$0.0033/min), Reverb Turbo at $0.10/hr (~$0.0017/min), Reverb foreign language at $0.30/hr, Whisper Fusion / Whisper Large at $0.005/min, and human transcription at $1.99/min. New accounts get free credit equivalent to 5 hours of Reverb ASR.
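To compare tiers on a concrete recording, the listed rates can be normalized to a per-minute figure. The sketch below is only arithmetic on the prices quoted above; the tier keys are made-up labels for this example, not Rev.ai API values, and the rates will drift as pricing changes.

```python
# Rates copied from the pricing paragraph above, normalized to USD per minute.
# The dictionary keys are illustrative labels, NOT Rev.ai API identifiers.
RATES_PER_MIN = {
    "reverb": 0.20 / 60,          # $0.20 per hour
    "reverb_turbo": 0.10 / 60,    # $0.10 per hour
    "reverb_foreign": 0.30 / 60,  # $0.30 per hour
    "whisper_fusion": 0.005,      # $0.005 per minute
    "human": 1.99,                # $1.99 per minute
}


def estimate_cost(minutes: float, tier: str) -> float:
    """Return the estimated USD cost of transcribing `minutes` of audio."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return round(minutes * RATES_PER_MIN[tier], 4)


if __name__ == "__main__":
    # Compare every tier for a 90-minute recording.
    for tier in RATES_PER_MIN:
        print(f"{tier:15s} ${estimate_cost(90, tier):.2f}")
```

At these rates the free credit of 5 hours of Reverb ASR corresponds to `estimate_cost(300, "reverb")`, roughly one dollar of usage.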
What it is
Rev.ai is the developer API from Rev, historically a human-transcription service. The "Machine" endpoint is competitive on English and supports custom vocabulary that handles jargon better than generic Whisper. HIPAA-eligible on appropriate plans. Last price check: 2026-04-20.
Watch out for: Pricier than Whisper-based APIs; non-English coverage narrower.
Install / use
Where Rev.ai fits · 6 use-cases
Rev.ai's pitch clusters around English ASR accuracy, the Reverb open model, and the optional human-grade escalation lane. Each use-case below maps to a section of docs.rev.ai.
Submit an episode URL to the async jobs endpoint and ask for diarization plus word timestamps. Reverb is tuned on Rev's 7M+ hours of human-verified speech, which Rev credits for its edge over generic Whisper on long-form conversational English.
transcriber=reverb · diarize=true
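Once a diarized job finishes, the transcript can be flattened into speaker-labeled lines. The sketch below assumes the JSON transcript is a `monologues` list, each with a `speaker` number and an `elements` list of `type`/`value` items, and that punctuation elements carry the spacing between words. The sample payload is invented for illustration; check the field names against a real response before relying on them.

```python
def speaker_lines(transcript: dict) -> list[str]:
    """Turn a diarized transcript (assumed layout) into 'Speaker N: text' lines."""
    lines = []
    for monologue in transcript.get("monologues", []):
        # Concatenate word and punctuation values in their original order.
        text = "".join(el["value"] for el in monologue.get("elements", []))
        lines.append(f"Speaker {monologue.get('speaker')}: {text.strip()}")
    return lines


# Hypothetical sample payload, shaped like the assumed transcript JSON.
SAMPLE = {
    "monologues": [
        {"speaker": 0, "elements": [
            {"type": "text", "value": "Hello", "ts": 0.5, "end_ts": 0.9},
            {"type": "punct", "value": " "},
            {"type": "text", "value": "there", "ts": 1.0, "end_ts": 1.4},
            {"type": "punct", "value": "."},
        ]},
        {"speaker": 1, "elements": [
            {"type": "text", "value": "Hi", "ts": 2.0, "end_ts": 2.3},
            {"type": "punct", "value": "."},
        ]},
    ]
}

if __name__ == "__main__":
    print("\n".join(speaker_lines(SAMPLE)))
```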
Use forced alignment to lock an existing transcript to word-level timestamps for SRT/VTT export, or run a fresh job with diarization for caption tracks plus speaker labels. Topic extraction can seed chapter markers.
Word-accurate timing for caption files
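Word-level timestamps map onto SRT cues fairly directly. The following is a minimal sketch, assuming each word element carries `type`, `value`, `ts`, and `end_ts` (in seconds) and that punctuation arrives as separate `punct` elements without timestamps. The seven-words-per-cue grouping is an arbitrary choice for the example, not a captioning standard.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def elements_to_srt(elements: list[dict], max_words: int = 7) -> str:
    """Group timed word elements into numbered SRT cues of up to max_words words."""
    words = []  # each entry: [text, start_seconds, end_seconds]
    for el in elements:
        if el["type"] == "text":
            words.append([el["value"], el["ts"], el["end_ts"]])
        elif words and el["value"].strip():
            # Attach visible punctuation to the preceding word.
            words[-1][0] += el["value"].strip()

    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start, end = chunk[0][1], chunk[-1][2]
        text = " ".join(word[0] for word in chunk)
        cues.append(f"{len(cues) + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    # Hypothetical elements, shaped like the assumed transcript JSON.
    demo = [
        {"type": "text", "value": "Hello", "ts": 0.5, "end_ts": 0.9},
        {"type": "punct", "value": " "},
        {"type": "text", "value": "there", "ts": 1.0, "end_ts": 1.4},
        {"type": "punct", "value": "."},
    ]
    print(elements_to_srt(demo))
```

A WebVTT variant would differ mainly in the header line and in using a period instead of a comma before the milliseconds.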
Open a WebSocket to the streaming endpoint and pipe PCM frames; partial hypotheses arrive in real time for live captioning, IVR turn-taking, and in-product voice copilots. Custom vocabulary biases brand and jargon words.
Real-time partials + final results
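On the client side, a live-caption view usually keeps finalized text and overwrites the current partial as new hypotheses arrive. The sketch below covers only that bookkeeping, not the WebSocket connection itself. It assumes each server message is JSON with a `type` of `"partial"` or `"final"` and an `elements` list of `value` items, that partial words need joining with spaces, and that final results carry their own spacing in punctuation elements. `CaptionBuffer` is a hypothetical helper, not part of any Rev.ai SDK.

```python
import json


class CaptionBuffer:
    """Keep committed final segments plus the latest in-flight partial."""

    def __init__(self) -> None:
        self.final_segments: list[str] = []
        self.partial: str = ""

    def feed(self, raw_message: str) -> None:
        """Consume one JSON message (assumed shape) from the streaming socket."""
        msg = json.loads(raw_message)
        values = [el.get("value", "") for el in msg.get("elements", [])]
        if msg.get("type") == "partial":
            # A partial replaces the previous partial; it is never appended.
            self.partial = " ".join(values)
        elif msg.get("type") == "final":
            # A final commits the segment and clears the in-flight partial.
            self.final_segments.append("".join(values).strip())
            self.partial = ""
        # Any other message type is ignored in this sketch.

    def display(self) -> str:
        """Return committed text followed by the current partial, if any."""
        parts = self.final_segments + ([self.partial] if self.partial else [])
        return " ".join(parts)


if __name__ == "__main__":
    buf = CaptionBuffer()
    buf.feed('{"type": "partial", "elements": [{"type": "text", "value": "hello"}]}')
    print(buf.display())
```

Replacing rather than appending partials avoids the duplicated-word flicker that shows up when each hypothesis is treated as new text.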
Bundle the transcript with summarization, topic extraction, and sentiment in a single workflow so the LLM downstream sees structured input instead of raw text. Useful for meeting bots, RAG over call audio, and analytics rollups.
Insights stack runs on the same job id
When machine accuracy is not enough, Rev.ai can route the same job to its human transcription network for verbatim, certified output. Same API, same dashboard, same audit trail — useful for depositions, regulatory filings, and accessibility compliance.
$1.99/min · turnaround in hours
For non-English audio, Rev.ai exposes a Reverb Foreign Language model plus Whisper Fusion / Whisper Large as transcriber options, with language identification as a pre-step when the input language is unknown. 57+ languages on the broader stack.
Language ID at $0.003/min seeds the choice
Quickstart · pick a runtime
Three working ways to submit a pre-recorded URL to the async jobs endpoint. Export your access token as REV_AI_API_KEY first — grab one from your Rev.ai dashboard (free credit equivalent to 5 hours of Reverb ASR on new accounts).
Official rev_ai SDK · submit a URL job and poll for the transcript.
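# pip install rev_ai
import os
import time

from rev_ai import apiclient

client = apiclient.RevAiAPIClient(os.environ["REV_AI_API_KEY"])

# Submit the audio URL as an async job.
job = client.submit_job_url(
    "https://www.rev.ai/FTC_Sample_1.mp3",
    metadata="podcast-001",
    transcriber="reverb",
)

# Poll every 5 seconds until the job reaches a terminal state.
while True:
    details = client.get_job_details(job.id)
    if details.status.name in ("TRANSCRIBED", "FAILED"):
        break
    time.sleep(5)

# A failed job has no transcript to fetch.
if details.status.name == "FAILED":
    raise RuntimeError(f"Rev.ai job {job.id} failed")

transcript = client.get_transcript_text(job.id)
print(transcript)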
Official revai-node-sdk · same async URL job from Node 18+.
// npm install revai-node-sdk
import { RevAiApiClient } from "revai-node-sdk";

const client = new RevAiApiClient(process.env.REV_AI_API_KEY);

// Submit the audio URL as an async job.
const job = await client.submitJobUrl(
  "https://www.rev.ai/FTC_Sample_1.mp3",
  { metadata: "podcast-001", transcriber: "reverb" }
);

// Poll every 5 seconds while the job is still running.
let details;
do {
  await new Promise((resolve) => setTimeout(resolve, 5000));
  details = await client.getJobDetails(job.id);
} while (details.status === "in_progress");

// A failed job has no transcript to fetch.
if (details.status === "failed") {
  throw new Error(`Rev.ai job ${job.id} failed`);
}

const transcript = await client.getTranscriptText(job.id);
console.log(transcript);
Plain HTTPS POST to /speechtotext/v1/jobs · useful for shell pipelines and edge runtimes.
# submit a URL job
curl --request POST \
  --url 'https://api.rev.ai/speechtotext/v1/jobs' \
  --header "Authorization: Bearer $REV_AI_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "source_config": {"url": "https://www.rev.ai/FTC_Sample_1.mp3"},
    "metadata": "podcast-001",
    "transcriber": "reverb"
  }'

# poll job status
curl --request GET \
  --url "https://api.rev.ai/speechtotext/v1/jobs/<JOB_ID>" \
  --header "Authorization: Bearer $REV_AI_API_KEY"

# fetch the transcript when status=transcribed
curl --request GET \
  --url "https://api.rev.ai/speechtotext/v1/jobs/<JOB_ID>/transcript" \
  --header "Authorization: Bearer $REV_AI_API_KEY" \
  --header 'Accept: text/plain'
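The polling loops in these quickstarts have no upper bound, so a stuck job would block the caller indefinitely. One way to cap the wait is a small helper that takes any status-fetching callable. This is a sketch: `wait_for_job` and its parameters are hypothetical names, and the status strings are the ones that appear in the quickstarts above (`in_progress`, `transcribed`, `failed`).

```python
import time
from typing import Callable

# Terminal job states, as they appear in the quickstarts above.
TERMINAL_STATES = {"transcribed", "failed"}


def wait_for_job(
    fetch_status: Callable[[], str],
    interval: float = 5.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch_status() until a terminal state or until `timeout` seconds pass.

    Returns the terminal status in lowercase; raises TimeoutError otherwise.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status().lower()
        if status in TERMINAL_STATES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still '{status}' after {timeout} seconds")
        sleep(interval)


if __name__ == "__main__":
    # Simulated status sequence; no network access is needed for the demo.
    fake = iter(["in_progress", "in_progress", "transcribed"])
    print(wait_for_job(lambda: next(fake), interval=0.01))
```

With the Python SDK quickstart, the callable could be `lambda: client.get_job_details(job.id).status.name`; the helper lowercases the result, so the enum name `TRANSCRIBED` matches as well.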
Features
| Speaker diarization | Yes |
| Word-level timestamps | Yes |
| Streaming / real-time | Yes |
| Languages supported | 36 |
| HIPAA eligible | Yes |
Links
- rev.ai ↗ Product homepage: positioning, customers, and the developer-first pitch around the Reverb ASR model.
- rev.ai/pricing ↗ Current per-hour and per-minute rates across Reverb, Reverb Turbo, Reverb FL, Whisper Fusion, human transcription, and the insights stack.
- docs.rev.ai ↗ Documentation root: quickstarts, API reference, and feature guides for async, streaming, and the insights APIs.
- Async API reference ↗ POST /speechtotext/v1/jobs reference: request body, transcriber options, diarization, custom vocab, callback URL.
- Streaming API reference ↗ Real-time WebSocket endpoint: partial and final hypotheses, audio format requirements, session lifecycle.
- revdotcom/revai-python-sdk ↗ Official Python SDK: async + streaming + insights, MIT licensed, actively maintained.
- revdotcom/revai-node-sdk ↗ Official Node.js / TypeScript SDK: same feature surface as the Python SDK, MIT licensed.
- revdotcom/reverb ↗ Open-source inference code for Rev's Reverb ASR and diarization models: Apache-2.0 code, models on HuggingFace, benchmarks vs Whisper Large-v3 and Canary-1B.
- Language identification docs ↗ Pre-transcription language ID at $0.003/min, useful when the input language is unknown and you need to pick a transcriber.
- Summarization docs ↗ Standard and premium summarization tiers running on a completed job id; pairs with topic extraction and sentiment.
- Rev blog ↗ Product announcements, model releases (Reverb open-source 2024-10), and benchmarks.
Rev.ai vs Whipscribe
| Feature | Rev.ai | Whipscribe |
|---|---|---|
| Category | Transcription APIs | Transcription APIs |
| Pricing | from $0.02/min | free beta |
| Speaker diarization | Yes | Yes |
| Word timestamps | Yes | Yes |
| Streaming | Yes | No |
| Languages | 36 | 99 |
| Platforms | API | Web, API, MCP |
Sources & dates for the comparison above
- diarization: “Diarization groups words by speaker in the response.” — source (checked 2026-04-23)
- word timestamps: “Each element has a type, value, timestamp ts and end_ts in seconds.” — source (checked 2026-04-23)
- streaming: “Rev AI's Streaming API accepts live audio over WebSockets.” — source (checked 2026-04-23)
- pricing: “Asynchronous speech-to-text starts at $0.02 per minute.” — source (checked 2026-04-23)
Alternatives to Rev.ai
Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or drop a file and you'll have a transcript in seconds.