PREVIEW · subject to change

API docs

A small HTTP surface for transcribing audio & video. Drop a file or a URL, poll the job, pull the transcript in the format you want. Same engine as the web app.

Status: Preview. API keys aren't publicly self-serve yet. If you'd like to build on Whipscribe, email contact@neugence.ai and we'll issue a key within a day.

Auth

Every request carries an API key. Optional user-identity headers attach the job to a specific account so it shows up in "Your files", gets the longer retention window on a paid plan, and survives re-installs.

Header | Required? | Notes
X-API-Key: <key> | required on every request | Identifies the calling app + tier. Email contact@neugence.ai to request one; keys aren't self-serve yet.
Authorization: Bearer <Firebase idToken> | optional (signed-in users) | Identifies a user signed into whipscribe.com. Jobs submitted with this header are tied to the user's email; retention + "Your files" follow the user, not the key.
X-User-Email: you@example.com | optional (paid accounts without Firebase) | Look-up hint for the credit ledger. Prefer Authorization when the caller is a Firebase-authed browser; use X-User-Email for server-to-server flows.
X-Guest-Token: <token> | optional (one-off guest purchase) | Returned by the credit-purchase flow for anonymous buyers.
X-Claim-Token: <token> | optional (guest submissions) | Proves ownership of a job submitted without a signed-in user. The submit response includes claim_token; store it client-side and send it on subsequent reads for the same job. See Claim a guest job.

Anonymous calls (no user-identity headers) are allowed on the free tier and subject to per-IP rate limits. Keys in doc examples are always placeholders — don't share yours.
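
The header rules above can be sketched as a small helper. This is an illustrative sketch, not an official SDK: build_headers is a hypothetical name, and treating Authorization and X-User-Email as mutually exclusive is an assumption drawn from the "prefer Authorization" guidance in the table.

```python
from typing import Dict, Optional


def build_headers(
    api_key: str,
    id_token: Optional[str] = None,
    user_email: Optional[str] = None,
    guest_token: Optional[str] = None,
    claim_token: Optional[str] = None,
) -> Dict[str, str]:
    """Assemble request headers; only X-API-Key is mandatory."""
    if not api_key:
        raise ValueError("X-API-Key is required on every request")
    headers = {"X-API-Key": api_key}
    if id_token:
        # Identity proof for users signed into whipscribe.com.
        headers["Authorization"] = f"Bearer {id_token}"
    elif user_email:
        # Assumption: only send the e-mail hint when no Firebase token exists.
        headers["X-User-Email"] = user_email
    if guest_token:
        headers["X-Guest-Token"] = guest_token
    if claim_token:
        headers["X-Claim-Token"] = claim_token
    return headers
```

Calling build_headers(key) with no other arguments produces an anonymous, free-tier request.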

Base URL & CORS

https://whipscribe.com/api/v1

All endpoints live under this root; the paths below are written in full, including the /api/v1 prefix. The server accepts application/json for URL submits and multipart/form-data for file uploads. CORS is open for GET and POST from browser origins; server-to-server calls have no origin restriction.

Credits & billing

Credit is metered in audio-hours. A 30-minute audio file spends 0.5h. Jobs that fail don't consume credit. Submits over your balance return HTTP 402 with an upgrade_url. See /pricing for tiers, or the /credits dashboard to refresh your server-side balance on a new device.
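
As a worked example of the metering rule, the sketch below converts a duration in seconds into audio-hours and runs a local affordability check. It assumes billing is exactly proportional to duration with no rounding, which this page does not state; the function names are hypothetical, and the server's 402 response remains the authority.

```python
def credit_hours(audio_duration_seconds: float) -> float:
    """Audio-hours a job of this length would spend (seconds / 3600)."""
    if audio_duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    return audio_duration_seconds / 3600.0


def can_afford(balance_hours: float, audio_duration_seconds: float) -> bool:
    """Local pre-check only; a submit can still come back as HTTP 402."""
    return credit_hours(audio_duration_seconds) <= balance_hours
```

A 30-minute file gives credit_hours(1800) == 0.5, matching the example in the paragraph above.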

Submit a file

POST /api/v1/transcribe

Upload an audio or video file as multipart/form-data.

Field | Type | Notes
file (required) | file | mp3, m4a, wav, mp4, mov, ogg, webm, flac. Up to 10 hours per file.
language (optional) | string | ISO code (en, es, fr, …). Auto-detected if omitted.
diarize (optional) | boolean | Speaker labels. Default true.
word_timestamps (optional) | boolean | Per-word offsets. Default true.
source (optional) | enum | upload | url | recording | api. Defaults to upload for this endpoint. See Source field.
# curl example
curl https://whipscribe.com/api/v1/transcribe \
  -H "X-API-Key: $WHIPSCRIBE_KEY" \
  -F "file=@episode-412.mp3" \
  -F "language=en" \
  -F "source=api"
// 202 Accepted
{
  "job_id": "35f4be54-aa3e-4adc-85b7-b44f284d1fc3",
  "status": "queued",
  "tier": 2,
  "claim_token": "e0610c19..."   // only when no user identity was supplied
}
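
For callers not using curl, the same upload can be sketched with the Python standard library. encode_multipart and submit_file are hypothetical helpers; guessing the part's Content-Type from the filename is an assumption, and this sketch reads the whole file into memory, which is unsuitable for very long recordings.

```python
import json
import mimetypes
import os
import urllib.request
import uuid
from typing import Dict, Tuple

API_ROOT = "https://whipscribe.com/api/v1"


def encode_multipart(fields: Dict[str, str], filename: str, data: bytes) -> Tuple[bytes, str]:
    """Return (body, content_type) for text fields plus one `file` part."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    chunks = []
    for name, value in fields.items():
        chunks.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    file_header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    )
    chunks.append(file_header.encode() + data + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def submit_file(path: str, api_key: str, **fields: str) -> dict:
    """POST the file to /transcribe and return the parsed response body (network call)."""
    with open(path, "rb") as fh:
        body, content_type = encode_multipart(fields, os.path.basename(path), fh.read())
    req = urllib.request.Request(
        f"{API_ROOT}/transcribe",
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A call such as submit_file("episode-412.mp3", key, language="en", source="api") mirrors the curl example; keep the returned claim_token if you submitted without a user identity.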

Submit a URL

POST /api/v1/transcribe/url

Same pipeline, but we fetch the media for you. Only Creative Commons-licensed YouTube URLs are currently accepted. Accepts the same language, diarize, word_timestamps, and source fields as the multipart endpoint. Defaults source to url.

curl https://whipscribe.com/api/v1/transcribe/url \
  -H "X-API-Key: $WHIPSCRIBE_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://www.youtube.com/watch?v=...", "language":"en", "source":"url"}'
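
The JSON submit can be sketched the same way. build_url_submit is a hypothetical helper that only constructs the request; the local scheme check mirrors the documented 400 BAD_URL rule and does not verify that the URL is one the service currently accepts.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlparse

API_ROOT = "https://whipscribe.com/api/v1"


def build_url_submit(
    api_key: str, media_url: str, language: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) a POST /transcribe/url request."""
    if urlparse(media_url).scheme not in ("http", "https"):
        raise ValueError("media_url must be http(s)")
    payload = {"url": media_url}
    if language:
        payload["language"] = language  # omitted means auto-detect
    return urllib.request.Request(
        f"{API_ROOT}/transcribe/url",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it is one more step: pass the request to urllib.request.urlopen and parse the body with json.load.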

List your jobs

GET /api/v1/jobs?limit={1-100}

Most recent first. Scoped to the caller's identity (Firebase user email, X-User-Email, or the API key). Returns an array of the same row shape as the status endpoint.

// 200 OK
[
  {
    "job_id": "35f4be54-...",
    "status": "done",
    "filename": "episode-412.mp3",
    "audio_duration_seconds": 967,
    "language": "en",
    "source": "upload",
    "created_at": 1776620389.79
  },
  ...
]

Poll job status

GET /api/v1/jobs/{job_id}

Lightweight poll. Recommended cadence: 3 seconds while status ∈ {queued, processing}.

// 200 OK
{
  "job_id": "35f4be54-...",
  "status": "processing",     // queued | processing | done | failed
  "progress": 0.42,                // 0.0–1.0 (best-effort)
  "audio_duration_seconds": 967,
  "language": "en",
  "source": "upload",         // upload | url | recording | api
  "speech_detected": true,    // see "No-speech detection" below
  "speech_ratio": 0.78,         // 0.0–1.0 — fraction classified as speech by the VAD pre-flight
  "error": null
}
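
A polling loop at the recommended cadence might look like the sketch below. wait_for_job is a hypothetical helper; it takes the status fetch as a callable so the HTTP layer stays up to you, and the one-hour default timeout is an assumption rather than a documented limit.

```python
import time
from typing import Callable, Dict

TERMINAL_STATUSES = {"done", "failed"}


def wait_for_job(
    fetch_status: Callable[[], Dict],
    interval: float = 3.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch_status every `interval` seconds until the job is done or failed."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status()
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job['status']} after {timeout}s")
        sleep(interval)
```

Pass a function that performs GET /api/v1/jobs/{job_id} and returns the parsed JSON; check job["error"] when the final status is failed.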

No-speech detection

Every submission runs through a Silero VAD pre-flight before reaching Whisper. If the audio doesn't contain transcribable speech (music, ambient noise, near-silence), the job completes successfully with speech_detected: false instead of feeding non-speech audio to Whisper, which would otherwise hallucinate confident-looking text in random languages.

The job's status stays done — the system worked, the audio just didn't contain what we transcribe. The result document carries:

// GET /api/v1/jobs/{job_id}/result?format=json (VAD-rejected file)
{
  "text": "",
  "language": null,
  "segments": [],
  "speech_detected": false,
  "speech_ratio": 0.02,
  "suggestion": "This file appears to be music or ambient audio. Transcription requires spoken content."
}

Branch your client on speech_detected === false rather than parsing the suggestion text. No credit is charged for VAD-rejected jobs. The threshold is operator-tunable via WHIPSCRIBE_VAD_MIN_SPEECH_RATIO (server-side only).
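
In Python the equivalent branch is an identity check against False. The sketch below uses a hypothetical helper name and treats a missing speech_detected key as "speech present", which is an assumption; the field is shown in the VAD-rejected result document above.

```python
from typing import Optional


def transcript_or_none(result: dict) -> Optional[str]:
    """Return the transcript text, or None when the VAD pre-flight found no speech."""
    if result.get("speech_detected") is False:
        # Branch on the flag, never on the wording of `suggestion`.
        return None
    return result["text"]
```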

Get the transcript

GET /api/v1/jobs/{job_id}/result?format={txt|json|srt|vtt|docx}

Returns application/json for format=json, plain text for the rest. format=json gives you the richest payload — text, segments, speaker labels, word timestamps.

// GET .../result?format=json
{
  "text": "Welcome back to the show. Today we're talking about...",
  "language": "en",
  "segments": [
    {
      "start": 0.0,
      "end": 16.3,
      "speaker": "SPEAKER_00",
      "text": "Welcome back to the show...",
      "words": [
        { "start": 0.0, "end": 0.4, "text": "Welcome" },
        ...
      ]
    },
    ...
  ]
}
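
To show how the JSON payload can be consumed, the sketch below turns segments into timestamped, speaker-labeled lines. The function names and output layout are illustrative choices; segments without a speaker field (assumed to occur when diarization is off) are printed without a label.

```python
from typing import List


def format_time(seconds: float) -> str:
    """Render an offset in seconds as MM:SS (fractions are truncated)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def speaker_lines(result: dict) -> List[str]:
    """One line per segment: "[MM:SS] SPEAKER: text"."""
    lines = []
    for seg in result.get("segments", []):
        speaker = seg.get("speaker")
        prefix = f"{speaker}: " if speaker else ""
        lines.append(f"[{format_time(seg['start'])}] {prefix}{seg['text']}")
    return lines
```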

Playback URL

GET /api/v1/jobs/{job_id}/audio/url

Returns a URL your <audio> element (or any HTTP client that supports Range) can stream the original audio from. Keeps your API key off the playback path.

// 200 OK
{
  "url": "https://audio.del1.vultrobjects.com/…/episode-412.mp3",
  "storage": "vultr",           // "vultr" (direct CDN/presigned) | "disk" (backend-relative, prepend /api)
  "expires_in": 600,             // seconds; refetch on <audio>.error or before playback
  "retention_days": 30            // only on 410; see Retention
}

If storage is "disk", the returned path is backend-relative — prepend your base URL's /api prefix before handing it to the browser. The URL is short-lived; the audio.addEventListener('error', refetchAndResume) pattern is supported.
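
The storage rule and the expiry window can be sketched as follows. Both helpers are hypothetical, the exact shape of a "disk" path is not documented here (a path beginning with / is assumed), and the 30-second safety margin is an arbitrary choice.

```python
def resolve_playback_url(payload: dict, origin: str = "https://whipscribe.com") -> str:
    """Return a URL the browser can load directly."""
    url = payload["url"]
    if payload.get("storage") == "disk":
        # Backend-relative path: prepend the /api prefix of the base URL.
        return origin.rstrip("/") + "/api/" + url.lstrip("/")
    return url  # "vultr": already an absolute, short-lived URL


def is_stale(fetched_at: float, expires_in: float, now: float, margin: float = 30.0) -> bool:
    """True when the URL should be refetched before starting playback."""
    return now >= fetched_at + expires_in - margin
```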

Claim a guest job

POST /api/v1/jobs/claim

Transfer ownership of jobs that were submitted anonymously (no user identity on submit) to a signed-in user. The submit response for anonymous jobs includes claim_token; keep it client-side until the user signs in, then call this endpoint once to attach every pending token to their email. Extends each claimed job's retention to the claiming user's tier window.

curl https://whipscribe.com/api/v1/jobs/claim \
  -H "X-API-Key: $WHIPSCRIBE_KEY" \
  -H "Authorization: Bearer $FIREBASE_ID_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"claim_tokens": ["e0610c19...", "2ace5e8d..."]}'
// 200 OK
{ "claimed": 2 }
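
Because the endpoint takes every pending token in one call, a client typically collects tokens as it submits and flushes them at sign-in. The sketch below builds the request body; build_claim_body is a hypothetical helper, and dropping empty or duplicate tokens is a defensive assumption, not documented server behavior.

```python
import json
from typing import Iterable


def build_claim_body(tokens: Iterable[str]) -> bytes:
    """JSON body for POST /jobs/claim: unique, non-empty tokens in first-seen order."""
    unique = list(dict.fromkeys(t for t in tokens if t))
    return json.dumps({"claim_tokens": unique}).encode("utf-8")
```

Send the body with X-API-Key, the user's Authorization: Bearer header, and Content-Type: application/json, then discard the stored tokens once the claim succeeds.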

Cancel or delete a job

DELETE /api/v1/jobs/{job_id}

Cancels in-flight jobs or removes completed jobs from the server. Credit isn't refunded for completed jobs. Returns 204 No Content.

Who am I

GET /api/v1/me

Returns what the server sees when you call. Use this instead of hardcoding the retention window in your client — the number is authoritative and will stay in sync if the policy ever shifts.

// 200 OK — signed-in user on the free plan
{
  "email": "you@example.com",
  "tier": "free",              // guest | free | paid — stable public enum
  "retention_days": 30,
  "signed_in": true
}
// 200 OK — anonymous caller (no Authorization / X-User-Email)
{
  "email": null,
  "tier": "guest",
  "retention_days": 3,
  "signed_in": false
}

Treat tier as the stable public field. Any additional fields on this response are implementation detail and may change — don't branch on them.

Retention

Uploaded audio is kept for the window below after the job completes, then auto-deleted. Transcripts stick around until you delete them.

Tier | Audio retention | Who
guest | 3 days | anonymous submissions (no signed-in user)
free | 30 days | signed-in users on the free plan
paid | 365 days | any paid plan

Query GET /api/v1/me for the exact retention_days that applies to the caller. When the window has elapsed, GET /api/v1/jobs/{job_id}/audio/url returns 410 AUDIO_EXPIRED with the original window in the response body.

Source field

Every job carries a source enum so clients can surface how a transcript was created. Server validates on submit; invalid values return 422 BAD_SOURCE.

Value | Meaning | Default for
upload | multipart file upload | POST /transcribe
url | fetched from a URL | POST /transcribe/url
recording | captured from browser / extension mic | (none)
api | programmatic caller (scripts, SDKs, CI, MCP) | (none)

Callers can override the default by sending source in the submit body.

Idempotency-Key

Submit endpoints (POST /api/v1/transcribe, POST /api/v1/transcribe/url, POST /api/v1/uploads/init) accept an optional Idempotency-Key: <string> header. A retry that carries the same key under the same API key returns the original job's response (HTTP 200, with an X-Idempotent-Replay: true header) instead of creating a duplicate job, the same pattern Stripe uses. Keys are scoped per API key, may contain only [A-Za-z0-9_.:/-], and must be 1 to 255 characters with no whitespace; invalid keys return 400 BAD_IDEMPOTENCY_KEY. A key becomes reusable once the original job ages past its retention window.

# First submit
curl https://whipscribe.com/api/v1/transcribe/url \
  -H "X-API-Key: $WHIPSCRIBE_KEY" \
  -H "Idempotency-Key: job-2026-04-19-abc123" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/audio.mp3"}'
# → 202 {"job_id":"…","status":"queued",…}

# Retry after a dropped connection — same key, same response
curl https://whipscribe.com/api/v1/transcribe/url \
  -H "X-API-Key: $WHIPSCRIBE_KEY" \
  -H "Idempotency-Key: job-2026-04-19-abc123" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/audio.mp3"}'
# → 200 {"job_id":"<same>","status":"queued",…}  (X-Idempotent-Replay: true)
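
Validating a key locally avoids a round trip that would end in 400 BAD_IDEMPOTENCY_KEY. This sketch encodes the documented character set and length limit as a regular expression; the helper names and the prefix-plus-UUID key scheme are illustrative assumptions.

```python
import re
import uuid

# Documented rule: characters from [A-Za-z0-9_.:/-], at most 255, not empty.
_KEY_RE = re.compile(r"[A-Za-z0-9_.:/-]{1,255}")


def is_valid_idempotency_key(key: str) -> bool:
    """True when the whole key matches the allowed alphabet and length."""
    return _KEY_RE.fullmatch(key) is not None


def new_idempotency_key(prefix: str = "job") -> str:
    """Generate a fresh key; reuse the SAME value when retrying a submit."""
    key = f"{prefix}-{uuid.uuid4()}"
    if not is_valid_idempotency_key(key):
        raise ValueError("prefix contains characters the API does not allow")
    return key
```

Generate the key once per logical submit and store it alongside the pending request, so a retry after a dropped connection sends the identical header.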

Error codes

Every error response is JSON-shaped as {"error": "<human sentence>", "code": "<machine enum>"}. Some codes carry extra fields — noted inline.

Status | Code | Meaning
400 | BAD_ID | Malformed job_id (not a valid UUID).
400 | BAD_URL | Submitted URL isn't http(s).
401 | MISSING_API_KEY | No X-API-Key header.
401 | AUTHENTICATION_REQUIRED | The job belongs to a signed-in user (or carries a pending claim token) but the request didn't carry user-identity proof. Pass Authorization: Bearer <Firebase idToken> or X-Claim-Token: <token>. Distinct from NOT_FOUND: 401 means "you forgot the headers", 404 means "wrong job_id (or auth provided but doesn't match; anti-enumeration)".
402 | NO_CREDITS | Insufficient credit. Response includes credits snapshot and upgrade_url.
404 | NOT_FOUND | Unknown job id, or exists but not yours. (We deliberately don't distinguish, so the API doesn't leak existence.)
410 | AUDIO_EXPIRED | Audio retention window has elapsed. Body includes retention_days, the policy that applied. Transcript is still readable.
410 | AUDIO_MISSING | Audio was deleted or never persisted.
413 | FILE_TOO_LARGE | Multipart upload exceeds the max size.
415 | BAD_MIME | File's detected MIME type is unsupported.
422 | BAD_SOURCE | Submitted source isn't one of the allowed enum values.
400 | BAD_IDEMPOTENCY_KEY | Submitted Idempotency-Key header failed validation (empty, whitespace, > 255 chars, control chars, or disallowed punctuation). See Idempotency-Key.
429 | RATE_LIMITED | Too many submits in a short window; retry with backoff.
502 | BACKEND_ERROR | Upstream transcription service error. Retry safely.
502 | BACKEND_UNREACHABLE | We couldn't reach the transcription backend at all. Retry.
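
A client can branch on the code field rather than the human sentence. The sketch below parses the documented error shape into an exception; ApiError and parse_error are hypothetical names, the "UNKNOWN" fallback for non-JSON bodies is an assumption, and the retryable set contains only the three codes this table marks as safe to retry.

```python
import json
from typing import Any, Dict

RETRYABLE_CODES = {"RATE_LIMITED", "BACKEND_ERROR", "BACKEND_UNREACHABLE"}


class ApiError(Exception):
    """Carries the HTTP status, machine code, message, and any extra fields."""

    def __init__(self, status: int, code: str, message: str, extra: Dict[str, Any]):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.extra = extra  # e.g. upgrade_url on NO_CREDITS

    @property
    def retryable(self) -> bool:
        return self.code in RETRYABLE_CODES


def parse_error(status: int, body: bytes) -> ApiError:
    """Turn an error response body into an ApiError without raising on bad JSON."""
    try:
        doc = json.loads(body)
    except ValueError:
        doc = {}
    if not isinstance(doc, dict):
        doc = {}
    extra = {k: v for k, v in doc.items() if k not in ("error", "code")}
    return ApiError(status, doc.get("code", "UNKNOWN"), doc.get("error", ""), extra)
```

Retry with backoff when err.retryable is true; for NO_CREDITS, surface err.extra["upgrade_url"] to the user instead.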

Versioning & surface stability

Only paths under /api/v1/* are considered public and will be deprecated with notice before removal. Anything outside that prefix is internal and may change at any time — don't integrate against it.

Within /api/v1, response objects are additive: new fields may appear, but existing field names and types won't change under the same version. Error code strings are stable; HTTP status + code together uniquely identify a failure mode.

Contact

API keys, higher rate limits, SSO/SAML, on-prem, or feature requests — email contact@neugence.ai. We usually reply within a day.