PREVIEW · subject to change

API docs

A small HTTP surface for transcribing audio & video. Drop a file or a URL, poll the job, pull the transcript in the format you want. Same engine as the web app.

Status: Preview. API keys aren't publicly self-serve yet. If you'd like to build on Whipscribe, email contact@neugence.ai and we'll issue a key within a day.

Auth

Every request carries an API key. Optional user-identity headers attach the job to a specific account so it shows up in "Your files", gets the longer retention window on a paid plan, and survives re-installs.

Header	Required?	Notes
`X-API-Key: <key>`	required, on every request	Identifies the calling app + tier. Email contact@neugence.ai to request one; keys aren't self-serve yet.
`Authorization: Bearer <Firebase idToken>`	optional, signed-in users	Identifies a user signed into whipscribe.com. Jobs submitted with this header are tied to the user's email; retention + "Your files" follow the user, not the key.
`X-User-Email: you@example.com`	optional, paid accounts without Firebase	Look-up hint for the credit ledger. Prefer `Authorization` when the caller is a Firebase-authed browser; use `X-User-Email` for server-to-server flows.
`X-Guest-Token: <token>`	optional, one-off guest purchase	Returned by the credit-purchase flow for anonymous buyers.
`X-Claim-Token: <token>`	optional, guest submissions	Proves ownership of a job submitted without a signed-in user. The submit response includes `claim_token`; store it client-side and send it on subsequent reads for the same job. See Claim a guest job.

Anonymous calls (no user-identity headers) are allowed on the free tier and subject to per-IP rate limits. Keys in doc examples are always placeholders — don't share yours.
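
As a rough illustration of the header table above, here is a minimal Python sketch that assembles the headers for one request. The function name `build_headers` and its parameters are hypothetical, and sending `X-User-Email` only when no Firebase token is available is an assumption drawn from the "prefer `Authorization`" note, not documented server behaviour.

```python
from typing import Dict, Optional


def build_headers(
    api_key: str,
    id_token: Optional[str] = None,
    user_email: Optional[str] = None,
    guest_token: Optional[str] = None,
    claim_token: Optional[str] = None,
) -> Dict[str, str]:
    """Assemble request headers; only X-API-Key is mandatory."""
    if not api_key:
        raise ValueError("X-API-Key is required on every request")
    headers = {"X-API-Key": api_key}
    if id_token:
        # Preferred identity proof for Firebase-authed callers.
        headers["Authorization"] = f"Bearer {id_token}"
    elif user_email:
        # Assumption: fall back to the email hint only without a token.
        headers["X-User-Email"] = user_email
    if guest_token:
        headers["X-Guest-Token"] = guest_token
    if claim_token:
        headers["X-Claim-Token"] = claim_token
    return headers
```

Calling `build_headers(key)` with no identity arguments yields an anonymous request, which the free tier allows subject to per-IP rate limits.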

Base URL & CORS

https://whipscribe.com/api/v1

Every endpoint lives under this root; the endpoint headings below spell out the full /api/v1 path. The server accepts application/json for URL submits and multipart/form-data for file uploads. CORS is open for GET and POST from browser origins; server-to-server calls have no origin restriction.

Credits & billing

Credit is metered in audio-hours. A 30-minute audio file spends 0.5h. Jobs that fail don't consume credit. Submits over your balance return HTTP 402 with an upgrade_url. See /pricing for tiers, or the /credits dashboard to refresh your server-side balance on a new device.
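
The audio-hours arithmetic can be sketched client-side as below. This is only an estimate under the assumption that credit is charged linearly on duration with no rounding; the server's ledger (and its 402 response) remains authoritative. The function names are hypothetical.

```python
def audio_hours(duration_seconds: float) -> float:
    """Convert a media duration in seconds to audio-hours of credit."""
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    return duration_seconds / 3600.0


def can_afford(balance_hours: float, duration_seconds: float) -> bool:
    """Pre-flight check; a submit over balance would return HTTP 402."""
    return audio_hours(duration_seconds) <= balance_hours
```

For the 30-minute example in the text, `audio_hours(1800)` gives 0.5.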

Submit a file

POST /api/v1/transcribe

Upload an audio or video file as multipart/form-data.

Field	Required?	Type	Notes
`file`	required	file	mp3, m4a, wav, mp4, mov, ogg, webm, flac. Up to 10 hours per file.
`language`	optional	string	ISO code (`en`, `es`, `fr`, …). Auto-detected if omitted.
`diarize`	optional	boolean	Speaker labels. Default `true`.
`word_timestamps`	optional	boolean	Per-word offsets. Default `true`.
`source`	optional	enum	`upload` | `url` | `recording` | `api`. Defaults to `upload` for this endpoint. See Source field.

# curl example
curl https://whipscribe.com/api/v1/transcribe \
  -H "X-API-Key: $WHIPSCRIBE_KEY" \
  -F "file=@episode-412.mp3" \
  -F "language=en" \
  -F "source=api"

// 202 Accepted
{
  "job_id": "35f4be54-aa3e-4adc-85b7-b44f284d1fc3",
  "status": "queued",
  "tier": 2,
  "claim_token": "e0610c19..."   // only when no user identity was supplied
}
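
Because `claim_token` appears only on anonymous submits, a client has to treat it as optional. A minimal sketch of reading the 202 body, with the hypothetical helper name `parse_submit_response`:

```python
import json
from typing import Optional, Tuple


def parse_submit_response(body: str) -> Tuple[str, Optional[str]]:
    """Return (job_id, claim_token); the token is None for identified callers."""
    data = json.loads(body)
    # claim_token is present only when no user identity was supplied.
    return data["job_id"], data.get("claim_token")
```

When the second value is not None, store it client-side so it can be sent as `X-Claim-Token` on later reads of the same job.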

Submit a URL

POST /api/v1/transcribe/url

Same pipeline, but we fetch the media for you. Only Creative Commons-licensed YouTube URLs are currently accepted. Accepts the same language, diarize, word_timestamps, and source fields as the multipart endpoint. Defaults source to url.

curl https://whipscribe.com/api/v1/transcribe/url \
  -H "X-API-Key: $WHIPSCRIBE_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://www.youtube.com/watch?v=...", "language":"en", "source":"url"}'
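
The same URL submit can be sketched in Python with only the standard library. The helper name `build_url_submit` is hypothetical, and sending `diarize` as a JSON boolean is an assumption based on the field's documented type. The function only builds the request; nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://whipscribe.com/api/v1"


def build_url_submit(
    api_key: str,
    media_url: str,
    language: Optional[str] = None,
    diarize: Optional[bool] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST to the URL-submit endpoint."""
    payload = {"url": media_url}
    if language is not None:
        payload["language"] = language  # omitted means auto-detect
    if diarize is not None:
        payload["diarize"] = diarize  # server default is true
    return urllib.request.Request(
        f"{BASE_URL}/transcribe/url",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```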

List your jobs

GET /api/v1/jobs?limit={1-100}

Most recent first. Scoped to the caller's identity (Firebase user email, X-User-Email, or the API key). Returns an array of rows in the same shape as the status endpoint's response.

// 200 OK
[
  {
    "job_id": "35f4be54-...",
    "status": "done",
    "filename": "episode-412.mp3",
    "audio_duration_seconds": 967,
    "language": "en",
    "source": "upload",
    "created_at": 1776620389.79
  },
  ...
]

Poll job status

GET /api/v1/jobs/{job_id}

Lightweight poll. Recommended cadence: 3 seconds while status ∈ {queued, processing}.

// 200 OK
{
  "job_id": "35f4be54-...",
  "status": "processing",          // queued | processing | done | failed
  "progress": 0.42,                // 0.0–1.0 (best-effort)
  "audio_duration_seconds": 967,
  "language": "en",
  "source": "upload",              // upload | url | recording | api
  "speech_detected": true,         // see "No-speech detection" below
  "speech_ratio": 0.78,            // 0.0–1.0, fraction classified as speech by the VAD pre-flight
  "error": null
}
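
The recommended polling cadence could be implemented along these lines. This is a sketch, not an official client: `fetch_status` is a hypothetical callable that performs the GET and returns the parsed JSON, and the one-hour timeout is an arbitrary assumption.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATUSES = {"done", "failed"}


def wait_for_job(
    fetch_status: Callable[[], Dict[str, Any]],
    interval: float = 3.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job leaves queued/processing, then return its row."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status()
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)  # 3 seconds is the documented cadence
```

Injecting `sleep` keeps the loop testable without real delays; in production the default `time.sleep` applies.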

No-speech detection

Every submission runs through a Silero VAD pre-flight before reaching Whisper. If the audio doesn't contain transcribable speech (music, ambient noise, near-silence), the job completes successfully with speech_detected: false instead of feeding non-speech audio to Whisper, which would otherwise hallucinate confident-looking text in random languages.

The job's status stays done — the system worked, the audio just didn't contain what we transcribe. The result document carries:

// GET /api/v1/jobs/{job_id}/result?format=json (VAD-rejected file)
{
  "text": "",
  "language": null,
  "segments": [],
  "speech_detected": false,
  "speech_ratio": 0.02,
  "suggestion": "This file appears to be music or ambient audio. Transcription requires spoken content."
}

Branch your client on speech_detected === false rather than parsing the suggestion text. No credit is charged for VAD-rejected jobs. The threshold is operator-tunable via WHIPSCRIBE_VAD_MIN_SPEECH_RATIO (server-side only).
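
In Python, the same branch might look like the sketch below. The helper name `transcript_text` is hypothetical; treating a missing `speech_detected` field as "speech present" is an assumption, made because the regular result example does not show the field.

```python
from typing import Any, Dict, Optional


def transcript_text(result: Dict[str, Any]) -> Optional[str]:
    """Return the transcript, or None when the VAD pre-flight found no speech."""
    # Compare against False explicitly: an absent field is not a rejection.
    if result.get("speech_detected") is False:
        return None
    return result["text"]
```

A caller can then show its own "no speech found" message when the function returns None, without depending on the wording of `suggestion`.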

Get the transcript

GET /api/v1/jobs/{job_id}/result?format={txt|json|srt|vtt|docx}

Returns application/json for format=json, plain text for the rest. format=json gives you the richest payload — text, segments, speaker labels, word timestamps.

// GET .../result?format=json
{
  "text": "Welcome back to the show. Today we're talking about...",
  "language": "en",
  "segments": [
    {
      "start": 0.0,
      "end": 16.3,
      "speaker": "SPEAKER_00",
      "text": "Welcome back to the show...",
      "words": [
        { "start": 0.0, "end": 0.4, "text": "Welcome" },
        ...
      ]
    },
    ...
  ]
}
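
To show how the JSON payload can be consumed, here is a small sketch that turns segments into speaker-labelled lines. The function name and the output layout are illustrative, and the `UNKNOWN` fallback assumes `speaker` may be absent when diarization is off.

```python
from typing import Any, Dict, List


def format_segments(result: Dict[str, Any]) -> List[str]:
    """Render each segment as '[start-end] SPEAKER: text'."""
    lines = []
    for seg in result.get("segments", []):
        speaker = seg.get("speaker", "UNKNOWN")  # assumption: may be missing
        lines.append(
            f"[{seg['start']:.1f}-{seg['end']:.1f}] {speaker}: {seg['text']}"
        )
    return lines
```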

Playback URL

GET /api/v1/jobs/{job_id}/audio/url

Returns a URL your <audio> element (or any HTTP client that supports Range) can stream the original audio from. Keeps your API key off the playback path.

// 200 OK
{
  "url": "https://audio.del1.vultrobjects.com/…/episode-412.mp3",
  "storage": "vultr",      // "vultr" (direct CDN/presigned) | "disk" (backend-relative, prepend /api)
  "expires_in": 600,       // seconds; refetch on <audio>.error or before playback
  "retention_days": 30     // sent only with 410 AUDIO_EXPIRED; see Retention
}

If storage is "disk", the returned path is backend-relative — prepend your base URL's /api prefix before handing it to the browser. The URL is short-lived; the audio.addEventListener('error', refetchAndResume) pattern is supported.
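
The `storage` branch can be sketched as follows. The exact shape of a disk-backed path is not documented here, so the `/v1/jobs/...` value used in the usage note is purely hypothetical, as is the helper name `playable_url`.

```python
from typing import Any, Dict


def playable_url(resp: Dict[str, Any], api_root: str = "https://whipscribe.com/api") -> str:
    """Return a URL a browser can stream from, whichever backend holds the audio."""
    url = resp["url"]
    if resp.get("storage") == "disk":
        # Backend-relative path: prepend the /api prefix of the base URL.
        return api_root.rstrip("/") + "/" + url.lstrip("/")
    return url  # "vultr": already an absolute, short-lived URL
```

For example, a disk response with a hypothetical path `/v1/jobs/abc/audio` would resolve to `https://whipscribe.com/api/v1/jobs/abc/audio`.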

Claim a guest job

POST /api/v1/jobs/claim

Transfer ownership of jobs that were submitted anonymously (no user identity on submit) to a signed-in user. The submit response for anonymous jobs includes claim_token; keep it client-side until the user signs in, then call this endpoint once to attach every pending token to their email. Extends each claimed job's retention to the claiming user's tier window.

curl https://whipscribe.com/api/v1/jobs/claim \
  -H "X-API-Key: $WHIPSCRIBE_KEY" \
  -H "Authorization: Bearer $FIREBASE_ID_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"claim_tokens": ["e0610c19...", "2ace5e8d..."]}'

// 200 OK
{ "claimed": 2 }
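
A client that has collected several tokens before sign-in might build the claim body like this. It is a sketch: `build_claim_body` is a hypothetical name, and de-duplicating tokens is a defensive assumption rather than a documented requirement.

```python
import json
from typing import Iterable


def build_claim_body(tokens: Iterable[str]) -> str:
    """Serialize stored claim tokens, dropping blanks and duplicates."""
    # dict.fromkeys preserves first-seen order while removing repeats.
    unique = list(dict.fromkeys(t for t in tokens if t))
    return json.dumps({"claim_tokens": unique})
```

The resulting string is the JSON body for a single POST to the claim endpoint, sent together with the user's `Authorization` header.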

Cancel or delete a job

DELETE /api/v1/jobs/{job_id}

Cancels in-flight jobs or removes completed jobs from the server. Credit isn't refunded for completed jobs. Returns 204 No Content.

Who am I

GET /api/v1/me

Returns the identity the server resolves for your request: email, tier, retention window, and sign-in state. Read retention_days from here instead of hardcoding it in your client; the value is authoritative and stays in sync if the policy changes.

// 200 OK — signed-in user on the free plan
{
  "email": "you@example.com",
  "tier": "free",              // guest | free | paid — stable public enum
  "retention_days": 30,
  "signed_in": true
}

// 200 OK — anonymous caller (no Authorization / X-User-Email)
{
  "email": null,
  "tier": "guest",
  "retention_days": 3,
  "signed_in": false
}

Treat tier as the stable public field. Any additional fields on this response are implementation detail and may change — don't branch on them.

Retention

Uploaded audio is kept for the window below after the job completes, then auto-deleted. Transcripts stick around until you delete them.

Tier	Audio retention	Who
`guest`	3 days	anonymous submissions (no signed-in user)
`free`	30 days	signed-in users on the free plan
`paid`	365 days	any paid plan

Query GET /api/v1/me for the exact retention_days that applies to the caller. When the window has elapsed, GET /api/v1/jobs/{job_id}/audio/url returns 410 AUDIO_EXPIRED with the original window in the response body.
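
To display an "audio available until" hint, a client could estimate the expiry from `retention_days`. This sketch assumes the client knows the job's completion time as a Unix timestamp; the API examples here only show `created_at`, so treat the result as approximate and the 410 response as the authority.

```python
from datetime import datetime, timedelta, timezone


def audio_expiry(completed_at: float, retention_days: int) -> datetime:
    """Estimate when uploaded audio is auto-deleted (UTC)."""
    completed = datetime.fromtimestamp(completed_at, tz=timezone.utc)
    return completed + timedelta(days=retention_days)


def audio_expired(completed_at: float, retention_days: int, now: datetime) -> bool:
    """True once 'now' (timezone-aware) is at or past the estimated expiry."""
    return now >= audio_expiry(completed_at, retention_days)
```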

Source field

Every job carries a source enum so clients can surface how a transcript was created. The server validates the value on submit; invalid values return 422 BAD_SOURCE.

Value	Meaning	Default for
`upload`	multipart file upload	`POST /api/v1/transcribe`
`url`	fetched from a URL	`POST /api/v1/transcribe/url`
`recording`	captured from browser / extension mic	—
`api`	programmatic caller (scripts, SDKs, CI, MCP)	—

Callers can override the default by sending source in the submit body.
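
A client can mirror the enum check locally to avoid a round trip that ends in 422. The sketch below uses hypothetical names, and keys the defaults by the endpoint's path suffix.

```python
from typing import Optional

ALLOWED_SOURCES = ("upload", "url", "recording", "api")
DEFAULT_SOURCE = {"/transcribe": "upload", "/transcribe/url": "url"}


def resolve_source(endpoint: str, source: Optional[str] = None) -> str:
    """Return the source value a submit would carry, validating overrides."""
    value = source if source is not None else DEFAULT_SOURCE[endpoint]
    if value not in ALLOWED_SOURCES:
        # The server would answer 422 BAD_SOURCE for this value.
        raise ValueError(f"invalid source: {value!r}")
    return value
```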

Idempotency-Key

Submit endpoints (POST /api/v1/transcribe, POST /api/v1/transcribe/url, POST /api/v1/uploads/init) accept an optional Idempotency-Key: <string> header. A retry that carries the same key under the same API key returns the original job's response (HTTP 200, with an X-Idempotent-Replay: true header) instead of creating a duplicate job, the same pattern Stripe uses. Keys are scoped per API key, may contain only [A-Za-z0-9_.:/-], and must be 1 to 255 characters with no whitespace; invalid keys return 400 BAD_IDEMPOTENCY_KEY. A key becomes reusable once the original job ages past its retention window.

# First submit
curl https://whipscribe.com/api/v1/transcribe/url \
  -H "X-API-Key: $WHIPSCRIBE_KEY" \
  -H "Idempotency-Key: job-2026-04-19-abc123" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/audio.mp3"}'
# → 202 {"job_id":"…","status":"queued",…}

# Retry after a dropped connection — same key, same response
curl https://whipscribe.com/api/v1/transcribe/url \
  -H "X-API-Key: $WHIPSCRIBE_KEY" \
  -H "Idempotency-Key: job-2026-04-19-abc123" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/audio.mp3"}'
# → 200 {"job_id":"<same>","status":"queued",…}  (X-Idempotent-Replay: true)
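
The key rules above translate into a short client-side check. This is a sketch of the documented constraints (allowed characters, length up to 255, non-empty); the UUID-based generator is just one convenient way to produce a valid key, not a required format.

```python
import re
import uuid

# Allowed characters and length, per the rules stated above.
_KEY_PATTERN = re.compile(r"[A-Za-z0-9_.:/-]{1,255}")


def is_valid_idempotency_key(key: str) -> bool:
    """True when the key would pass the documented validation."""
    return _KEY_PATTERN.fullmatch(key) is not None


def new_idempotency_key(prefix: str = "job") -> str:
    """Generate a fresh key; UUID hex digits and hyphens are all allowed."""
    return f"{prefix}-{uuid.uuid4()}"
```

Generate the key once per logical submit and reuse it on every retry of that submit, so a dropped connection cannot create a second job.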

Error codes

Every error response is a JSON object of the form {"error": "<human sentence>", "code": "<machine enum>"}. Some codes carry extra fields, noted inline.

Status	Code	Meaning
`400`	`BAD_ID`	Malformed `job_id` (not a valid UUID).
`400`	`BAD_URL`	Submitted URL isn't `http(s)`.
`400`	`BAD_IDEMPOTENCY_KEY`	Submitted `Idempotency-Key` header failed validation (empty, whitespace, > 255 chars, control chars, or disallowed punctuation). See Idempotency-Key.
`401`	`MISSING_API_KEY`	No `X-API-Key` header.
`401`	`AUTHENTICATION_REQUIRED`	The job belongs to a signed-in user (or carries a pending claim token) but the request didn't carry user-identity proof. Pass `Authorization: Bearer <Firebase idToken>` or `X-Claim-Token: <token>`. Distinct from `NOT_FOUND`: 401 means "you forgot the headers", 404 means "wrong job_id, or auth provided but it doesn't match" (anti-enumeration).
`402`	`NO_CREDITS`	Insufficient credit. Response includes `credits` snapshot and `upgrade_url`.
`404`	`NOT_FOUND`	Unknown job id, or exists but not yours. We deliberately don't distinguish the two, so job existence isn't leaked.
`410`	`AUDIO_EXPIRED`	Audio retention window has elapsed. Body includes `retention_days`, the policy that applied. Transcript is still readable.
`410`	`AUDIO_MISSING`	Audio was deleted or never persisted.
`413`	`FILE_TOO_LARGE`	Multipart upload exceeds the max size.
`415`	`BAD_MIME`	File's detected MIME type is unsupported.
`422`	`BAD_SOURCE`	Submitted `source` isn't one of the allowed enum values.
`429`	`RATE_LIMITED`	Too many submits in a short window; retry with backoff.
`502`	`BACKEND_ERROR`	Upstream transcription service error. Retry safely.
`502`	`BACKEND_UNREACHABLE`	We couldn't reach the transcription backend at all. Retry.
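
One way to act on this table is to parse the error body once and branch on `code`. In the sketch below, the `ApiError` class, the `UNKNOWN` fallback, and the choice of which codes count as retryable (the three whose descriptions say to retry) are illustrative assumptions.

```python
import json
from typing import Any, Dict

# Codes whose table entries explicitly invite a retry.
RETRYABLE_CODES = {"RATE_LIMITED", "BACKEND_ERROR", "BACKEND_UNREACHABLE"}


class ApiError(Exception):
    """Carries the HTTP status, machine code, message, and any extra fields."""

    def __init__(self, status: int, code: str, message: str, extra: Dict[str, Any]):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message
        self.extra = extra


def parse_error(status: int, body: str) -> ApiError:
    """Turn an error response body into an ApiError."""
    data = json.loads(body)
    extra = {k: v for k, v in data.items() if k not in ("error", "code")}
    # "UNKNOWN" is a hypothetical placeholder for a body without a code.
    return ApiError(status, data.get("code", "UNKNOWN"), data.get("error", ""), extra)


def is_retryable(err: ApiError) -> bool:
    """True for failures worth retrying with backoff."""
    return err.code in RETRYABLE_CODES
```

Extra fields such as `upgrade_url` on `NO_CREDITS` end up in `err.extra`, so callers can surface them without re-parsing the body.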

Versioning & surface stability

Only paths under /api/v1/* are considered public and will be deprecated with notice before removal. Anything outside that prefix is internal and may change at any time — don't integrate against it.

Within /api/v1, response objects are additive: new fields may appear, but existing field names and types won't change under the same version. Error code strings are stable; HTTP status + code together uniquely identify a failure mode.

Contact

API keys, higher rate limits, SSO/SAML, on-prem, or feature requests — email contact@neugence.ai. We usually reply within a day.