Free · 30 minutes a day · paid unlocks longer files + branded captions

Generate SRT subtitles with real word-level timing.

Upload a recording or paste a URL and get an SRT file with word-level timing. It comes speaker-labelled, editable, and ready to drop into any video editor or HTML5 player.

30 min / day free · no signup · $1/hr PAYG after · Never used to train AI
Word-level cue timing · 100+ languages · Speaker labels included

What you get

An SRT that drops cleanly into any editor.

Word-level cue timing

Each cue is built from word-level timestamps, not whole-line guesses. Captions snap to the exact word the speaker says — no half-second drift on punchlines.
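
As a rough illustration of building cues from word-level timestamps, here is a minimal Python sketch. It is not Whipscribe's actual segmentation logic: the Word and Cue types, the 42-character limit, the 0.6-second pause threshold, and the sample timings are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the recording
    end: float


@dataclass
class Cue:
    start: float
    end: float
    text: str


def _flush(words):
    # A cue starts at its first word and ends at its last word.
    return Cue(words[0].start, words[-1].end, " ".join(w.text for w in words))


def group_words(words, max_chars=42, max_gap=0.6):
    """Group timed words into cues, breaking on long pauses or long lines."""
    cues, current = [], []
    for word in words:
        if current:
            gap = word.start - current[-1].end
            length = len(" ".join(w.text for w in current)) + 1 + len(word.text)
            if gap > max_gap or length > max_chars:
                cues.append(_flush(current))
                current = []
        current.append(word)
    if current:
        cues.append(_flush(current))
    return cues


# Hypothetical word timings, invented for illustration only.
words = [
    Word("Welcome", 1.20, 1.55), Word("back", 1.58, 1.80),
    Word("to", 1.82, 1.90), Word("the", 1.92, 2.00), Word("show.", 2.02, 2.40),
    Word("Thanks", 3.45, 3.80), Word("for", 3.82, 3.95),
    Word("having", 3.97, 4.25), Word("me.", 4.27, 4.50),
]
cues = group_words(words)
```

Because every cue boundary is taken from a word's own start or end time, the cue edges line up with the speech itself rather than with a fixed-length window.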

Speaker labels

Two-host shows, panels, and Q&As come out as 'Host:' / 'Guest:' lines, so the captions read like a screenplay, not a wall of text.

Edit before export

Fix a misheard word in our editor; the SRT regenerates with timing intact. Faster than pulling the file into a desktop captioning tool to rewrite one line.

Editor-ready format

Plain UTF-8 SRT with sequential numbering, hh:mm:ss,mmm timestamps, and CRLF line endings. Drops into YouTube Studio, Premiere, Final Cut, DaVinci Resolve, OBS, and VLC.
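
For readers who want to see that layout concretely, here is a small Python sketch that writes cues in the shape described above: sequential numbers, a comma before the milliseconds, CRLF line endings, UTF-8 bytes. The function names, the shortened cue text, and the cue end times are assumptions for illustration, not Whipscribe's exporter.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as hh:mm:ss,mmm (SRT uses a comma before the milliseconds)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues) -> bytes:
    """Serialize (start, end, text) tuples as UTF-8 SRT with CRLF line endings."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\r\n{timing}\r\n{text}\r\n")
    # A blank line separates consecutive cues.
    return "\r\n".join(blocks).encode("utf-8")


# Start times follow the sample transcript; end times are assumed.
sample = [
    (1.2, 3.1, "Host: Welcome back to the show."),
    (3.45, 5.9, "Guest: Thanks for having me."),
]
srt_bytes = to_srt(sample)
```

Writing srt_bytes to a file opened in binary mode keeps the CRLF endings intact on every platform.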

Why word-level matters

Auto-captions vs a real SRT.

✗ Platform auto-captions

Free, but stripped: 3-second chunks, no speaker tags, no edit path. Useful for accessibility, useless when you need a clean caption file for an editor.

  • Whole-line timing — no per-word snap
  • No speaker labels
  • Choppy 3-second segments
  • Often missing for non-English content
  • No download for offline editing

✓ A real Whipscribe SRT

Word-level cues, speaker labels, full punctuation. Edit before export, then drop straight into your editor. Free for the first 30 minutes a day; paid plans unlock longer files, batch SRT, and branded burn-in.

  • Word-level cue timing
  • Speaker labels on every line
  • 100+ languages, auto-detected
  • Click-to-edit before exporting
  • Drop-in for any editor or HTML5 player

Sample SRT

What the output looks like.

Word-level cues, speaker-labelled, ready to drop into your editor.

transcript · whipscribe.com/view/srt-generator
HOST 00:00:01,200 Welcome back to the show.
GUEST 00:00:03,450 Thanks for having me — long-time listener.
HOST 00:00:06,100 Let's start with the question everyone asks first.
GUEST 00:00:09,300 How did I end up building this in my garage?

Export

One transcript. Five clean formats.

Every paid tier exports all five. The free tier exports TXT and SRT.

.srt

SRT captions

Word-level. Every video editor reads this.

.vtt

WebVTT

HTML5 player + YouTube uploads.

.txt

Plain text

De-ummed paragraphs. Ready to paste.

.docx

Show notes

Formatted with chapters and pull-quotes.

.json

Machine-readable

Per-word timing + speaker IDs.

Pricing

Honest pricing, no surprises.

Credits never expire. Upgrade or downgrade any month. Free tier resets daily — no signup, no card.

Free

$0/forever

Try every feature for 30 minutes a day. No card.

  • 30 min / day
  • Speaker labels included
  • TXT + SRT export
  • No history retention
Try free

Pay-as-you-go

$1/hour

Best for one-off projects. Credits never expire.

  • $10 minimum top-up
  • Every export format
  • 365-day history
  • API access
Top up

Pro

$8/month

Indie creators. 100 hours / month, all features.

  • 100 hours / month
  • Clips + every aspect ratio
  • Branded captions
  • Priority queue
See Pro

Team

$29/month

Teams. 500 hours / month, shared workspace.

  • 500 hours / month
  • Shared library
  • API + MCP for Claude
  • Workspace billing
See Team

FAQ

SRT generator questions, answered.

What's the difference between SRT and VTT?

SRT is the older, simpler format: timestamps use a comma before the milliseconds (hh:mm:ss,mmm) and there is no styling. Almost every editor reads it. VTT (WebVTT) is the W3C format used by the HTML5 track element: timestamps use a period before the milliseconds (hh:mm:ss.mmm), and it supports cue styling, positioning, and metadata. If you're targeting YouTube, Premiere, or Final Cut, use SRT. If you're targeting a custom HTML5 player or want styled captions, use VTT. We export both from the same source.
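
To make the difference concrete, here is a minimal Python sketch that converts a simple SRT file to WebVTT by adding the WEBVTT header and switching the millisecond separator on timing lines. It assumes plain, unstyled cues and is an illustration only; the function name is hypothetical and this is not how Whipscribe produces its VTT export.

```python
import re

# Matches an SRT timestamp such as 00:00:01,200.
_TIMING = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert simple SRT text to WebVTT (no styling or positioning added)."""
    lines = []
    for line in srt_text.replace("\r\n", "\n").strip().split("\n"):
        # Only touch timing lines, so commas inside caption text are left alone.
        if "-->" in line:
            line = _TIMING.sub(r"\1.\2", line)
        lines.append(line)
    # WebVTT files must begin with the WEBVTT header followed by a blank line.
    return "WEBVTT\n\n" + "\n".join(lines) + "\n"


srt = "1\r\n00:00:01,200 --> 00:00:03,100\r\nHost: Welcome back to the show.\r\n"
vtt = srt_to_vtt(srt)
```

The SRT sequence numbers are kept as they are, since WebVTT allows an optional identifier line before each cue's timing.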

How long can the file be on the free tier?

The free tier covers up to 30 minutes a day in total, which could be one 30-minute file or three 10-minute files. Pay-as-you-go ($1/hr) and Pro ($8/mo for 100 hours) remove the daily cap and allow longer single files.

Which languages work?

100+ languages are auto-detected, including English, Spanish, French, German, Portuguese, Italian, Dutch, Hindi, Mandarin, Japanese, Korean, Arabic, Russian, Polish, Turkish, and many more. You can also force a language code at upload if auto-detect picks wrong on a multi-language clip.

How accurate is the timing?

Word-level timing is typically within 50–100 ms of the spoken audio on clean recordings. Heavy background music, overlapping speech, or strong accents widen that window. Even then, editing one or two cues manually is faster than fighting a worse engine.

Can I burn the captions into the video?

Burn-in (hardcoded captions, branded styling, custom fonts) is a paid feature on Pro and Team. The free tier exports the SRT file, which you can burn in yourself with FFmpeg, Premiere, or Resolve.
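
If you go the FFmpeg route, one common approach is FFmpeg's subtitles video filter. The Python sketch below builds and runs such a command. It assumes an FFmpeg build with libass support on your PATH and file paths without characters that need escaping in filter syntax (colons, quotes, backslashes). The file names and function names are placeholders.

```python
import subprocess


def burn_in_command(video_path: str, srt_path: str, output_path: str) -> list:
    """Build an FFmpeg command that hardcodes an SRT file into the video."""
    return [
        "ffmpeg",
        "-i", video_path,
        "-vf", f"subtitles={srt_path}",  # render the captions onto each frame
        "-c:a", "copy",                  # copy the audio stream without re-encoding
        output_path,
    ]


def burn_in(video_path: str, srt_path: str, output_path: str) -> None:
    """Run the command; raises CalledProcessError if FFmpeg exits with an error."""
    subprocess.run(burn_in_command(video_path, srt_path, output_path), check=True)


# Placeholder file names; call burn_in(...) with the same arguments to run FFmpeg.
command = burn_in_command("episode.mp4", "episode.srt", "episode_captioned.mp4")
```

The video stream is re-encoded because the captions become part of the picture, so expect this step to take longer than a plain remux.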

Is the file stored anywhere?

On the free tier nothing is retained — the SRT is generated, served to your browser, and dropped. On paid tiers files live in your private library for 365 days; you can delete any item from your account settings.


Drop a recording. Get a real SRT.

Generate SRT

Operated by Neugence Technology Pvt. Ltd. · contact@neugence.ai · Security · Privacy · Terms