Free · 30 minutes a day · paid unlocks longer files + styled cue exports

Generate WebVTT captions for HTML5 video and YouTube.

WebVTT is the HTML5 standard for captions. Exports include word-level cues, speaker labels, and optional cue styling, ready to reference from a track element or upload to YouTube.

30 min / day free · no signup · $1/hr PAYG after · Never used to train AI
WebVTT spec-compliant · Word-level cue timing · HTML5 + YouTube ready · Never used to train AI

What you get

A WebVTT file that just works in HTML5 video.

Spec-compliant VTT

WEBVTT header, period-separated timing (hh:mm:ss.ttt, with three millisecond digits after the period), blank lines between cues, optional cue identifiers. Validates against the W3C WebVTT spec and passes the FFmpeg and YouTube native upload checks.
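To make the structure above concrete, here is a minimal Python sketch that checks two of those rules: the WEBVTT header on the first line and period-separated timing lines. It is an illustration only, not a full validator and not Whipscribe's own code; the names check_vtt, TIMESTAMP, and TIMING_LINE are hypothetical.

```python
import re

# A WebVTT timestamp: optional hours, then mm:ss, a period, and three millisecond digits.
TIMESTAMP = r"(?:\d{2,}:)?[0-5]\d:[0-5]\d\.\d{3}"
# A timing line: start, " --> ", end, then optional cue settings.
TIMING_LINE = re.compile(rf"^{TIMESTAMP}[ \t]+-->[ \t]+{TIMESTAMP}(?:[ \t]+.*)?$")


def check_vtt(text: str) -> list[str]:
    """Return structural problems found; an empty list means none were detected."""
    problems = []
    lines = text.replace("\r\n", "\n").split("\n")
    # The file must open with WEBVTT (an optional byte-order mark is tolerated).
    if not re.match(r"^\ufeff?WEBVTT(?:[ \t].*)?$", lines[0]):
        problems.append("first line must be WEBVTT")
    for number, line in enumerate(lines, start=1):
        # Any line containing the arrow is expected to be a well-formed timing line.
        if "-->" in line and not TIMING_LINE.match(line):
            problems.append(f"line {number}: malformed timing line")
    return problems
```

An SRT file run through this check would typically be flagged twice: once for the missing header and once per comma-style timing line.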

Cue styling and positioning

Optional cue settings — line, position, align, vertical — plus inline styling tags. Place a speaker name top-left, a translation bottom-center, a chapter marker mid-screen.
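Cue settings are written as space-separated name:value pairs after the end timestamp on a timing line. The Python sketch below builds such a string for the four settings named above; cue_settings is a hypothetical helper for illustration, and it covers only the simple forms of each setting (no region, size, or alignment suffixes).

```python
def cue_settings(line=None, position=None, align=None, vertical=None) -> str:
    """Build the settings string that follows the end timestamp on a timing line."""
    parts = []
    if vertical is not None:
        if vertical not in ("rl", "lr"):
            raise ValueError("vertical must be 'rl' or 'lr'")
        parts.append(f"vertical:{vertical}")
    if line is not None:
        # An int is treated as a line number; a float as a percentage.
        parts.append(f"line:{line}" if isinstance(line, int) else f"line:{line:g}%")
    if position is not None:
        if not 0 <= position <= 100:
            raise ValueError("position is a percentage from 0 to 100")
        parts.append(f"position:{position:g}%")
    if align is not None:
        if align not in ("start", "center", "end", "left", "right"):
            raise ValueError("unknown align value")
        parts.append(f"align:{align}")
    return " ".join(parts)


# Roughly top-left, e.g. for a speaker name:
top_left = cue_settings(line=0, position=5, align="start")
# -> "line:0 position:5% align:start"
```

Exact on-screen placement still depends on the player, so treat the values as starting points to tune.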

Word-level timing

Each cue is built from word-level timestamps, not whole-line guesses. Captions snap to the exact word — required for karaoke-style highlight rendering and accurate seeking.
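One way to picture how word-level timestamps become cues: group words until a line-length budget is reached, take the cue's start from its first word and its end from its last, and optionally emit an inline timestamp before each later word so a player can highlight karaoke-style. The Python sketch below assumes words arrive as (text, start_seconds, end_seconds) tuples; that input shape, the 42-character budget, and the function names are assumptions for illustration, not Whipscribe's actual pipeline.

```python
def fmt(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, hh:mm:ss.ttt."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def words_to_cues(words, max_chars=42):
    """Group (text, start, end) word tuples into cues of at most max_chars characters."""
    cues, current = [], []
    for word in words:
        candidate = " ".join(w[0] for w in current + [word])
        # Start a new cue when adding this word would exceed the budget.
        if current and len(candidate) > max_chars:
            cues.append(current)
            current = []
        current.append(word)
    if current:
        cues.append(current)
    return cues


def karaoke_cue(cue) -> str:
    """Render one cue with an inline timestamp before every word after the first."""
    start, end = cue[0][1], cue[-1][2]
    text = cue[0][0] + "".join(f" <{fmt(s)}>{t}" for t, s, _ in cue[1:])
    return f"{fmt(start)} --> {fmt(end)}\n{text}"
```

With hypothetical input such as [("Welcome", 1.2, 1.5), ("back", 1.5, 1.8)], the single resulting cue runs from 00:00:01.200 to 00:00:01.800 and its text reads "Welcome <00:00:01.500>back".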

Drop into a track element

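Paste the file URL into <track src='captions.vtt' kind='subtitles' srclang='en'> (the HTML spec requires srclang for subtitle tracks; swap in your own language code) and the browser renders captions natively. Works in Safari, Chrome, Firefox, and Edge, with no JavaScript player library required.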

VTT vs SRT

When to use VTT instead of SRT.

✗ An SRT pasted into HTML5

Some browsers tolerate it, most don't. Comma timing isn't VTT-spec; styling is unsupported; positioning is impossible. Works in VLC, breaks in <track>.

  • Comma-separated timing — not VTT spec
  • No cue styling support
  • No positioning, alignment, or vertical text
  • Inconsistent browser rendering
  • No metadata or chapter cues
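If you already have an SRT and only need the basics fixed, the two structural differences listed above (the missing header and comma timing) can be patched mechanically. The Python sketch below shows the idea; srt_to_vtt is a hypothetical helper, and it does not add styling, positioning, or handle SRT-specific markup such as font tags.

```python
import re

# An SRT timing line: hh:mm:ss,mmm --> hh:mm:ss,mmm (anything trailing is dropped).
SRT_TIMING = re.compile(
    r"^(\d{2,}:\d{2}:\d{2}),(\d{3})\s+-->\s+(\d{2,}:\d{2}:\d{2}),(\d{3}).*$"
)


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT text to WebVTT: add the header and switch comma timing to periods."""
    lines = srt_text.lstrip("\ufeff").replace("\r\n", "\n").split("\n")
    out = ["WEBVTT", ""]
    for line in lines:
        # Only timing lines are rewritten, so commas inside caption text are untouched.
        out.append(SRT_TIMING.sub(r"\1.\2 --> \3.\4", line))
    return "\n".join(out)
```

The numeric counters from the SRT are left in place, since WebVTT accepts them as cue identifiers.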

✓ A real Whipscribe WebVTT

Spec-compliant header, period timing, cue styling and positioning available. Renders identically across HTML5 browsers and YouTube native upload. Free for the first 30 minutes a day; paid plans unlock longer files and styled cue exports.

  • WEBVTT header with period-separated timing
  • Cue styling, positioning, and metadata
  • Word-level cue timing
  • Speaker labels and karaoke-ready highlights
  • HTML5 <track> drop-in

Sample VTT

What the output looks like.

WebVTT-compliant cues with speaker labels and word-level timing.

transcript · whipscribe.com/view/vtt-generator
HOST 00:00:01.200 Welcome back to the show.
GUEST 00:00:03.450 Thanks for having me — long-time listener.
HOST 00:00:06.100 Let's start with the question everyone asks first.
GUEST 00:00:09.300 How did I end up building this in my garage.
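The transcript view above shows only a speaker and a start time per line. In a .vtt file each cue also needs an end time, and a speaker label is usually carried in a <v> voice tag. The Python sketch below serializes rows like these under two loudly stated assumptions: a cue ends when the next one starts, and the last cue gets a made-up 3-second duration. The function names and the last_cue_ms parameter are hypothetical; the real export may choose end times differently.

```python
import html


def to_ms(stamp: str) -> int:
    """Parse hh:mm:ss.ttt into milliseconds."""
    clock, millis = stamp.split(".")
    h, m, s = (int(part) for part in clock.split(":"))
    return ((h * 60 + m) * 60 + s) * 1000 + int(millis)


def to_stamp(ms: int) -> str:
    """Format milliseconds as hh:mm:ss.ttt."""
    s, ms = divmod(ms, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def rows_to_vtt(rows, last_cue_ms=3000) -> str:
    """Serialize (speaker, start, text) rows as WebVTT cues with <v> voice tags."""
    blocks = ["WEBVTT"]
    for index, (speaker, start, text) in enumerate(rows):
        start_ms = to_ms(start)
        # Assumption: a cue ends when the next one starts; the last cue gets last_cue_ms.
        if index + 1 < len(rows):
            end_ms = to_ms(rows[index + 1][1])
        else:
            end_ms = start_ms + last_cue_ms
        # Escape &, <, and > so caption text is not read as cue markup.
        body = html.escape(text, quote=False)
        blocks.append(f"{to_stamp(start_ms)} --> {to_stamp(end_ms)}\n<v {speaker}>{body}")
    return "\n\n".join(blocks) + "\n"


rows = [
    ("HOST", "00:00:06.100", "Let's start with the question everyone asks first."),
    ("GUEST", "00:00:09.300", "How did I end up building this in my garage."),
]
vtt = rows_to_vtt(rows)
```

Here the first cue would run from 00:00:06.100 to 00:00:09.300, and the second would end at the assumed 00:00:12.300.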

Export

One transcript. Five clean formats.

Every paid tier exports all five. The free tier exports TXT and SRT.

.vtt

WebVTT

HTML5 player + YouTube uploads.

.srt

SRT captions

Word-level. Every video editor reads this.

.txt

Plain text

De-ummed paragraphs. Ready to paste.

.docx

Show notes

Formatted with chapters and pull-quotes.

.json

Machine-readable

Per-word timing + speaker IDs.

Pricing

Honest pricing, no surprises.

Credits never expire. Upgrade or downgrade any month. Free tier resets daily — no signup, no card.

Free

$0/forever

Try every feature for 30 minutes a day. No card.

  • 30 min / day
  • Speaker labels included
  • TXT + SRT export
  • No history retention
Try free

Pay-as-you-go

$1/hour

Best for one-off projects. Credits never expire.

  • $10 minimum top-up
  • Every export format
  • 365-day history
  • API access
Top up

Pro

$8/month

Indie creators. 100 hours / month, all features.

  • 100 hours / month
  • Clips + every aspect ratio
  • Branded captions
  • Priority queue
See Pro

Team

$29/month

Teams. 500 hours / month, shared workspace.

  • 500 hours / month
  • Shared library
  • API + MCP for Claude
  • Workspace billing
See Team

FAQ

WebVTT generator questions, answered.

What's the difference between WebVTT and SRT?

WebVTT (.vtt) is the HTML5 standard — period-separated timing (00:00:01.200), supports cue styling, positioning, and metadata. SRT (.srt) is older, simpler — comma timing (00:00:01,200), no styling. Use VTT for HTML5 <track>, YouTube native upload, and styled captions. Use SRT for editors like Premiere, Final Cut, and Resolve.

Does YouTube accept VTT?

Yes. YouTube Studio's caption upload accepts both VTT and SRT. VTT is preferred when you need cue styling or chapter cues; SRT is fine when you just need plain captions.

Can I style the captions?

Yes. WebVTT supports cue settings (line, position, align, vertical) plus inline styling tags (<b>, <i>, <c.classname>) and optional CSS via the ::cue pseudo-element. Our exports include style hooks; you write the CSS in your player. Branded burn-in styling is a paid feature on Pro and Team.

Which players support styled VTT?

Native HTML5 <video> with <track kind='subtitles'> in Safari, Chrome, Firefox, and Edge. Video.js and Plyr render VTT styling. YouTube renders most cue settings but ignores ::cue CSS. Older players (JW Player 6, Flowplayer) may downgrade to plain text.

How accurate is the timing?

Word-level timing is typically within 50–100 ms of when each word is actually spoken on clean recordings. Heavy background music, overlapping speech, or thick accents widen the window. Edit individual cues in our editor before exporting if needed.

Is the file stored anywhere?

On the free tier nothing is retained — the VTT is generated, served to your browser, and dropped. On paid tiers files live in your private library for 365 days; you can delete any item from your account settings.

Generate WebVTT captions. Drop them into your HTML5 player.

Generate VTT

Operated by Neugence Technology Pvt. Ltd. · contact@neugence.ai · Security · Privacy · Terms