Generate WebVTT captions for HTML5 video and YouTube.
WebVTT is the HTML5 standard for captions. Word-level cues, speaker labels, optional cue styling — pasted straight into a track element or YouTube native upload.
What you get
A WebVTT file that just works in HTML5 video.
Spec-compliant VTT
WEBVTT header, period-separated timing (hh:mm:ss.mmm, with three-digit milliseconds), blank lines between cues, optional cue identifiers. Validates against the W3C WebVTT spec and passes the FFmpeg and YouTube native upload checks.
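To make the file layout concrete, here is a minimal Python sketch of the structure described above: header, optional numeric cue identifiers, period-separated timing, and blank lines between cues. The function names `format_timestamp` and `build_vtt` are hypothetical and chosen for illustration; this is not Whipscribe's exporter.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as hh:mm:ss.mmm, the period-separated form WebVTT uses."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_vtt(cues) -> str:
    """Build a WebVTT document from (start_seconds, end_seconds, text) tuples."""
    blocks = ["WEBVTT"]  # the file must begin with this header line
    for index, (start, end, text) in enumerate(cues, start=1):
        # The numeric identifier line is optional in WebVTT; it is included here.
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}")
    # A blank line separates the header from the first cue and each cue from the next.
    return "\n\n".join(blocks) + "\n"
```

Calling `build_vtt([(0.0, 1.2, "Hello")])` yields a header, a blank line, and a single cue numbered 1.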
Cue styling and positioning
Optional cue settings — line, position, align, vertical — plus inline styling tags. Place a speaker name top-left, a translation bottom-center, a chapter marker mid-screen.
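Cue settings are written as space-separated name:value pairs after the end time on a cue's timing line. The sketch below shows one way to assemble such a line in Python. The helper name `cue_timing_line` is hypothetical, and the example values are illustrative rather than taken from a real export.

```python
# Setting names defined by the WebVTT spec.
ALLOWED_SETTINGS = {"vertical", "line", "position", "size", "align", "region"}


def cue_timing_line(start: str, end: str, **settings) -> str:
    """Return a timing line with cue settings appended as name:value pairs."""
    unknown = set(settings) - ALLOWED_SETTINGS
    if unknown:
        raise ValueError(f"unsupported cue settings: {sorted(unknown)}")
    parts = [f"{start} --> {end}"]
    # Keyword arguments keep their call order, so the output is deterministic.
    parts += [f"{name}:{value}" for name, value in settings.items()]
    return " ".join(parts)


# A speaker name near the top-left corner: first line, close to the left edge.
top_left = cue_timing_line(
    "00:00:01.000", "00:00:04.000", line="0", position="5%", align="start"
)
```

Here `top_left` is `00:00:01.000 --> 00:00:04.000 line:0 position:5% align:start`. How a given player honors each setting varies, so test in the players you target.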
Word-level timing
Each cue is built from word-level timestamps, not whole-line guesses. Captions snap to the exact word — required for karaoke-style highlight rendering and accurate seeking.
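As an illustration of building cues from word-level timestamps, the following Python sketch groups timed words into cues, starting a new cue when the line would grow too long or when there is a pause in speech. The function name `group_words` and the thresholds (42 characters, 0.8 seconds) are assumptions chosen for the example, not the rules Whipscribe applies.

```python
def group_words(words, max_chars=42, max_gap=0.8):
    """Group (text, start_seconds, end_seconds) words, in time order, into cues."""
    groups = []
    current = []
    for word in words:
        text, start, _end = word
        if current:
            # Length of the line if this word were appended with a space.
            line_len = len(" ".join(w[0] for w in current)) + 1 + len(text)
            # Silence between the previous word's end and this word's start.
            gap = start - current[-1][2]
            if line_len > max_chars or gap > max_gap:
                groups.append(current)
                current = []
        current.append(word)
    if current:
        groups.append(current)
    # Each cue spans from its first word's start to its last word's end.
    return [(g[0][1], g[-1][2], " ".join(w[0] for w in g)) for g in groups]
```

Because each cue's start and end come from real word boundaries, the cue appears when its first word is spoken rather than at an estimated line time.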
Drop into a track element
Point a track element at the file URL, for example <track src='captions.vtt' kind='subtitles'>, and the browser renders captions natively. Works in Safari, Chrome, Firefox, and Edge, with no JavaScript player library required.
VTT vs SRT
When to use VTT instead of SRT.
✗ An SRT pasted into HTML5
Some browsers tolerate it, most don't. Comma timing isn't VTT-spec; styling is unsupported; positioning is impossible. Works in VLC, breaks in <track>.
- Comma-separated timing — not VTT spec
- No cue styling support
- No positioning, alignment, or vertical text
- Inconsistent browser rendering
- No metadata or chapter cues
✓ A real Whipscribe WebVTT
Spec-compliant header, period timing, cue styling and positioning available. Renders identically across HTML5 browsers and YouTube native upload. Free for the first 30 minutes a day; paid plans unlock longer files and styled cue exports.
- WEBVTT header with period-separated timing
- Cue styling, positioning, and metadata
- Word-level cue timing
- Speaker labels and karaoke-ready highlights
- HTML5 <track> drop-in
Sample VTT
What the output looks like.
WebVTT-compliant cues with speaker labels and word-level timing.
Export
One transcript. Five clean formats.
Every paid tier exports all five. The free tier exports TXT and SRT.
WebVTT
HTML5 player + YouTube uploads.
SRT captions
Word-level. Every video editor reads this.
Plain text
De-ummed paragraphs. Ready to paste.
Show notes
Formatted with chapters and pull-quotes.
Machine-readable
Per-word timing + speaker IDs.
Pricing
Honest pricing, no surprises.
Credits never expire. Upgrade or downgrade any month. Free tier resets daily — no signup, no card.
Free
$0/forever
Try every feature for 30 minutes a day. No card.
- 30 min / day
- Speaker labels included
- TXT + SRT export
- No history retention
Pay-as-you-go
$1/hour
Best for one-off projects. Credits never expire.
- $10 minimum top-up
- Every export format
- 365-day history
- API access
Pro
$8/month
Indie creators. 100 hours / month, all features.
- 100 hours / month
- Clips + every aspect ratio
- Branded captions
- Priority queue
Team
$29/month
Teams. 500 hours / month, shared workspace.
- 500 hours / month
- Shared library
- API + MCP for Claude
- Workspace billing
FAQ
WebVTT generator questions, answered.
What's the difference between WebVTT and SRT?
WebVTT (.vtt) is the HTML5 standard — period-separated timing (00:00:01.200), supports cue styling, positioning, and metadata. SRT (.srt) is older, simpler — comma timing (00:00:01,200), no styling. Use VTT for HTML5 <track>, YouTube native upload, and styled captions. Use SRT for editors like Premiere, Final Cut, and Resolve.
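The timing difference is mechanical, which a short Python sketch can show: add the WEBVTT header and swap the comma for a period in each timing line. The function name `srt_to_vtt` is hypothetical. The sketch handles only plain SRT and adds no styling, positioning, or metadata.

```python
import re

# Matches an SRT timing line such as "00:00:01,200 --> 00:00:03,000".
_SRT_TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})"
)


def srt_to_vtt(srt_text: str) -> str:
    """Convert plain SRT text to WebVTT by adding the header and fixing timings."""
    # Normalize line endings, then drop a leading byte-order mark and outer whitespace.
    body = srt_text.replace("\r\n", "\n").lstrip("\ufeff").strip()
    # Only timing lines are rewritten; commas inside caption text are left alone.
    body = _SRT_TIMING.sub(r"\1.\2 --> \3.\4", body)
    return "WEBVTT\n\n" + body + "\n"
```

SRT sequence numbers carry over unchanged because WebVTT accepts them as cue identifiers.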
Does YouTube accept VTT?
Yes. YouTube Studio's caption upload accepts both VTT and SRT. VTT is preferred when you need cue styling or chapter cues; SRT is fine when you just need plain captions.
Can I style the captions?
Yes. WebVTT supports cue settings (line, position, align, vertical), inline styling tags (<b>, <i>, <c.classname>), and optional CSS via the ::cue pseudo-element. Our exports include style hooks; you write the CSS in your player. Branded burn-in styling is a paid feature on Pro and Team.
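One common style hook in WebVTT is the voice tag, which labels a speaker and can carry a class name for CSS to target. The Python sketch below builds such cue text and escapes the characters that WebVTT reserves. The helper name `voice_cue` is hypothetical, and whether Whipscribe's exports use voice tags, class spans, or both is not stated here.

```python
from html import escape


def voice_cue(speaker: str, text: str, css_class: str = "") -> str:
    """Wrap cue text in a WebVTT voice tag, with an optional class name."""
    tag = f"v.{css_class}" if css_class else "v"
    # "&", "<" and ">" must be written as entities inside WebVTT cue text.
    safe_speaker = escape(speaker, quote=False)
    safe_text = escape(text, quote=False)
    return f"<{tag} {safe_speaker}>{safe_text}</v>"
```

For example, `voice_cue("Ana", "Welcome back", "host")` produces `<v.host Ana>Welcome back</v>`, which a page stylesheet could target with a selector such as `video::cue(.host)` in players that honor ::cue CSS.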
Which players support styled VTT?
Native HTML5 <video> with <track kind='subtitles'> in Safari, Chrome, Firefox, and Edge. Video.js and Plyr render VTT styling. YouTube renders most cue settings but ignores ::cue CSS. Older players (JW Player 6, Flowplayer) may downgrade to plain text.
How accurate is the timing?
Word-level timing is typically within 50–100 ms of the actual audio on clean recordings. Heavy background music, overlapping speech, or thick accents widen that window. If needed, edit individual cues in our editor before exporting.
Is the file stored anywhere?
On the free tier nothing is retained — the VTT is generated, served to your browser, and dropped. On paid tiers files live in your private library for 365 days; you can delete any item from your account settings.
Generate WebVTT captions. Drop them into your HTML5 player.
Generate VTT
Operated by Neugence Technology Pvt. Ltd. · contact@neugence.ai · Security · Privacy · Terms