IEMOCAP

by USC SAIL Lab

12h dyadic emotional speech corpus — the gold-standard SER benchmark.

TL;DR

Best for speech emotion recognition (SER) with 5-class + dimensional labels (valence/arousal/dominance). Pricing: research-only.

Category: Open source
License: USC research license (non-commercial)
Pricing: research-only
Platforms: Web

What it is

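IEMOCAP (Interactive Emotional Dyadic Motion Capture) is a corpus of about 12 hours of scripted and improvised emotional speech, recorded from 10 actors in dyadic sessions. It is the de facto benchmark for speech emotion recognition (SER) and is distributed under a research-only license.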

Best for: Speech emotion recognition (SER) with 5-class + dimensional labels (valence/arousal/dominance).
Watch out for: USC research license (request form required) · non-commercial use only · only 10 actors, recorded in dyadic sessions · mix of scripted and improvised speech. Cite: Busso et al., Language Resources and Evaluation, 2008.
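
With so few actors, a random utterance-level split tends to put the same speakers in both training and test data. One common way around this is to hold out a whole session at a time. The sketch below assumes utterance IDs begin with a five-character session prefix such as `Ses01` (for example `Ses01F_impro01_F000`); that naming scheme is an assumption here, and `session_of` and `leave_one_session_out` are hypothetical helper names, not part of any official tooling.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def session_of(utt_id: str) -> str:
    """Return the session prefix of an utterance ID.

    Assumes IDs look like 'Ses01F_impro01_F000', where the first
    five characters identify the session (an assumption, see lead-in).
    """
    return utt_id[:5]


def leave_one_session_out(
    utt_ids: Iterable[str],
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out_session, train_ids, test_ids), one fold per session."""
    by_session: Dict[str, List[str]] = defaultdict(list)
    for utt_id in utt_ids:
        by_session[session_of(utt_id)].append(utt_id)

    sessions = sorted(by_session)
    for held_out in sessions:
        # Training data is everything that is not in the held-out session.
        train = [u for s in sessions if s != held_out for u in by_session[s]]
        yield held_out, train, by_session[held_out]
```

Each fold keeps both actors of the held-out session out of training, so reported accuracy reflects unseen speakers rather than memorized voices.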

Install / use

https://sail.usc.edu/iemocap/  # registration form required
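
Once access is granted, the first step is usually to read the per-utterance labels. The following is a minimal sketch, assuming each labelled utterance appears on a tab-separated line of the form `[start - end]`, utterance ID, categorical label, `[valence, activation, dominance]`. That layout is an assumption about the release, not something this listing confirms, and `parse_label_line` and `Utterance` are hypothetical names. Check a real file before relying on it.

```python
import re
from typing import NamedTuple, Optional

# Assumed line layout (verify against your copy of the corpus):
# [6.2901 - 8.2357]<TAB>Ses01F_impro01_F000<TAB>neu<TAB>[2.5000, 2.5000, 2.5000]
_LABEL_LINE = re.compile(
    r"^\[(?P<start>\d+\.\d+) - (?P<end>\d+\.\d+)\]\t"
    r"(?P<utt>\S+)\t(?P<label>\S+)\t"
    r"\[(?P<v>[\d.]+), (?P<a>[\d.]+), (?P<d>[\d.]+)\]\s*$"
)


class Utterance(NamedTuple):
    utt_id: str
    start: float      # seconds from the start of the dialog
    end: float
    label: str        # categorical emotion code, e.g. "neu"
    valence: float
    arousal: float    # called "activation" in some descriptions
    dominance: float


def parse_label_line(line: str) -> Optional[Utterance]:
    """Parse one summary line; return None for lines that do not match."""
    m = _LABEL_LINE.match(line)
    if m is None:
        return None
    return Utterance(
        utt_id=m["utt"],
        start=float(m["start"]),
        end=float(m["end"]),
        label=m["label"],
        valence=float(m["v"]),
        arousal=float(m["a"]),
        dominance=float(m["d"]),
    )
```

Returning `None` for non-matching lines lets the caller loop over a whole file and skip headers or per-annotator rows without special cases.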

Features

Speaker diarization: Yes
Word-level timestamps: No
Streaming / real-time: No
Languages supported: 1
HIPAA eligible: No

IEMOCAP vs Whipscribe

Feature              IEMOCAP        Whipscribe
Category             Open source    Transcription APIs
Pricing              research-only  free beta
Speaker diarization  Yes            Yes
Word timestamps      No             Yes
Streaming            No             No
Languages            1              99
Platforms            Web            Web, API, MCP

Alternatives to IEMOCAP

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or drop a file and you'll have a transcript in seconds.