ICSI Meeting Corpus

by ICSI Berkeley

72 hours of research-meeting recordings: a diarization and meeting-ASR alternative to AMI.

TL;DR

Best for diarization and meeting-ASR robustness; predates AMI but still cited. Pricing: free.

Category: Open source
License: CC BY 4.0 (via OpenSLR)
Pricing: free
Platforms: LDC, OpenSLR

What it is

The ICSI Meeting Corpus is 72 hours of real research-group meetings recorded at ICSI Berkeley, fully transcribed and diarized. It is the older counterpart to the AMI corpus. License: CC BY 4.0 via OpenSLR.

Best for: Diarization and meeting-ASR robustness; predates AMI but still cited.
Watch out for: CC BY 4.0 (audio) · research-meeting domain (non-acted) · variable participant counts. Cite: Janin et al., ICASSP 2003.

Install / use

https://www.openslr.org/106/  # ICSI Meeting Corpus via OpenSLR
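
Once the corpus is downloaded, a common first step for diarization work is summarizing who spoke for how long and how much speech overlaps. The following is a minimal sketch, not the corpus's own tooling: it assumes the transcripts have already been converted into `(start_seconds, end_seconds, speaker_id)` tuples, and the `segments` list, the speaker IDs, and the function names `talk_time` and `overlap_seconds` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical segments: (start_seconds, end_seconds, speaker_id).
# This is NOT the corpus's native format; convert the transcripts first.
segments = [
    (0.0, 4.0, "spk_a"),
    (3.0, 7.5, "spk_b"),
    (8.0, 10.0, "spk_a"),
    (9.0, 9.5, "spk_c"),
]


def talk_time(segs):
    """Sum the duration of every segment per speaker."""
    totals = defaultdict(float)
    for start, end, speaker in segs:
        if end < start:
            raise ValueError(f"segment ends before it starts: {start} > {end}")
        totals[speaker] += end - start
    return dict(totals)


def overlap_seconds(segs):
    """Total time during which two or more segments are active (sweep line).

    Assumes a single speaker's own segments do not overlap each other.
    """
    events = []
    for start, end, _ in segs:
        events.append((start, 1))   # a segment opens
        events.append((end, -1))    # a segment closes
    # At equal timestamps, closes (-1) sort before opens (+1),
    # so segments that merely touch are not counted as overlapping.
    events.sort()

    active = 0
    previous_time = None
    total = 0.0
    for time, delta in events:
        if previous_time is not None and active >= 2:
            total += time - previous_time
        active += delta
        previous_time = time
    return total


if __name__ == "__main__":
    print(talk_time(segments))
    print(overlap_seconds(segments))
```

The number of keys returned by `talk_time` gives the participant count for a meeting, which is useful here because participant counts vary from meeting to meeting.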

Features

Speaker diarization: Yes
Word-level timestamps: Yes
Streaming / real-time: No
Languages supported: 1
HIPAA eligible: No

ICSI Meeting Corpus vs Whipscribe

Feature              ICSI Meeting Corpus   Whipscribe
Category             Open source           Transcription APIs
Pricing              free                  free beta
Speaker diarization  Yes                   Yes
Word timestamps      Yes                   Yes
Streaming            No                    No
Languages            1                     99
Platforms            LDC, OpenSLR          Web, API, MCP

Alternatives to ICSI Meeting Corpus

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, submit a URL or upload a file and you'll have a transcript in seconds.