ICSI Meeting Corpus
by ICSI Berkeley
72h research-meeting recordings — diarization and meeting-ASR alternative to AMI.
TL;DR
Best for diarization and meeting-ASR robustness; predates AMI but still cited. Pricing: free.
Category
Open source
License
CC BY 4.0 (audio)
Stars
—
Last push
—
Pricing
free
Platforms
LDC, OpenSLR
What it is
The ICSI Meeting Corpus is 72 hours of real research-group meetings recorded at ICSI Berkeley, fully transcribed and diarized. It is the older sibling of AMI. License: CC BY 4.0 via OpenSLR.
Best for: Diarization and meeting-ASR robustness; predates AMI but still cited.
Watch out for: CC BY 4.0 (audio) · research-meeting domain (non-acted) · variable participant counts. Cite: Janin et al., ICASSP 2003.
Install / use
https://www.openslr.org/106/ # ICSI Meeting Corpus via OpenSLR
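Because participant counts vary between meetings, a common first check after download is per-speaker talk time and the amount of overlapped speech. The following is a minimal sketch in Python, assuming the annotations have already been parsed into `(speaker, start, end)` tuples. That tuple layout, the speaker labels, and the helper names `talk_time` and `overlap_time` are hypothetical and do not reflect the corpus's own file format.

```python
from collections import defaultdict

# Hypothetical diarization segments: (speaker_id, start_sec, end_sec).
# This layout is an assumption for illustration, not the corpus's real format.
segments = [
    ("spk_a", 0.0, 4.5),
    ("spk_b", 4.0, 9.0),
    ("spk_a", 9.0, 12.0),
    ("spk_c", 11.5, 13.0),
]


def talk_time(segs):
    """Sum speaking time per speaker, in seconds."""
    totals = defaultdict(float)
    for speaker, start, end in segs:
        if end < start:
            raise ValueError(f"segment ends before it starts: {speaker} {start} {end}")
        totals[speaker] += end - start
    return dict(totals)


def overlap_time(segs):
    """Total seconds during which two or more speakers are active."""
    # Sweep line: +1 when a segment starts, -1 when it ends.
    events = []
    for _, start, end in segs:
        events.append((start, 1))
        events.append((end, -1))
    # At equal timestamps, process ends before starts so that
    # back-to-back segments do not count as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))

    active = 0
    overlap = 0.0
    prev = None
    for t, delta in events:
        if prev is not None and active >= 2:
            overlap += t - prev
        active += delta
        prev = t
    return overlap


if __name__ == "__main__":
    per_speaker = talk_time(segments)
    print("speakers:", len(per_speaker))
    print("talk time (s):", per_speaker)
    print("overlapped speech (s):", overlap_time(segments))
```

Run once per meeting, these two numbers give a quick sense of how many participants a recording has and how much cross-talk a diarization system would need to handle.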
Features
| Feature | Value |
|---|---|
| Speaker diarization | Yes |
| Word-level timestamps | Yes |
| Streaming / real-time | No |
| Languages supported | 1 |
| HIPAA eligible | No |
ICSI Meeting Corpus vs Whipscribe
| Feature | ICSI Meeting Corpus | Whipscribe |
|---|---|---|
| Category | Open source | Transcription APIs |
| Pricing | free | free beta |
| Speaker diarization | Yes | Yes |
| Word timestamps | Yes | Yes |
| Streaming | No | No |
| Languages | 1 | 99 |
| Platforms | LDC, OpenSLR | Web, API, MCP |
Alternatives to ICSI Meeting Corpus
Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or upload a file and you'll have a transcript in seconds.