AMI Meeting Corpus

by Idiap / Edinburgh / Brno

100h multi-microphone meeting recordings with diarization + speaker labels.

TL;DR

Best for speaker diarization, meeting transcription, overlapping-speech ASR benchmarks. Pricing: free.

Category: Open source
License: CC BY 4.0
Pricing: free
Platforms: Web, HuggingFace

What it is

The AMI Meeting Corpus contains 100 hours of 4-person scenario meetings with synchronized multi-microphone audio, video, and rich annotations (diarization, dialogue acts, head pose). It is a widely used benchmark for meeting ASR. License: CC BY 4.0.

Best for: Speaker diarization, meeting transcription, overlapping-speech ASR benchmarks.
Watch out for: CC BY 4.0 · 100h meeting audio · 4-speaker scenarios · IHM (close-talking mic) vs SDM (single distant mic) vs MDM (multiple distant mics) splits, which are not interchangeable when comparing results. Cite: Carletta et al., MLMI 2005.
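Because overlapping speech is one of the main reasons to benchmark on meeting audio, it can help to measure how much of a meeting is overlapped before evaluating a system. The sketch below is illustrative only: the function name `overlap_stats` and the `(start, end, speaker)` tuple layout are assumptions made for this example, not part of the corpus tooling.

```python
from collections import Counter

def overlap_stats(segments):
    """Return (speech_seconds, overlapped_seconds) for one meeting.

    `segments` is an iterable of (start_sec, end_sec, speaker_id) tuples.
    This layout is a hypothetical one chosen for the example.
    """
    events = []
    for start, end, speaker in segments:
        if end > start:  # skip empty or inverted segments
            events.append((start, 1, speaker))
            events.append((end, -1, speaker))

    # Sort by time; at equal times, process ends (-1) before starts (+1)
    # so that segments that merely touch are not counted as overlap.
    events.sort(key=lambda e: (e[0], e[1]))

    active = Counter()  # speaker_id -> number of currently open segments
    speech = 0.0
    overlapped = 0.0
    prev_t = None
    for t, delta, speaker in events:
        if prev_t is not None and t > prev_t:
            span = t - prev_t
            n_speakers = sum(1 for count in active.values() if count > 0)
            if n_speakers >= 1:
                speech += span
            if n_speakers >= 2:
                overlapped += span
        active[speaker] += delta
        prev_t = t
    return speech, overlapped


# Toy example: speakers A and B overlap between 3.0 s and 4.0 s.
toy_segments = [(0.0, 4.0, "A"), (3.0, 6.0, "B"), (6.0, 7.0, "A")]
speech_sec, overlap_sec = overlap_stats(toy_segments)
print(f"speech={speech_sec:.1f}s overlapped={overlap_sec:.1f}s")
```

Counting distinct active speakers, rather than open segments, keeps two back-to-back or nested segments from the same speaker from being reported as overlap.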

Install / use

from datasets import load_dataset
ds = load_dataset('edinburghcstr/ami', 'ihm')  # 'ihm' selects the close-talking mic configuration
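Once the data is loaded, a common first step is to total the speaking time per speaker. The sketch below works on plain dictionaries so it runs without downloading anything. It assumes each utterance row carries `meeting_id`, `speaker_id`, `begin_time`, and `end_time` fields (times in seconds); check these names against the dataset card before relying on them. The helper `talk_time_by_speaker` and the sample rows are hypothetical.

```python
from collections import defaultdict

def talk_time_by_speaker(rows):
    """Sum utterance durations per (meeting_id, speaker_id).

    Assumes each row is a mapping with 'meeting_id', 'speaker_id',
    'begin_time', and 'end_time' keys; these field names are an assumption.
    """
    totals = defaultdict(float)
    for row in rows:
        key = (row["meeting_id"], row["speaker_id"])
        # Guard against inverted timestamps by clamping at zero.
        totals[key] += max(0.0, row["end_time"] - row["begin_time"])
    return dict(totals)


# Hypothetical rows shaped like utterance-level records.
sample_rows = [
    {"meeting_id": "M1", "speaker_id": "S1", "begin_time": 0.0, "end_time": 1.5},
    {"meeting_id": "M1", "speaker_id": "S1", "begin_time": 2.0, "end_time": 2.5},
    {"meeting_id": "M1", "speaker_id": "S2", "begin_time": 1.0, "end_time": 3.0},
]

for (meeting, speaker), seconds in sorted(talk_time_by_speaker(sample_rows).items()):
    print(meeting, speaker, f"{seconds:.1f}s")
```

If the field names hold, rows iterated from a loaded split could be passed to the same function in place of `sample_rows`.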

Features

Speaker diarization: Yes
Word-level timestamps: Yes
Streaming / real-time: No
Languages supported: 1
HIPAA eligible: No

AMI Meeting Corpus vs Whipscribe

Feature | AMI Meeting Corpus | Whipscribe
Category | Open source | Transcription APIs
Pricing | free | free beta
Speaker diarization | Yes | Yes
Word timestamps | Yes | Yes
Streaming | No | No
Languages | 1 | 99
Platforms | Web, HuggingFace | Web, API, MCP

Alternatives to AMI Meeting Corpus

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, you can paste a URL or upload a file and get a transcript back in seconds.