DIHARD III

by LDC / DIHARD organizers

Hard diarization-in-the-wild challenge: 11 domains, from courtroom recordings to map-task dialogues.

TL;DR

Best for benchmarking state-of-the-art diarization systems across 11 challenging domains (clinical, broadcast, courtroom, etc.). Pricing: paid.

Category: Dataset / benchmark
License: LDC license
Pricing: paid
Platforms: LDC

What it is

DIHARD III is the third DIHARD (Diarization Hard) challenge. Its data cover 11 domains chosen to stress-test diarization systems, including clinical interviews, courtroom recordings, map-task dialogues, and YouTube videos. License: LDC, paid.

Best for: Benchmarking state-of-the-art diarization systems across 11 challenging domains (clinical, broadcast, courtroom, etc.).
Watch out for: LDC license · paid for non-members · 11 heterogeneous recording domains. Cite: Ryant et al., Interspeech 2021.
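DIHARD III ranks systems primarily by diarization error rate (DER): missed speech, false-alarm speech, and speaker confusion, summed and divided by total reference speech time. The Python sketch below is a simplified, frame-based approximation, not the official challenge scorer. It assumes 10 ms frames, ignores any scoring-region restrictions, and finds the best one-to-one speaker mapping by brute force, which is only practical for a handful of speakers (real scorers use an optimal assignment algorithm). The `frame_der` name and the `(speaker, onset, offset)` tuple layout are illustrative assumptions.

```python
import itertools
from collections import Counter

# Hypothetical segment layout: (speaker, onset_seconds, offset_seconds).
Segment = tuple[str, float, float]

FRAME = 0.01  # assumed frame step of 10 ms


def to_frames(segments: list[Segment]) -> dict[int, set[str]]:
    """Map each frame index to the set of speakers active in it."""
    active: dict[int, set[str]] = {}
    for speaker, onset, offset in segments:
        for i in range(round(onset / FRAME), round(offset / FRAME)):
            active.setdefault(i, set()).add(speaker)
    return active


def frame_der(ref_segments: list[Segment], hyp_segments: list[Segment]) -> float:
    """Simplified frame-based DER = (missed + false alarm + confusion) / reference speech."""
    ref, hyp = to_frames(ref_segments), to_frames(hyp_segments)
    missed = false_alarm = matched_slots = total_ref = 0
    co_active: Counter = Counter()  # (hyp_speaker, ref_speaker) -> frames where both speak
    for t in set(ref) | set(hyp):
        r, h = ref.get(t, set()), hyp.get(t, set())
        total_ref += len(r)
        missed += max(0, len(r) - len(h))
        false_alarm += max(0, len(h) - len(r))
        matched_slots += min(len(r), len(h))
        for hs in h:
            for rs in r:
                co_active[(hs, rs)] += 1
    if total_ref == 0:
        raise ValueError("reference contains no speech")

    # Brute-force the one-to-one hyp->ref mapping that maximizes correctly
    # attributed frames; None means "this hypothesis speaker is unmapped".
    ref_spk = sorted({s for s, _, _ in ref_segments})
    hyp_spk = sorted({s for s, _, _ in hyp_segments})
    candidates = ref_spk + [None] * len(hyp_spk)
    best_correct = 0
    for perm in itertools.permutations(candidates, len(hyp_spk)):
        correct = sum(co_active[(h, r)] for h, r in zip(hyp_spk, perm) if r is not None)
        best_correct = max(best_correct, correct)

    # Frames where both sides have speech but the mapping disagrees count as confusion.
    confusion = matched_slots - best_correct
    return (missed + false_alarm + confusion) / total_ref
```

For example, with reference `[("A", 0.0, 2.0), ("B", 2.0, 4.0)]` and hypothesis `[("x", 0.0, 2.0), ("y", 2.0, 3.0), ("x", 3.0, 4.0)]`, the last second is attributed to the wrong speaker, so `frame_der` returns 0.25.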

Install / use

Available from the LDC catalog (requires LDC membership, or a fee for non-members): https://catalog.ldc.upenn.edu/LDC2022S14
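Once you have the data, a minimal Python sketch for reading the reference annotations follows. It assumes the reference diarization is supplied as NIST RTTM files (one `SPEAKER` line per segment, with the file ID, channel, onset, and duration in fields 2 to 5 and the speaker label in field 8); check the layout of the release you actually receive. The function names and the sample lines are hypothetical, not taken from the corpus.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Segment(NamedTuple):
    file_id: str
    speaker: str
    onset: float   # seconds
    offset: float  # seconds


def parse_rttm(lines: Iterable[str]) -> list[Segment]:
    """Parse RTTM SPEAKER lines into segments; other line types are skipped."""
    segments = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue
        # Assumed layout: SPEAKER file chan onset dur <NA> <NA> speaker <NA> <NA>
        file_id, onset, dur, speaker = fields[1], float(fields[3]), float(fields[4]), fields[7]
        segments.append(Segment(file_id, speaker, onset, onset + dur))
    return segments


def speech_time_per_speaker(segments: Iterable[Segment]) -> dict[str, float]:
    """Sum segment durations per speaker label (overlapping speech counts for each speaker)."""
    totals: dict[str, float] = defaultdict(float)
    for seg in segments:
        totals[seg.speaker] += seg.offset - seg.onset
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical RTTM lines for illustration only.
    sample = [
        "SPEAKER rec_001 1 0.00 2.50 <NA> <NA> spk1 <NA> <NA>",
        "SPEAKER rec_001 1 2.10 1.40 <NA> <NA> spk2 <NA> <NA>",
        "SPEAKER rec_001 1 4.00 3.00 <NA> <NA> spk1 <NA> <NA>",
    ]
    print(speech_time_per_speaker(parse_rttm(sample)))
```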

Features

Speaker diarization: Yes
Word-level timestamps: No
Streaming / real-time: No
Languages supported: 1
HIPAA eligible: No

DIHARD III vs Whipscribe

Feature | DIHARD III | Whipscribe
Category | Dataset / benchmark | Transcription APIs
Pricing | paid | free beta
Speaker diarization | Yes | Yes
Word timestamps | No | Yes
Streaming | No | No
Languages | 1 | 99
Platforms | LDC | Web, API, MCP

Alternatives to DIHARD III

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running your own infrastructure, you can submit a URL or upload a file.