Google Cloud Speaker Diarization
Diarization layer for Google Cloud Speech-to-Text v2.
Best for teams already on GCP Speech v2 who need speaker labels in addition to transcripts. Pricing: follows Speech-to-Text v2 pricing.
What it is
Google Cloud Speech-to-Text v2 includes a diarization configuration that segments transcripts by speaker. Its parameters set the minimum and maximum speaker count, and can bias recognition toward a fixed number of speakers. Diarization is supported on a subset of models, primarily 'long' and 'phone_call', which makes it useful for meeting and call workflows on GCP.
Feature flags from vendor docs: speaker diarization, word-level timestamps, HIPAA-eligible under BAA. Directory tags: commercial-api, hyperscaler. Last vendor-page check: 2026-05-12.
Watch out for: Diarization quality varies by model selection and audio quality.
Install / use
GCP Speech v2: set RecognitionConfig.features.diarization_config (a SpeakerDiarizationConfig carrying min_speaker_count and max_speaker_count).
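As a rough illustration of the min/max speaker settings described above, the sketch below builds the JSON form of a recognition config using only the standard library. The camelCase field names reflect the v2 REST naming as best understood, with diarizationConfig nested under features; check them against the current API reference before relying on them. The helper name build_recognition_config, the DIARIZATION_MODELS set, and the default speaker counts are assumptions made for this example, not part of the Google API.

```python
import json

# Assumption: the models this listing names as diarization-capable.
# The real supported set may be larger or may change over time.
DIARIZATION_MODELS = {"long", "phone_call"}


def build_recognition_config(model="long", min_speakers=2, max_speakers=6,
                             language_codes=("en-US",)):
    """Return a dict shaped like a Speech-to-Text v2 RecognitionConfig (JSON form).

    Hypothetical helper for illustration; it only assembles and validates
    the config and does not call the API.
    """
    if model not in DIARIZATION_MODELS:
        raise ValueError(f"model {model!r} is not in the assumed diarization-capable set")
    if not 1 <= min_speakers <= max_speakers:
        raise ValueError("expected 1 <= min_speakers <= max_speakers")
    return {
        "autoDecodingConfig": {},
        "languageCodes": list(language_codes),
        "model": model,
        "features": {
            # Word offsets make it possible to place each speaker label in time.
            "enableWordTimeOffsets": True,
            "diarizationConfig": {
                "minSpeakerCount": min_speakers,
                "maxSpeakerCount": max_speakers,
            },
        },
    }


# Setting min == max is one way to bias toward a fixed number of speakers,
# for example a two-party phone call.
config = build_recognition_config(model="phone_call", min_speakers=2, max_speakers=2)
print(json.dumps(config, indent=2))
```

The resulting dict would be sent as the config portion of a recognize request; authentication, recognizer selection, and audio upload are left out here on purpose.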
Features
| Feature | Value |
|---|---|
| Speaker diarization | Yes |
| Word-level timestamps | Yes |
| Streaming / real-time | No |
| Languages supported | Not listed |
| HIPAA eligible | Yes |
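Because diarization and word-level timestamps are both available, a common post-processing step is to merge consecutive words from the same speaker into turns. The sketch below assumes each word arrives as a dict with word, startOffset, endOffset, and speakerLabel keys, with offsets as duration strings such as "1.200s"; that shape mirrors the JSON response as best understood and should be verified against real output. The function names and the sample data are made up for illustration.

```python
def parse_offset(value):
    """Convert a duration string such as '1.200s' to float seconds."""
    return float(value.rstrip("s"))


def group_turns(words):
    """Merge consecutive words with the same speaker label into turns."""
    turns = []
    for w in words:
        start = parse_offset(w["startOffset"])
        end = parse_offset(w["endOffset"])
        # Assumption: words without a label are grouped under "unknown".
        speaker = w.get("speakerLabel", "unknown")
        if turns and turns[-1]["speaker"] == speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["text"] += " " + w["word"]
            turns[-1]["end"] = end
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append({"speaker": speaker, "start": start,
                          "end": end, "text": w["word"]})
    return turns


# Invented sample data, not real API output.
sample_words = [
    {"word": "Hello", "startOffset": "0s", "endOffset": "0.400s", "speakerLabel": "1"},
    {"word": "there", "startOffset": "0.400s", "endOffset": "0.800s", "speakerLabel": "1"},
    {"word": "Hi", "startOffset": "1.000s", "endOffset": "1.300s", "speakerLabel": "2"},
    {"word": "Thanks", "startOffset": "1.500s", "endOffset": "2.000s", "speakerLabel": "1"},
]

for turn in group_turns(sample_words):
    print(f'[{turn["start"]:.1f}-{turn["end"]:.1f}] '
          f'speaker {turn["speaker"]}: {turn["text"]}')
```

Since diarization quality varies with model and audio quality, short turns of one or two words may be labeling noise; whether to smooth them out is a judgment call that depends on the recording.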
Google Cloud Speaker Diarization vs Whipscribe
| Feature | Google Cloud Speaker Diarization | Whipscribe |
|---|---|---|
| Category | Transcription APIs | Transcription APIs |
| Pricing | per-Speech v2 pricing | free beta |
| Speaker diarization | Yes | Yes |
| Word timestamps | Yes | Yes |
| Streaming | — | No |
| Languages | — | 99 |
| Platforms | API | Web, API, MCP |
Alternatives to Google Cloud Speaker Diarization
Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or drop a file in the form below — you'll have a transcript in seconds.