Google Cloud Speaker Diarization

by Google Cloud

Diarization layer for Google Cloud Speech-to-Text v2.

TL;DR

Best for teams already on GCP Speech v2 who need speaker labels in addition to transcripts. Pricing: billed under Speech-to-Text v2 pricing.

Category: Transcription APIs
Pricing: Billed under Speech-to-Text v2 pricing
Platforms: API

What it is

Google Cloud Speech-to-Text v2 includes a diarization configuration that segments transcripts by speaker. The configuration sets a minimum and maximum speaker count; setting both to the same value fixes the number of speakers. Diarization is supported on a subset of models, primarily 'long' and 'phone_call', which makes it useful for meeting and call workflows on GCP.

Feature flags from vendor docs: speaker diarization, word-level timestamps, HIPAA eligibility under a BAA. Directory tags: commercial-api, hyperscaler. Last vendor-page check: 2026-05-12.
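Segmenting by speaker is easiest to see on the word list. As understood here, a diarized v2 response carries a speaker label on each recognized word next to its start and end offsets, and the sketch below merges consecutive words with the same label into readable turns. It assumes the REST JSON word fields `word`, `speakerLabel`, `startOffset` and `endOffset`, with offsets encoded as protobuf Duration strings such as "1.500s"; `Turn`, `parse_offset` and `group_into_turns` are hypothetical helpers, and the sample words are invented.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    start_s: float
    end_s: float
    text: str


def parse_offset(offset: str) -> float:
    """Convert a protobuf JSON Duration string such as "1.500s" to seconds.

    Zero-valued offsets may be omitted from JSON output, so "" maps to 0.0.
    """
    return float(offset.rstrip("s")) if offset else 0.0


def group_into_turns(words: list[dict]) -> list[Turn]:
    """Merge consecutive words that share a speaker label into one turn."""
    turns: list[Turn] = []
    for w in words:
        # Assumed v2 REST word shape: word, speakerLabel, startOffset, endOffset.
        speaker = w.get("speakerLabel", "")
        start = parse_offset(w.get("startOffset", ""))
        end = parse_offset(w.get("endOffset", ""))
        if turns and turns[-1].speaker == speaker:
            # Same speaker as the previous word: extend the current turn.
            turns[-1].end_s = end
            turns[-1].text += " " + w["word"]
        else:
            # First word, or the speaker changed: start a new turn.
            turns.append(Turn(speaker, start, end, w["word"]))
    return turns


# Invented sample data shaped like diarized v2 word results.
sample = [
    {"word": "Hello", "speakerLabel": "1", "startOffset": "0s", "endOffset": "0.400s"},
    {"word": "there.", "speakerLabel": "1", "startOffset": "0.400s", "endOffset": "0.800s"},
    {"word": "Hi!", "speakerLabel": "2", "startOffset": "1.100s", "endOffset": "1.500s"},
]
for t in group_into_turns(sample):
    print(f"[{t.start_s:.1f}-{t.end_s:.1f}s] speaker {t.speaker}: {t.text}")
# Prints:
# [0.0-0.8s] speaker 1: Hello there.
# [1.1-1.5s] speaker 2: Hi!
```

Merging on consecutive identical labels is the simplest policy; a real pipeline might also split a turn at long pauses, which this sketch does not attempt.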

Best for: Teams already on GCP Speech v2 who need speaker labels in addition to transcripts.
Watch out for: Diarization quality varies by model selection and audio quality.

Install / use

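Set RecognitionConfig.features.diarization_config (a SpeakerDiarizationConfig with min_speaker_count and max_speaker_count) on a Speech-to-Text v2 recognition request.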
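The request below is a minimal sketch of enabling diarization through the v2 REST API. It assumes the JSON field names `config.features.diarizationConfig` (with `minSpeakerCount` and `maxSpeakerCount`), `enableWordTimeOffsets`, `autoDecodingConfig`, `languageCodes` and base64-encoded `content`. `build_recognize_request` is a hypothetical helper, its range check is a basic sanity check of its own, and the model and language values are placeholders; which models, regions and methods honor diarization should be checked in the current API reference.

```python
import base64
import json


def build_recognize_request(
    audio: bytes,
    min_speakers: int,
    max_speakers: int,
    model: str = "long",      # placeholder; pick a model that supports diarization
    language: str = "en-US",  # placeholder language code
) -> dict:
    """Build a JSON body for a Speech-to-Text v2 recognize call with diarization.

    Field names are assumed from the v2 REST RecognitionConfig and should be
    checked against the current API reference.
    """
    if not 1 <= min_speakers <= max_speakers:
        raise ValueError("expected 1 <= min_speakers <= max_speakers")
    return {
        "config": {
            "autoDecodingConfig": {},  # ask the service to detect the audio encoding
            "model": model,
            "languageCodes": [language],
            "features": {
                "enableWordTimeOffsets": True,  # word-level timestamps
                "diarizationConfig": {
                    # Equal min and max values fix the number of speakers.
                    "minSpeakerCount": min_speakers,
                    "maxSpeakerCount": max_speakers,
                },
            },
        },
        # Inline audio is base64 in REST JSON; audio in Cloud Storage would use "uri".
        "content": base64.b64encode(audio).decode("ascii"),
    }


# Usage: a two-party call with a known speaker count (dummy audio bytes).
body = build_recognize_request(b"\x00\x01", min_speakers=2, max_speakers=2)
print(json.dumps(body, indent=2))
```

The resulting JSON would be POSTed to a recognizer's `:recognize` method, for example `projects/PROJECT_ID/locations/LOCATION/recognizers/_:recognize`, where `_` refers to the implicit recognizer used in Google's v2 samples and PROJECT_ID and LOCATION are placeholders. Passing the same value for both counts, as in the usage line, pins the speaker count, which suits calls with a known number of participants.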

Features

Speaker diarization: Yes
Word-level timestamps: Yes
Streaming / real-time: No
Languages supported: Not listed
HIPAA eligible: Yes

Google Cloud Speaker Diarization vs Whipscribe

Feature | Google Cloud Speaker Diarization | Whipscribe
Category | Transcription APIs | Transcription APIs
Pricing | Speech-to-Text v2 pricing | Free beta
Speaker diarization | Yes | Yes
Word timestamps | Yes | Yes
Streaming | No | Not listed
Languages | Not listed | 99
Platforms | API | Web, API, MCP

Alternatives to Google Cloud Speaker Diarization

Whipscribe is a managed faster-whisper + whisperX service for teams that want transcripts without running their own infrastructure.