Google Chirp / Chirp 2
Google's universal speech foundation model exposed via Speech-to-Text v2.
Best for teams that need Google's strongest universal multilingual model and can tolerate region constraints. Pricing: per Speech-to-Text v2 pricing (region-tiered).
What it is
Chirp is Google's universal speech model, exposed inside Google Cloud Speech-to-Text v2. Chirp 2 is the second generation, trained on additional multilingual data and offering improved accuracy, especially for code-switched and accented speech. Chirp models support around 100 languages, but feature parity with the legacy 'phone_call', 'video', and 'long' models varies: certain diarization and adaptation features, for example, are tied to specific model selections. Chirp models are accessed through the v2 API by configuring a recognizer with the 'chirp' or 'chirp_2' model identifier.
Watch out for: Region availability is limited, and some features (diarization, model adaptation) are not available on all Chirp variants.
Install / use
REST: v2 recognizer with model=chirp_2
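As a rough illustration of the line above, the following Python sketch assembles the URL and JSON body for a synchronous v2 recognize call with the model set to chirp_2. It only builds the request and does not send it or handle authentication. The function name build_recognize_request, the project ID, and the region are hypothetical placeholders. The endpoint pattern and the field names (autoDecodingConfig, languageCodes, model, content) reflect our understanding of the Speech-to-Text v2 REST API and should be checked against Google's current reference before use.

```python
import base64
import json


def build_recognize_request(project_id, location, audio_bytes,
                            model="chirp_2", language_codes=("en-US",)):
    """Return (url, body) for a synchronous Speech-to-Text v2 recognize call.

    Assumption: the regional endpoint and JSON field names below follow the
    v2 REST API as we understand it; verify them against the official docs.
    """
    # Chirp models are served only from certain regions, so the request
    # targets a regional endpoint and the inline "_" recognizer.
    url = (
        f"https://{location}-speech.googleapis.com/v2/"
        f"projects/{project_id}/locations/{location}/recognizers/_:recognize"
    )
    body = {
        "config": {
            # Let the service detect the audio encoding.
            "autoDecodingConfig": {},
            "languageCodes": list(language_codes),
            # Model identifier named in this page: "chirp" or "chirp_2".
            "model": model,
        },
        # Inline audio must be base64-encoded in the JSON payload.
        "content": base64.b64encode(audio_bytes).decode("ascii"),
    }
    return url, body


if __name__ == "__main__":
    # "my-project" and "us-central1" are placeholders, not recommendations.
    url, body = build_recognize_request("my-project", "us-central1", b"\x00\x01")
    print(url)
    print(json.dumps(body, indent=2))
```

To actually call the service, the body would be sent as an authenticated POST to the returned URL. If the chosen region does not serve the requested Chirp variant, expect the request to be rejected.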
Features
| Feature | Google Chirp / Chirp 2 |
|---|---|
| Speaker diarization | No |
| Word-level timestamps | Yes |
| Streaming / real-time | Yes |
| Languages supported | 100 |
| HIPAA eligible | Yes |
Google Chirp / Chirp 2 vs Whipscribe
| Feature | Google Chirp / Chirp 2 | Whipscribe |
|---|---|---|
| Category | Transcription APIs | Transcription APIs |
| Pricing | per Speech-to-Text v2 pricing (region-tiered) | free beta |
| Speaker diarization | No | Yes |
| Word timestamps | Yes | Yes |
| Streaming | Yes | No |
| Languages | 100 | 99 |
| Platforms | API | Web, API, MCP |
Alternatives to Google Chirp / Chirp 2
Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or upload a file and you'll have a transcript in seconds.