Naver Clova Speech
Naver Cloud's Korean-first ASR with batch + real-time and speaker diarization.
Best for Korean-language meeting, broadcast, and contact-center workflows. Pricing: tiered KRW-per-second.
What it is
Naver Cloud's Clova Speech family covers two products: short-utterance recognition (CSR) for clips under 60 seconds, and the long-format Clova Speech endpoint for full meetings and broadcasts, which provides speaker diarization, word timestamps, and segmentation. Korean is the strongest language; English, Japanese, and Mandarin are supported on specific endpoints. Pricing is KRW-denominated with tiered per-second rates. It is a strong fit for Korean SaaS and broadcast pipelines, particularly Korean-language meeting, broadcast, and contact-center workflows.
Feature flags from vendor docs: speaker diarization, word-level timestamps, streaming. Directory tags: commercial-api, regional-asia. Last vendor-page check: 2026-05-12.
Watch out for: Korean-first; English and Japanese supported but quality varies; docs primarily in Korean.
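The split between the two products described above can be sketched as a small routing helper. This is a minimal illustration only: the function name, the `"csr"` and `"clova-speech-long"` labels, and the rule that diarization requests always go to the long-format endpoint are assumptions made for this example, not names or behavior taken from Naver's API.

```python
# Threshold from the description above: CSR targets clips under 60 seconds.
CSR_MAX_SECONDS = 60.0


def choose_endpoint(duration_seconds: float, need_diarization: bool = False) -> str:
    """Pick which Clova Speech product a clip should go to.

    The returned labels are hypothetical identifiers for this sketch,
    not real endpoint names.
    """
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    # Assumption: diarization is only described for the long-format
    # endpoint, so any request that needs it is routed there.
    if duration_seconds < CSR_MAX_SECONDS and not need_diarization:
        return "csr"
    return "clova-speech-long"


if __name__ == "__main__":
    for seconds, diarize in [(12.5, False), (45.0, True), (3600.0, False)]:
        print(seconds, diarize, choose_endpoint(seconds, diarize))
```

Routing on duration first keeps short voice commands on the lighter endpoint, while full meetings and anything needing speaker labels go to the long-format one.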
Install / use
Naver Cloud: Clova Speech Recognition (CSR) / Long Sentence (CLOVA Speech)
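As a rough sketch of what a short-utterance (CSR) call might look like, the snippet below only builds the HTTP request with Python's standard library and does not send it. The endpoint URL, the `lang` codes, and the two `X-NCP-APIGW-*` header names are assumptions recalled from Naver Cloud's public CSR documentation and should be verified against the current docs (which are primarily in Korean) before use; `build_csr_request` is a hypothetical helper name.

```python
import urllib.request

# Assumed values; verify against current Naver Cloud documentation.
CSR_URL = "https://naveropenapi.apigw.ntruss.com/recog/v1/stt"
LANG_CODES = {"ko": "Kor", "en": "Eng", "ja": "Jpn", "zh": "Chn"}


def build_csr_request(
    audio: bytes, client_id: str, client_secret: str, lang: str = "ko"
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying raw audio bytes."""
    if lang not in LANG_CODES:
        raise ValueError(f"unsupported language: {lang!r}")
    return urllib.request.Request(
        url=f"{CSR_URL}?lang={LANG_CODES[lang]}",
        data=audio,
        method="POST",
        headers={
            # Header names are assumptions, see the note above.
            "X-NCP-APIGW-API-KEY-ID": client_id,
            "X-NCP-APIGW-API-KEY": client_secret,
            "Content-Type": "application/octet-stream",
        },
    )


if __name__ == "__main__":
    req = build_csr_request(b"\x00\x01", "my-id", "my-secret")
    print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen(req)` would send it. Keeping construction separate from sending makes it easy to inspect the URL and headers without credentials or network access.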
Features
| Feature | Supported |
|---|---|
| Speaker diarization | Yes |
| Word-level timestamps | Yes |
| Streaming / real-time | Yes |
| Languages supported | Korean; English, Japanese, and Mandarin on specific endpoints |
| HIPAA eligible | No |
Naver Clova Speech vs Whipscribe
| Feature | Naver Clova Speech | Whipscribe |
|---|---|---|
| Category | Transcription APIs | Transcription APIs |
| Pricing | tiered KRW-per-second | free beta |
| Speaker diarization | Yes | Yes |
| Word timestamps | Yes | Yes |
| Streaming | Yes | No |
| Languages | — | 99 |
| Platforms | API | Web, API, MCP |
Alternatives to Naver Clova Speech
Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or drop a file in the form below — you'll have a transcript in seconds.