Sieve
Video-AI workflow platform with Whisper-based transcription endpoints.
Best for builders composing transcription with face detection, dubbing, eye contact, and other video AI. Pricing: per-second compute (see Sieve pricing).
What it is
Sieve is a developer platform for video and audio AI pipelines that exposes pre-built jobs and a workflow engine. Its catalog includes Whisper-based transcription, speaker diarization, dubbing, lip-sync, eye-contact correction, and background removal, so Sieve is most appealing when transcription is one step in a longer video pipeline. Pricing is pay-as-you-go, billed per second of compute. Feature flags from vendor docs: speaker diarization, word-level timestamps. Directory tags: commercial-api, video-ai. Last vendor-page check: 2026-05-12.
Watch out for: pay-as-you-go compute pricing; latency varies with pipeline cold-start time.
Install / use
POST https://mango.sievedata.com/v2/push (transcribe pipeline)
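The listing gives only the endpoint, so the sketch below shows what a request to it might look like using just the Python standard library. Several details are assumptions to verify against Sieve's API reference: the `X-API-Key` header, the `{"function": ..., "inputs": ...}` body, the function name `sieve/speech_transcriber`, the `{"url": ...}` file input, and the `"finished"`/`"error"` terminal status values. Because latency varies with cold-start, the polling helper backs off exponentially and takes the status lookup as a caller-supplied function rather than hard-coding a status endpoint.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

# Endpoint from the listing; the payload shape below is an assumption.
PUSH_URL = "https://mango.sievedata.com/v2/push"


def build_push_request(
    api_key: str, function: str, inputs: Dict[str, Any]
) -> urllib.request.Request:
    """Build (but do not send) a POST to Sieve's push endpoint.

    ASSUMPTION: the "X-API-Key" header and the {"function", "inputs"} JSON
    body are not confirmed by this listing; check Sieve's docs.
    """
    body = json.dumps({"function": function, "inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        PUSH_URL,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def poll_until_done(
    fetch_status: Callable[[], str],
    timeout_s: float = 600.0,
    interval_s: float = 2.0,
    max_interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a job until it reaches a terminal status, doubling the wait each time.

    fetch_status is supplied by the caller and returns the job's current status.
    ASSUMPTION: "finished" and "error" are the terminal status values.
    Raises TimeoutError once timeout_s seconds of waiting have elapsed.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status in ("finished", "error"):
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"job still {status!r} after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
        # Exponential backoff, capped so long cold starts don't stretch checks too far apart.
        interval_s = min(interval_s * 2, max_interval_s)


# Hypothetical usage: the function name and input shape are illustrative only.
req = build_push_request(
    "YOUR_SIEVE_API_KEY",
    "sieve/speech_transcriber",
    {"file": {"url": "https://example.com/talk.mp4"}},
)
```

To actually submit the job, pass `req` to `urllib.request.urlopen` and parse the JSON response. How to look up a job's status afterwards (the body of `fetch_status`) is left to Sieve's documentation rather than guessed here.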
Features
| Feature | Sieve |
|---|---|
| Speaker diarization | Yes |
| Word-level timestamps | Yes |
| Streaming / real-time | No |
| Languages supported | 99 |
| HIPAA eligible | No |
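With both speaker diarization and word-level timestamps available, a common post-processing step is to label every word with the speaker whose diarization turns overlap it most. The sketch below is a generic version of that step, not Sieve's output format: the `Word` and `SpeakerTurn` shapes (times in seconds) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float


@dataclass
class SpeakerTurn:
    speaker: str
    start: float  # seconds
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: List[Word], turns: List[SpeakerTurn]) -> List[Optional[str]]:
    """Label each word with the speaker whose turns overlap it the most.

    Overlap is summed per speaker, so several short turns by one speaker
    inside a word count together. Words with no overlap get None.
    """
    labels: List[Optional[str]] = []
    for w in words:
        totals: Dict[str, float] = {}
        for t in turns:
            ov = overlap(w.start, w.end, t.start, t.end)
            if ov > 0:
                totals[t.speaker] = totals.get(t.speaker, 0.0) + ov
        # max() returns the first key with the largest total, i.e. ties go to
        # the speaker seen first in `turns`.
        labels.append(max(totals, key=totals.get) if totals else None)
    return labels
```

Words that fall in a gap between turns stay unlabeled rather than being snapped to the nearest speaker; that, like the tie rule, is a design choice you may want to change for your pipeline.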
Sieve vs Whipscribe
| Feature | Sieve | Whipscribe |
|---|---|---|
| Category | Transcription APIs | Transcription APIs |
| Pricing | per-second compute (see Sieve pricing) | free beta |
| Speaker diarization | Yes | Yes |
| Word timestamps | Yes | Yes |
| Streaming | No | No |
| Languages | 99 | 99 |
| Platforms | API | Web, API, MCP |
Alternatives to Sieve
Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or upload a file and you'll have a transcript in seconds.