Sieve

by Sieve

Video-AI workflow platform with Whisper-based transcription endpoints.

TL;DR

Best for builders composing transcription with face detection, dubbing, eye contact, and other video AI. Pricing: per-second compute (see Sieve pricing).

Category: Transcription APIs
Pricing: per-second compute (see Sieve pricing)
Platforms: API

What it is

Sieve is a developer platform for video and audio AI pipelines that exposes pre-built jobs and a workflow engine. Its catalog includes Whisper-based transcription, speaker diarization, dubbing, lip-sync, eye contact, and background removal. Sieve is most appealing when transcription is one step in a longer video pipeline. Pricing is compute-based and billed per second (see Sieve pricing). Vendor docs list speaker diarization and word-level timestamps as supported features. Directory tags: commercial-api, video-ai. Last vendor-page check: 2026-05-12.

Best for: Builders composing transcription with face detection, dubbing, eye contact, and other video AI.
Watch out for: Pay-as-you-go compute pricing; latency varies with pipeline cold starts.

Install / use

POST https://mango.sievedata.com/v2/push (transcribe pipeline)
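As a rough illustration of the call above, here is a minimal Python sketch that builds the POST request without sending it. Only the endpoint URL comes from this listing. The `X-API-Key` header name, the `sieve/speech_transcriber` function identifier, and the shape of the `inputs` object are assumptions made for illustration, and `build_push_request` is a hypothetical helper; check Sieve's API reference before relying on any of them.

```python
import json
import urllib.request

SIEVE_PUSH_URL = "https://mango.sievedata.com/v2/push"  # endpoint quoted in this listing


def build_push_request(api_key: str, file_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a transcription job."""
    # Assumption: the function identifier and the shape of "inputs" are
    # illustrative placeholders, not confirmed against Sieve's API reference.
    payload = {
        "function": "sieve/speech_transcriber",
        "inputs": {"file": {"url": file_url}},
    }
    return urllib.request.Request(
        SIEVE_PUSH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-API-Key": api_key,  # assumption: header name for the API key
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_push_request("YOUR_API_KEY", "https://example.com/clip.mp4")
    print(req.get_method(), req.full_url)
    # To actually submit the job (requires network access and a valid key):
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```

Building the request separately from sending it lets you inspect the URL, headers, and JSON body before any per-second compute is billed.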

Features

Speaker diarization: Yes
Word-level timestamps: Yes
Streaming / real-time: No
Languages supported: 99
HIPAA eligible: No

Sieve vs Whipscribe

Feature | Sieve | Whipscribe
Category | Transcription APIs | Transcription APIs
Pricing | per-second compute (see Sieve pricing) | free beta
Speaker diarization | Yes | Yes
Word timestamps | Yes | Yes
Streaming | No | (not listed)
Languages | 99 | 99
Platforms | API | Web, API, MCP

Alternatives to Sieve

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, you can paste a URL or upload a file and get a transcript in seconds.