OpenCLIP

by mlfoundations

OpenCLIP: a vision encoder used as a companion model in multimodal ASR research.

TL;DR

Best for multimodal (audio-video) ASR research pipelines. Pricing: free.

Category: Open source
License: MIT
Pricing: free
Platforms: Linux, macOS, Windows

What it is

An open-source reimplementation of OpenAI's CLIP (Contrastive Language-Image Pre-training), released under the MIT license.

Best for: Multimodal (audio-video) ASR research pipelines.
Watch out for: OpenCLIP covers the vision side only; it does not transcribe audio.

Install / use

pip install open-clip-torch
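
Once the package is installed, a minimal sketch of scoring a single video frame against a few text labels might look like the following. The model name and checkpoint tag ("ViT-B-32", "laion2b_s34b_b79k") are taken from the project's README as best recalled and may need adjusting. The function names score_frame and best_match, the file path, and the labels are hypothetical and used only for illustration. The heavy imports sit inside score_frame so that the small helper can be used without them.

```python
from typing import List, Sequence, Tuple


def best_match(labels: Sequence[str], scores: Sequence[float]) -> Tuple[str, float]:
    """Return the (label, score) pair with the highest score."""
    if not labels or len(labels) != len(scores):
        raise ValueError("labels and scores must be non-empty and the same length")
    idx = max(range(len(scores)), key=lambda i: scores[i])
    return labels[idx], scores[idx]


def score_frame(image_path: str, labels: Sequence[str]) -> List[float]:
    """Score one image against text labels with OpenCLIP (illustrative sketch)."""
    # Imported lazily: these need torch, Pillow, and open-clip-torch installed.
    import torch
    import open_clip
    from PIL import Image

    # Assumed model/checkpoint pair; downloads weights on first use.
    model, _, preprocess = open_clip.create_model_and_transforms(
        "ViT-B-32", pretrained="laion2b_s34b_b79k"
    )
    model.eval()
    tokenizer = open_clip.get_tokenizer("ViT-B-32")

    image = preprocess(Image.open(image_path)).unsqueeze(0)  # add batch dimension
    text = tokenizer(list(labels))

    with torch.no_grad():
        image_features = model.encode_image(image)
        text_features = model.encode_text(text)
        # Normalize so the dot product is a cosine similarity.
        image_features = image_features / image_features.norm(dim=-1, keepdim=True)
        text_features = text_features / text_features.norm(dim=-1, keepdim=True)
        probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

    return probs[0].tolist()
```

Calling score_frame("frame.jpg", ["a person speaking", "an empty room"]) and passing the result to best_match with the same labels would return the most likely label for that frame. This yields image-text similarity only; any transcript still has to come from a separate ASR model.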

Features

Speaker diarization: No
Word-level timestamps: Yes
Streaming / real-time: No
Languages supported: 99
HIPAA eligible: No

OpenCLIP vs Whipscribe

Feature              OpenCLIP                Whipscribe
Category             Open source             Transcription APIs
Pricing              free                    free beta
Speaker diarization  No                      Yes
Word timestamps      Yes                     Yes
Streaming            No                      No
Languages            99                      99
Platforms            Linux, macOS, Windows   Web, API, MCP

Alternatives to OpenCLIP

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or drop a file in the form below — you'll have a transcript in seconds.