NVIDIA TensorRT-LLM

Author: NVIDIA

NVIDIA's optimized inference stack for the Whisper, Canary, and Parakeet speech models, served on Triton.

TL;DR

Best for squeezing the last 30% of throughput out of NVIDIA GPUs for ASR. Pricing: free.

What it is

NVIDIA's own inference engine. It compiles a model into TensorRT (TRT) engine files that then run on NVIDIA GPUs. Licensed under Apache-2.0.

Watch out for: runs on NVIDIA GPUs only; high engineering overhead; the license is Apache-2.0, but setup requires NGC (NVIDIA's container and model catalog).

Install / use

git clone https://github.com/NVIDIA/TensorRT-LLM
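Because the listing names Linux and Docker as the platforms and the stack runs on NVIDIA GPUs only, a quick preflight check before cloning can save time. The following is a minimal sketch, not part of TensorRT-LLM: the tool list (git, docker, nvidia-smi) and the function names missing_tools and preflight_report are assumptions made for illustration.

```python
import platform
import shutil

# Assumed prerequisites for this sketch; this list is NOT taken from the TensorRT-LLM docs.
ASSUMED_TOOLS = ("git", "docker", "nvidia-smi")


def missing_tools(tools=ASSUMED_TOOLS, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def preflight_report(tools=ASSUMED_TOOLS, which=shutil.which, system=None):
    """Return a list of human-readable problems; an empty list means none were found."""
    system = system or platform.system()
    problems = []
    if system != "Linux":
        # The listing gives Linux and Docker as the supported platforms.
        problems.append(f"host OS is {system}, but the listing names Linux and Docker")
    problems.extend(f"'{tool}' not found on PATH" for tool in missing_tools(tools, which))
    return problems


if __name__ == "__main__":
    for line in preflight_report() or ["no obvious prerequisites missing"]:
        print(line)
```

An empty report only means the assumed tools are on PATH; it says nothing about driver, CUDA, or GPU compatibility, which the project's own documentation covers.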

Features

Speaker diarization	No
Word-level timestamps	Yes
Streaming / real-time	No
Languages supported	60
HIPAA eligible	No

NVIDIA TensorRT-LLM vs Whipscribe

Feature	NVIDIA TensorRT-LLM	Whipscribe
Category	Open source	Transcription APIs
Pricing	free	free beta
Speaker diarization	No	Yes
Word timestamps	Yes	Yes
Streaming	No	No
Languages	60	99
Platforms	Linux, Docker	Web, API, MCP

Alternatives to NVIDIA TensorRT-LLM

OpenAI Whisper

OpenAI

The reference open-source multilingual ASR model from OpenAI.

OSS · MIT ★ 98.1k

whisper.cpp

Georgi Gerganov

C/C++ port of Whisper — runs on anything, from a Raspberry Pi to Apple Silicon.

OSS · MIT ★ 48.8k

faster-whisper

SYSTRAN

Up to 4× faster than reference Whisper using CTranslate2; a production sweet spot.

OSS · MIT ★ 22.3k

Whipscribe is a managed faster-whisper + whisperX service, for cases where you want transcripts without running your own infrastructure.