NVIDIA FasterTransformer

by NVIDIA

Legacy NVIDIA inference engine — predecessor to TensorRT-LLM.

TL;DR

Best for reference implementations of fused Transformer kernels. Pricing: free.

Category: Open source
License: Apache-2.0
Pricing: free
Platforms: Linux

What it is

NVIDIA's early high-performance Transformer inference codebase, released under the Apache-2.0 license.

Best for: Reference implementations of fused Transformer kernels.
Watch out for: Largely superseded by TensorRT-LLM.
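
To make "fused kernels" concrete, here is a minimal sketch in plain Python. It is not FasterTransformer code and does not reproduce any of its kernels; the function names are hypothetical and chosen only for illustration. It shows the general idea of fusion: doing a bias add and a GELU activation in a single pass instead of two, so no intermediate buffer is written and re-read. On a GPU the saving comes from avoiding that extra memory round trip; here the intermediate list merely stands in for it.

```python
import math


def gelu(x: float) -> float:
    """Tanh approximation of GELU, common in Transformer feed-forward layers."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def unfused_bias_gelu(xs, bias):
    """Two passes: bias add first, then activation on the stored result."""
    # The intermediate list stands in for an extra write/read of GPU memory.
    biased = [x + b for x, b in zip(xs, bias)]
    return [gelu(v) for v in biased]


def fused_bias_gelu(xs, bias):
    """One pass: bias add and activation per element, no intermediate buffer."""
    return [gelu(x + b) for x, b in zip(xs, bias)]


# Illustrative inputs (made up for this sketch).
xs = [-1.0, 0.0, 0.5, 2.0]
bias = [0.1, -0.2, 0.0, 0.3]

# Both paths perform the same arithmetic per element, so results match.
assert unfused_bias_gelu(xs, bias) == fused_bias_gelu(xs, bias)
```

The two functions return identical values; the only difference is how many passes over the data are needed, which is the property a fused GPU kernel exploits.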

Install / use

Features

Speaker diarization: No
Word-level timestamps: Yes
Streaming / real-time: No
Languages supported: 60
HIPAA eligible: No

NVIDIA FasterTransformer vs Whipscribe

Feature | NVIDIA FasterTransformer | Whipscribe
Category | Open source | Transcription APIs
Pricing | free | free beta
Speaker diarization | No | Yes
Word timestamps | Yes | Yes
Streaming | No | No
Languages | 60 | 99
Platforms | Linux | Web, API, MCP

Alternatives to NVIDIA FasterTransformer

Whipscribe is a managed faster-whisper + whisperX service. If you want transcripts without running infrastructure, paste a URL or drop a file in the form below and you'll have a transcript in seconds.