TensorRT LLM provides users with an easy-to-use Python API
PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
Enables the best performance on NVIDIA RTX Graphics Cards
ONNX-TensorRT: TensorRT backend for ONNX
The fastest AI-powered Aimbot
TokenSpeed is a speed-of-light LLM inference engine
A nearly-live implementation of OpenAI's Whisper
A unified library of SOTA model optimization techniques
Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
ChatGLM3 series: Open Bilingual Chat LLMs
Ultralytics YOLO
Low-latency REST API for serving text-embeddings
Mooncake is the serving platform for Kimi
OneFlow is a deep learning framework designed to be user-friendly
CS2, Valorant, Fortnite, APEX, every game
Embed images and sentences into fixed-length vectors
A computer vision framework to create and deploy apps in minutes
Video Frame Interpolation & Super Resolution using NVIDIA's TensorRT
Transformer-related optimization, including BERT and GPT
A repository of trained models
Guide to deploying deep-learning inference networks
YOLOX is a high-performance anchor-free YOLO, exceeding YOLOv3 through YOLOv5
C++ library based on TensorRT integration
CPU/GPU inference server for Hugging Face transformer models