TensorRT LLM provides users with an easy-to-use Python API
TokenSpeed is a speed-of-light LLM inference engine
A nearly-live implementation of OpenAI's Whisper
A unified library of SOTA model optimization techniques
Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
ChatGLM3 series: Open Bilingual Chat LLMs
Ultralytics YOLO
The Triton Inference Server provides an optimized cloud
Low-latency REST API for serving text-embeddings
CS2, Valorant, Fortnite, APEX, every game
Embed images and sentences into fixed-length vectors
A computer vision framework to create and deploy apps in minutes
YOLOX is a high-performance anchor-free YOLO, exceeding YOLOv3 through YOLOv5
CPU/GPU inference server for Hugging Face transformer models
Tools to help users inter-operate among deep learning frameworks