TokenSpeed is a speed-of-light LLM inference engine
A nearly-live implementation of OpenAI's Whisper
Machine Learning Containers for NVIDIA Jetson and JetPack-L4T
Ultralytics YOLO
Mooncake is the serving platform for Kimi
CS2, Valorant, Fortnite, APEX, every game
Transformer-related optimization, including BERT and GPT
Efficient 13B MoE language model with long context and reasoning modes
High-performance MoE model with MLA, MTP, and multilingual reasoning