Real-time NVIDIA GPU dashboard
How to optimize an algorithm in CUDA
157 models, 30 providers, one command to find what runs on your hardware
Running a big model on a small laptop
A high-performance inference engine for AI models
Python inference and LoRA trainer package for the LTX-2 audio–video model
HeavyDB (formerly MapD/OmniSciDB)
UCCL is an efficient communication library for GPUs
Unified KV Cache Compression Methods for Auto-Regressive Models
Alibaba's high-performance LLM inference engine for diverse apps
Easily compute CLIP embeddings and build a CLIP retrieval system
FlashMLA: Efficient Multi-head Latent Attention Kernels
RL implementations
ChatGLM-6B: An Open Bilingual Dialogue Language Model
Distributed AI Model Training and LLM Fine-Tuning on Kubernetes
Run Local LLMs on Any Device. Open-source
A nearly-live implementation of OpenAI's Whisper
Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
Python-free Rust inference server
Training neural networks on Apple Neural Engine via APIs
Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real
High-Performance Face Recognition Library on PaddlePaddle & PyTorch
Reference PyTorch implementation and models for DINOv3
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework
An implementation of a deep learning recommendation model (DLRM)