Real-time NVIDIA GPU dashboard
How to optimize algorithms in CUDA
High-performance library for gradient boosting on decision trees
157 models, 30 providers, one command to find what runs on your hardware
OpenLIT is an open-source LLM observability tool
Running a big model on a small laptop
High-speed Large Language Model Serving for Local Deployment
Fast and memory-efficient exact attention
Run serverless GPU workloads with fast cold starts on bare-metal
A high-performance inference engine for AI models
GPU accelerated decision optimization
Relax! Flux is the ML library that doesn't make you tensor
Python inference and LoRA trainer package for the LTX-2 audio–video model
Performance-optimized AI inference on your GPUs
Open-source Agent Operating System
State-of-the-art Parameter-Efficient Fine-Tuning
HeavyDB (formerly MapD/OmniSciDB)
Supercharge Your LLM with the Fastest KV Cache Layer
The Modular Platform (includes MAX & Mojo)
Pruna is a model optimization framework built for developers
UCCL is an efficient communication library for GPUs
Unified KV Cache Compression Methods for Auto-Regressive Models
Large Language Model Text Generation Inference
High-performance, multiplayer code editor from the creators of Atom
Pure C inference for the Flux 2 image generation model