Real-time NVIDIA GPU dashboard
How to optimize an algorithm in CUDA
High-performance library for gradient boosting on decision trees
157 models, 30 providers, one command to find what runs on your hardware
OpenLIT is an open-source LLM observability tool
Run serverless GPU workloads with fast cold starts on bare-metal
High-speed Large Language Model Serving for Local Deployment
Fast and memory-efficient exact attention
Running a big model on a small laptop
A high-performance inference engine for AI models
GPU-accelerated decision optimization
Relax! Flux is the ML library that doesn't make you tensor
Performance-optimized AI inference on your GPUs
Python inference and LoRA trainer package for the LTX-2 audio–video model
Open-source Agent Operating System
HeavyDB (formerly MapD/OmniSciDB)
High-performance, multiplayer code editor from the creators of Atom
UCCL is an efficient communication library for GPUs
Unified KV Cache Compression Methods for Auto-Regressive Models
Large Language Model Text Generation Inference
Pure C inference for the Flux 2 image generation model
An opinionated CLI to transcribe audio files with Whisper on-device
Pruna is a model optimization framework built for developers
State-of-the-art Parameter-Efficient Fine-Tuning
Alibaba's high-performance LLM inference engine for diverse apps