A high-throughput and memory-efficient inference and serving engine
Redundancy-aware KV Cache Compression for Reasoning Models
Cache-Augmented Generation: A Simple, Efficient Alternative to RAG
UCCL is an efficient communication library for GPUs
Unified KV Cache Compression Methods for Auto-Regressive Models
A ChatGPT integration for Node-RED
Supercharge Your LLM with the Fastest KV Cache Layer
Graph-vector database for building unified AI backends fast
Mooncake is the LLM serving platform for Kimi
A timeline of the latest AI models for audio generation
Code for machine learning for algorithmic trading, 2nd edition
Image augmentation for machine learning experiments