Mooncake is an open-source infrastructure platform for large language model (LLM) serving. It focuses on efficiently managing and transferring model data and the key-value (KV) cache. The platform was originally developed as part of the serving infrastructure for the Kimi large language model system. Its architecture centers on a high-performance transfer engine that provides a unified interface for moving data across different storage and networking technologies, so tensors and model data can move between heterogeneous locations such as GPU memory, system memory, and distributed storage. Mooncake also provides distributed KV cache storage that lets inference systems reuse previously computed attention states instead of recomputing them, which significantly improves throughput in large-scale deployments. The system supports high-speed interconnects such as RDMA and NVMe over Fabrics (NVMe-oF) for communication across clusters.
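The idea of reusing previously computed attention states can be illustrated with a small, single-process sketch. This is not Mooncake's API. The class `PrefixKVCache`, its methods, the block size, the chained FNV-1a hashing, and the `location` string are all hypothetical choices made for illustration. The sketch shows one common way to detect reusable KV state: split a token sequence into fixed-size blocks, key each block by a hash of the entire prefix up to that block, and count how many leading blocks are already stored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical illustration only; not Mooncake's API.
// Tokens are grouped into fixed-size blocks. Each block's key chains the
// previous block's key with the block's token IDs, so one key identifies
// the whole prefix up to and including that block.
class PrefixKVCache {
 public:
  explicit PrefixKVCache(std::size_t block_size) : block_size_(block_size) {
    assert(block_size > 0 && "block size must be positive");
  }

  // Record that KV tensors for every full block of `tokens` live at
  // `location` (a stand-in for a real memory or storage handle).
  // A trailing partial block is not cached.
  void Insert(const std::vector<int32_t>& tokens, const std::string& location) {
    for (uint64_t key : BlockKeys(tokens)) {
      store_.emplace(key, location);  // keeps the existing entry if present
    }
  }

  // Number of leading tokens whose KV state is already cached: the length
  // of the longest run of consecutive full blocks found in the store.
  std::size_t CachedPrefixTokens(const std::vector<int32_t>& tokens) const {
    std::size_t hits = 0;
    for (uint64_t key : BlockKeys(tokens)) {
      if (store_.find(key) == store_.end()) break;
      ++hits;
    }
    return hits * block_size_;
  }

 private:
  // FNV-1a style mixing of the 8 bytes of `v` into `h` (illustrative, not
  // collision-proof).
  static uint64_t Mix(uint64_t h, uint64_t v) {
    for (int i = 0; i < 8; ++i) {
      h ^= (v >> (8 * i)) & 0xFFULL;
      h *= 1099511628211ULL;  // FNV-1a 64-bit prime
    }
    return h;
  }

  // Compute one chained key per full block of `tokens`.
  std::vector<uint64_t> BlockKeys(const std::vector<int32_t>& tokens) const {
    std::vector<uint64_t> keys;
    if (block_size_ == 0) return keys;  // guard against an infinite loop
    uint64_t prev = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (std::size_t start = 0; start + block_size_ <= tokens.size();
         start += block_size_) {
      uint64_t h = prev;  // seed with the previous prefix's key
      for (std::size_t i = start; i < start + block_size_; ++i) {
        h = Mix(h, static_cast<uint32_t>(tokens[i]));
      }
      keys.push_back(h);
      prev = h;
    }
    return keys;
  }

  std::size_t block_size_;
  std::unordered_map<uint64_t, std::string> store_;
};
```

For example, with a block size of 4, after inserting a 9-token prompt, a later request that shares its first 8 tokens reports 8 cached tokens, so only the remaining tokens would need a fresh prefill. Chaining the hashes means a block matches only when everything before it also matches, which is required because attention state depends on the full preceding context. In a distributed setting, the `location` value would presumably stand in for wherever the KV tensors actually reside.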

Features

  • High-performance transfer engine for moving tensor data across storage layers
  • Distributed KV cache storage for improving LLM inference efficiency
  • Support for RDMA, TCP, and NVMe over Fabrics (NVMe-oF) data transfer protocols
  • Cluster-level data sharing for checkpoints and intermediate tensors
  • Infrastructure designed for large-scale LLM serving environments
  • Integration with inference frameworks such as vLLM and TensorRT-LLM

License

Apache License V2.0

Additional Project Details

Programming Language

C++

Related Categories

C++ Large Language Models (LLM)

Registered

2026-03-04