Mooncake is an open-source infrastructure platform for large language model (LLM) serving that focuses on efficient management and transfer of model data and KV cache. It was originally developed as part of the serving infrastructure for the Kimi large language model system. Its architecture centers on a high-performance transfer engine that provides a unified data transfer interface across different storage and networking technologies, moving tensors and model data between GPU memory, system memory, and distributed storage. Mooncake also provides distributed key-value (KV) cache storage, which lets inference systems reuse previously computed attention states instead of recomputing them, improving throughput in large-scale deployments. The system supports high-speed networking technologies such as RDMA and NVMe over Fabrics (NVMe-oF) for communication across clusters.
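
The idea of reusing previously computed attention states can be illustrated with a small, self-contained sketch. This is not Mooncake's API: `ToyKVStore`, `block_keys`, `reusable_prefix_len`, and the block size of 4 tokens are all hypothetical names and values chosen for illustration, and an in-process dictionary stands in for a distributed store. The sketch assumes a common prefix-caching scheme in which token blocks are chain-hashed, so a key identifies the entire prefix up to that block.

```python
import hashlib
from typing import Dict, List, Optional

BLOCK_SIZE = 4  # tokens per cache block (illustrative value, an assumption)


def block_keys(tokens: List[int], block_size: int = BLOCK_SIZE) -> List[str]:
    """Return one chained hash per full block; each key covers the whole prefix."""
    keys: List[str] = []
    prev = ""
    full_len = len(tokens) - len(tokens) % block_size  # ignore a trailing partial block
    for start in range(0, full_len, block_size):
        block = tokens[start:start + block_size]
        payload = prev + ":" + ",".join(map(str, block))
        prev = hashlib.sha256(payload.encode()).hexdigest()
        keys.append(prev)
    return keys


class ToyKVStore:
    """In-process stand-in for a distributed KV cache store (hypothetical)."""

    def __init__(self) -> None:
        self._blocks: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._blocks[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._blocks.get(key)


def reusable_prefix_len(store: ToyKVStore, tokens: List[int],
                        block_size: int = BLOCK_SIZE) -> int:
    """Count leading tokens whose KV blocks are already in the store."""
    hit = 0
    for key in block_keys(tokens, block_size):
        if store.get(key) is None:
            break  # first miss ends the reusable prefix
        hit += block_size
    return hit


# A first request populates the store; a second request sharing an
# 8-token prefix only needs to compute the tokens after that prefix.
store = ToyKVStore()
for key in block_keys([1, 2, 3, 4, 5, 6, 7, 8, 9]):
    store.put(key, b"kv-state-placeholder")

print(reusable_prefix_len(store, [1, 2, 3, 4, 5, 6, 7, 8, 42, 43, 44, 45]))
```

In a real deployment the stored values would be attention key/value tensors rather than placeholder bytes, and the lookup would go to a shared store that multiple inference nodes can reach.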

Features

  • High-performance transfer engine for moving tensor data across storage layers
  • Distributed KV cache storage for improving LLM inference efficiency
  • Support for RDMA, TCP, and NVMe over Fabrics (NVMe-oF) data transfer protocols
  • Cluster-level data sharing for checkpoints and intermediate tensors
  • Infrastructure designed for large-scale LLM serving environments
  • Integration with inference frameworks such as vLLM and TensorRT-LLM
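
The notion of a single transfer interface over several storage layers can be sketched as follows. This is a toy model under stated assumptions, not Mooncake's actual interface: `Transport`, `InMemoryTransport`, `TransferEngine`, and the `dram://` and `ssd://` URI schemes are hypothetical names invented here, and every backend is simulated with a dictionary rather than real RDMA, TCP, or NVMe-oF I/O.

```python
from abc import ABC, abstractmethod
from typing import Dict, Tuple


class Transport(ABC):
    """Minimal backend contract (hypothetical): read and write byte buffers."""

    @abstractmethod
    def write(self, location: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, location: str) -> bytes: ...


class InMemoryTransport(Transport):
    """Dictionary-backed stand-in for a real storage or network backend."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def write(self, location: str, data: bytes) -> None:
        self._data[location] = data

    def read(self, location: str) -> bytes:
        return self._data[location]  # raises KeyError if nothing was written


class TransferEngine:
    """Routes 'scheme://location' URIs to whichever backend is registered."""

    def __init__(self) -> None:
        self._transports: Dict[str, Transport] = {}

    def register(self, scheme: str, transport: Transport) -> None:
        self._transports[scheme] = transport

    def _resolve(self, uri: str) -> Tuple[Transport, str]:
        scheme, sep, location = uri.partition("://")
        if not sep or scheme not in self._transports:
            raise ValueError(f"no transport registered for {uri!r}")
        return self._transports[scheme], location

    def write(self, uri: str, data: bytes) -> None:
        transport, location = self._resolve(uri)
        transport.write(location, data)

    def read(self, uri: str) -> bytes:
        transport, location = self._resolve(uri)
        return transport.read(location)

    def copy(self, src_uri: str, dst_uri: str) -> int:
        """Move a buffer between backends; returns the number of bytes copied."""
        data = self.read(src_uri)
        self.write(dst_uri, data)
        return len(data)


engine = TransferEngine()
engine.register("dram", InMemoryTransport())
engine.register("ssd", InMemoryTransport())
engine.write("dram://layer0/kv", b"\x00" * 16)
print(engine.copy("dram://layer0/kv", "ssd://cache/layer0"))
```

The caller addresses every layer the same way, so adding a new backend only requires registering another `Transport` implementation.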

License

Apache License V2.0

Additional Project Details

Programming Language

C++

Related Categories

C++ Large Language Models (LLM)

Registered

2026-03-04