distributed shared memory free download

Mooncake

Mooncake is the serving platform for Kimi

...Its architecture centers on a high-performance transfer engine that provides unified data transfer across different storage and networking technologies. This engine moves tensors and model data efficiently across heterogeneous environments such as GPU memory, system memory, and distributed storage systems. Mooncake also introduces distributed key-value cache storage that lets inference systems reuse previously computed attention states, which improves throughput in large-scale deployments. The system supports networking technologies such as RDMA and NVMe over Fabrics (NVMe-oF) for high-speed communication across clusters.
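The reuse of previously computed attention states can be pictured with a toy prefix cache. This is only a sketch under stated assumptions, not Mooncake's API: the class `PrefixKVCache` and the helpers `fake_attention_state` and `serve` are hypothetical names, the "state" is a stand-in dictionary, and everything runs in a single process rather than across a cluster.

```python
import hashlib


class PrefixKVCache:
    """Toy cache keyed by a hash of the token prefix (hypothetical, in-process only)."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(tokens):
        data = ",".join(str(t) for t in tokens).encode("utf-8")
        return hashlib.sha256(data).hexdigest()

    def put(self, tokens, state):
        self._store[self._key(tokens)] = state

    def longest_prefix(self, tokens):
        """Return (matched_length, state) for the longest cached prefix of tokens."""
        for end in range(len(tokens), 0, -1):
            state = self._store.get(self._key(tokens[:end]))
            if state is not None:
                return end, state
        return 0, None


def fake_attention_state(tokens):
    # Stand-in for real attention states: it only records which tokens it covers.
    return {"covered": list(tokens)}


def serve(cache, tokens):
    """Return how many tokens had to be recomputed after reusing a cached prefix."""
    matched, _state = cache.longest_prefix(tokens)
    recomputed = len(tokens) - matched
    # A real system would extend the cached state with the uncached suffix only.
    cache.put(tokens, fake_attention_state(tokens))
    return recomputed


cache = PrefixKVCache()
first = serve(cache, [1, 2, 3])         # nothing cached yet
second = serve(cache, [1, 2, 3, 4, 5])  # reuses the cached prefix [1, 2, 3]
print(first, second)
```

The second request recomputes only the two tokens that follow the cached prefix, which is the effect the description attributes to key-value cache reuse.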

Downloads: 0 This Week

Last Update: 15 hours ago


PowerInfer

High-speed Large Language Model Serving for Local Deployment

PowerInfer is a high-performance inference engine designed to run large language models efficiently on personal computers equipped with consumer-grade GPUs. The project focuses on improving local AI inference by optimizing how neural network computations are distributed between CPU and GPU resources. Its architecture exploits the observation that only a subset of neurons in large models is frequently activated, so the system preloads frequently used neurons into GPU memory and processes less common activations on the CPU. This hybrid execution strategy reduces memory bottlenecks and improves overall inference speed. ...
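The hot/cold split described above can be sketched in a few lines. This is an illustration under stated assumptions, not PowerInfer's implementation: `split_hot_cold` and `layer_forward` are hypothetical names, the activation counts are made up, and both "device" paths run on the CPU here.

```python
def split_hot_cold(activation_counts, gpu_budget):
    """Return (hot, cold) neuron indices; hot are the gpu_budget most-activated neurons."""
    ranked = sorted(range(len(activation_counts)),
                    key=lambda i: activation_counts[i], reverse=True)
    hot = sorted(ranked[:gpu_budget])
    cold = sorted(ranked[gpu_budget:])
    return hot, cold


def layer_forward(x, weights, hot, cold):
    """Compute ReLU(W x) row by row, visiting hot rows first and cold rows second.

    In a real hybrid engine the hot rows would live in GPU memory and the
    cold rows in system memory; here both loops are plain Python.
    """
    out = [0.0] * len(weights)
    for device_rows in (hot, cold):
        for i in device_rows:
            pre = sum(w * v for w, v in zip(weights[i], x))
            out[i] = max(0.0, pre)
    return out


# Hypothetical profiling data: how often each of four neurons fired.
counts = [5, 100, 1, 80]
hot, cold = split_hot_cold(counts, gpu_budget=2)

weights = [[1, 0], [0, 1], [-1, 0], [1, 1]]
print(hot, cold, layer_forward([2.0, 3.0], weights, hot, cold))
```

The result is the same as an ordinary forward pass; only the placement of each neuron's weights differs.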

Downloads: 0 This Week

Last Update: 2026-05-11


Tiny CUDA Neural Networks

Lightning fast C++/CUDA neural network framework

...We provide a sample application where an image function (x,y) -> (R,G,B) is learned. The fully fused MLP component of this framework requires a very large amount of shared memory in its default configuration. It will likely only work on an RTX 3090, an RTX 2080 Ti, or high-end enterprise GPUs. Lower-end cards must reduce the n_neurons parameter or use the CutlassMLP (better compatibility but slower) instead. tiny-cuda-nn comes with a PyTorch extension that allows using the fast MLPs and input encodings from within a Python context. ...

Downloads: 0 This Week

Last Update: 2025-07-08


OneFlow

OneFlow is a deep learning framework designed to be user-friendly

...An extension for OneFlow targets third-party compilers such as XLA, TensorRT, and OpenVINO. The CUDA runtime is statically linked into OneFlow. OneFlow works with the minimum supported driver and any newer driver. Distributed performance (efficiency) is the core technical difficulty of a deep learning framework. OneFlow focuses on performance improvement and heterogeneous distributed expansion. It adheres to the core concept and architecture of static compilation and streaming parallelism, and addresses the memory wall challenge at the cluster level. ...

Downloads: 0 This Week

Last Update: 2024-03-11


MXNet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning

Apache MXNet is a scalable, efficient open-source deep learning framework designed for training and deploying neural networks across heterogeneous systems. It offers a flexible hybrid programming model that lets you mix symbolic and imperative programming to maximize efficiency and productivity, and it supports a wide array of languages. At its core, MXNet contains a dynamic...

Downloads: 1 This Week

Last Update: 2025-08-18


SINGA

A distributed deep learning platform

Apache SINGA is an Apache Top Level Project focused on distributed training of deep learning and machine learning models. Example deep learning models are provided in the SINGA repo on GitHub and on Google Colab. SINGA supports data-parallel training across multiple GPUs (on a single node or across different nodes). It also supports popular optimizers, including stochastic gradient descent with momentum, Adam, RMSProp, and AdaGrad.
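For readers unfamiliar with the first optimizer in that list, a single update of stochastic gradient descent with momentum looks roughly like this. The sketch uses the classic formulation (velocity = momentum * velocity - lr * gradient) in plain Python; it does not use SINGA's optimizer API, and the function name `sgd_momentum_step` is hypothetical.

```python
def sgd_momentum_step(weights, grads, velocity, lr, momentum):
    """Apply one SGD-with-momentum update and return (new_weights, new_velocity)."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Two steps on a single parameter with a constant (made-up) gradient.
w, v = [1.0], [0.0]
w, v = sgd_momentum_step(w, [2.0], v, lr=0.5, momentum=0.5)
w, v = sgd_momentum_step(w, [2.0], v, lr=0.5, momentum=0.5)
print(w, v)
```

The second step moves further than the first because the velocity carries over part of the previous update.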

Downloads: 0 This Week

Last Update: 2022-08-05


Xepl Engine Virtual Machine

XML processor. A multi-threaded pub/sub environment for dynamic programming on an event-driven, tickless, sleeping state machine with TCP communications, tight flawless memory management, powerful set algebra, and a magical database. 100% C++. ezPort.

Downloads: 0 This Week

Last Update: 2013-04-09


Blackboard messaging library

Blackboard implements a lightweight, portable tuple space suitable for multi-agent system and distributed component design. Supports implicit invocation via content-filtered asynchronous events, blocking call semantics, and shared memory messaging.
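A tuple space with content filtering and blocking calls can be sketched as follows. This is a minimal single-process illustration, not the Blackboard library's API: the `TupleSpace` class, its `put` and `take` methods, and the convention that `None` in a pattern matches any field are all assumptions made for the example.

```python
import threading


class TupleSpace:
    """Toy in-process tuple space (hypothetical; threads only, no networking)."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def put(self, tup):
        """Publish a tuple and wake any callers blocked in take()."""
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    @staticmethod
    def _matches(pattern, tup):
        # None in the pattern acts as a wildcard for that field.
        return len(pattern) == len(tup) and all(
            p is None or p == t for p, t in zip(pattern, tup))

    def _find(self, pattern):
        for tup in self._tuples:
            if self._matches(pattern, tup):
                return tup
        return None

    def take(self, pattern, timeout=None):
        """Remove and return the first matching tuple, blocking until one exists.

        Returns None if the timeout expires first.
        """
        with self._cond:
            found = self._cond.wait_for(
                lambda: self._find(pattern) is not None, timeout=timeout)
            if not found:
                return None
            tup = self._find(pattern)
            self._tuples.remove(tup)
            return tup


space = TupleSpace()
space.put(("temp", "sensor-a", 21))
print(space.take(("temp", None, None)))
```

A consumer that calls `take` before any matching tuple exists simply waits until a producer thread calls `put`, which is the blocking call semantics the description mentions.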

Downloads: 0 This Week

Last Update: 2014-03-08


The PSU Mars Rover Software System

The PSU Mars Rover Software System is a collection of modules, connected via a shared memory space, that operate the rover's subsystems across all of its tasks, especially navigation.

Downloads: 0 This Week

Last Update: 2013-04-17


Search Results for "distributed shared memory"

Showing 9 open source projects for "distributed shared memory"
