distributed shared memory free download

Colossal-AI

Making large AI models cheaper, faster and more accessible

The Transformer architecture has improved the performance of deep learning models in domains such as Computer Vision and Natural Language Processing. Together with better performance come larger model sizes. This imposes challenges to the memory wall of the current accelerator hardware such as GPU. It is never ideal to train large models such as Vision Transformer, BERT, and GPT on a single GPU or a single machine. There is an urgent demand to train models in a distributed environment. However, distributed training, especially model parallelism, often requires domain expertise in computer systems and architecture. ...

Downloads: 0 This Week

Last Update: 2025-05-28

See Project

DeepSeed

Deep learning optimization library making distributed training easy

DeepSpeed is a deep learning optimization library that makes distributed training easy, efficient, and effective. DeepSpeed delivers extreme-scale model training for everyone, from data scientists training on massive supercomputers to those training on low-end clusters or even on a single GPU. Using current generation of GPU clusters with hundreds of devices, 3D parallelism of DeepSpeed can efficiently train deep learning models with trillions of parameters. With just a single GPU,...

Downloads: 0 This Week

Last Update: 2026-05-06

See Project

DGL

Python package built to ease deep learning on graph

Build your models with PyTorch, TensorFlow or Apache MXNet. Fast and memory-efficient message passing primitives for training Graph Neural Networks. Scale to giant graphs via multi-GPU acceleration and distributed training infrastructure. DGL empowers a variety of domain-specific projects including DGL-KE for learning large-scale knowledge graph embeddings, DGL-LifeSci for bioinformatics and cheminformatics, and many others.

Downloads: 1 This Week

Last Update: 2024-08-29

See Project

DLRM

An implementation of a deep learning recommendation model (DLRM)

DLRM (Deep Learning Recommendation Model) is Meta’s open-source reference implementation for large-scale recommendation systems built to handle extremely high-dimensional sparse features and embedding tables. The architecture combines dense (MLP) and sparse (embedding) branches, then interacts features via dot product or feature interactions before passing through further dense layers to predict click-through, ranking scores, or conversion probabilities. The implementation is optimized for...

Downloads: 0 This Week

Last Update: 2026-01-12

See Project

DeepSpeed

Deep learning optimization library: makes distributed training easy

DeepSpeed is an easy-to-use deep learning optimization software suite that enables unprecedented scale and speed for Deep Learning Training and Inference. With DeepSpeed you can: 1. Train/Inference dense or sparse models with billions or trillions of parameters 2. Achieve excellent system throughput and efficiently scale to thousands of GPUs 3. Train/Inference on resource constrained GPU systems 4. Achieve unprecedented low latency and high throughput for inference 5. Achieve extreme...

Downloads: 1 This Week

Last Update: 2026-05-06

See Project

OneFlow

OneFlow is a deep learning framework designed to be user-friendly

...An extension for OneFlow to target third-party compiler, such as XLA, TensorRT and OpenVINO etc.CUDA runtime is statically linked into OneFlow. OneFlow will work on a minimum supported driver, and any driver beyond. For more information. Distributed performance (efficiency) is the core technical difficulty of the deep learning framework. OneFlow focuses on performance improvement and heterogeneous distributed expansion. It adheres to the core concept and architecture of static compilation and streaming parallelism and solves the memory wall challenge at the cluster level. world-leading level. ...

Downloads: 0 This Week

Last Update: 2024-03-11

See Project

MXNet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning

Apache MXNet is a scalable, efficient open-source deep learning framework—offering a flexible hybrid programming model (symbolic + imperative) and supporting a wide array of languages—designed for training and deploying neural networks across heterogeneous systems. Apache MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to mix symbolic and imperative programming to maximize efficiency and productivity. At its core, MXNet contains a dynamic...

Downloads: 1 This Week

Last Update: 2025-08-18

See Project

Apache MXNet (incubating)

A flexible and efficient library for deep learning

Apache MXNet is an open source deep learning framework designed for efficient and flexible research prototyping and production. It contains a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations. On top of this is a graph optimization layer, overall making MXNet highly efficient yet still portable, lightweight and scalable.

Downloads: 0 This Week

Last Update: 2023-12-13

See Project

PyText

A natural language modeling framework based on PyTorch

...It achieves this by providing simple and extensible interfaces and abstractions for model components, and by using PyTorch’s capabilities of exporting models for inference via the optimized Caffe2 execution engine. We use PyText at Facebook to iterate quickly on new modeling ideas and then seamlessly ship them at scale. Distributed-training support built on the new C10d backend in PyTorch 1.0. Mixed precision training support through APEX (trains faster with less GPU memory on NVIDIA Tensor Cores). Extensible components that allows easy creation of new models and tasks.

Downloads: 0 This Week

Last Update: 2021-08-31

See Project

SINGA

A distributed deep learning platform

Apache SINGA is an Apache Top Level Project, focusing on distributed training of deep learning and machine learning models. Various example deep learning models are provided in SINGA repo on Github and on Google Colab. SINGA supports data parallel training across multiple GPUs (on a single node or across different nodes). SINGA supports various popular optimizers including stochastic gradient descent with momentum, Adam, RMSProp, and AdaGrad, etc.

Downloads: 0 This Week

Last Update: 2022-08-05

See Project

Search Results for "distributed shared memory"

Showing 10 open source projects for "distributed shared memory"

Colossal-AI

DeepSeed

DGL

DLRM

DeepSpeed

OneFlow

MXNet

Apache MXNet (incubating)

PyText

SINGA

Search Results for "distributed shared memory"

Showing 10 open source projects for "distributed shared memory"

Colossal-AI

DeepSeed

DGL

DLRM

DeepSpeed

OneFlow

MXNet

Apache MXNet (incubating)

PyText

SINGA

Related Searches

Related Categories