Showing 75 open source projects for "distributed shared memory"

  • 1
    Punica

    Serving multiple LoRA fine-tuned LLMs as one

    Punica is a system designed to efficiently serve multiple LoRA-fine-tuned large language models within a shared GPU environment. LoRA is a parameter-efficient fine-tuning method that allows developers to adapt large pretrained models to specific tasks by adding lightweight adapter layers rather than retraining the entire model. Punica introduces a serving architecture that allows multiple LoRA adapters to share the same base model during inference, significantly reducing memory consumption and computational overhead. ...
    Downloads: 0 This Week
  • 2
    Metaseq

    Repo for external large-scale work

    Metaseq is a flexible, high-performance framework for training and serving large-scale sequence models, such as language models, translation systems, and instruction-tuned LLMs. Built on top of PyTorch, it provides distributed training, model sharding, mixed-precision computation, and memory-efficient checkpointing to support models with hundreds of billions of parameters. The framework was used internally at Meta to train models like OPT (Open Pre-trained Transformer) and serves as a reference implementation for scaling transformer architectures efficiently across GPUs and nodes. ...
    Downloads: 0 This Week
  • 3
    ParlAI

    A framework for training and evaluating AI models

    ...The library integrates tightly with PyTorch and supports both generative and retrieval-augmented models, along with utilities for multitask training and model selection. A large set of built-in tasks and dataset loaders (with consistent preprocessing and metrics) makes it easy to compare methods under shared conditions. Tools for distributed training, mixed precision, and model zoos help scale experiments from laptops to multi-GPU clusters.
    Downloads: 0 This Week
  • 4
    Mars Framework

    Mars is a tensor-based unified framework for large-scale data

    Mars is a distributed computing framework designed to scale scientific computing and data science workloads across large clusters while preserving the familiar programming interfaces of common Python libraries. The project provides a tensor-based execution model that extends the capabilities of tools such as NumPy, pandas, and scikit-learn so that large datasets can be processed in parallel without rewriting code for distributed environments.
    Downloads: 0 This Week
  • 5
    MXNet

    Lightweight, Portable, Flexible Distributed/Mobile Deep Learning

    Apache MXNet is a scalable, efficient open-source deep learning framework designed for training and deploying neural networks across heterogeneous systems. Its hybrid programming model lets you mix symbolic and imperative programming to maximize efficiency and productivity, and it supports a wide array of languages. At its core, MXNet contains a dynamic...
    Downloads: 1 This Week
  • 6
    TensorFlowOnSpark

    TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters

    By combining salient features from the TensorFlow deep learning framework with Apache Spark and Apache Hadoop, TensorFlowOnSpark enables distributed deep learning on a cluster of GPU and CPU servers. It supports both distributed TensorFlow training and inference on Spark clusters, with the goal of minimizing the code changes required to run existing TensorFlow programs on a shared grid.
    Downloads: 0 This Week
  • 7
    Apache MXNet (incubating)

    A flexible and efficient library for deep learning

    Apache MXNet is an open source deep learning framework designed for efficient and flexible research prototyping and production. It contains a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations. On top of this is a graph optimization layer, overall making MXNet highly efficient yet still portable, lightweight and scalable.
    Downloads: 0 This Week
  • 8
    SimSiam

    PyTorch implementation of SimSiam

    ...The repository provides scripts for both unsupervised pre-training and linear evaluation, using a ResNet-50 backbone by default. It is compatible with multi-GPU distributed training and can be fine-tuned or transferred to downstream tasks like object detection following the same setup as MoCo.
    Downloads: 2 This Week
  • 9

    rt-agents

    Agent system for HPC real time applications
    Downloads: 0 This Week
  • 10
    PyText

    A natural language modeling framework based on PyTorch

    ...It achieves this by providing simple and extensible interfaces and abstractions for model components, and by using PyTorch’s capabilities of exporting models for inference via the optimized Caffe2 execution engine. We use PyText at Facebook to iterate quickly on new modeling ideas and then seamlessly ship them at scale. Features include distributed-training support built on the new C10d backend in PyTorch 1.0, mixed-precision training support through APEX (trains faster with less GPU memory on NVIDIA Tensor Cores), and extensible components that allow easy creation of new models and tasks.
    Downloads: 0 This Week
  • 11
    SINGA

    A distributed deep learning platform

    Apache SINGA is an Apache Top Level Project focusing on distributed training of deep learning and machine learning models. Various example deep learning models are provided in the SINGA repo on GitHub and on Google Colab. SINGA supports data-parallel training across multiple GPUs (on a single node or across different nodes). It also supports popular optimizers including stochastic gradient descent with momentum, Adam, RMSProp, and AdaGrad.
    Downloads: 0 This Week
  • 12
    BytePS

    A high performance and generic framework for distributed DNN training

    BytePS is a high-performance, general-purpose distributed training framework. It supports TensorFlow, Keras, PyTorch, and MXNet, and can run on either TCP or RDMA networks. According to the project, BytePS outperforms existing open-source distributed training frameworks by a large margin: on BERT-large training, it reports ~90% scaling efficiency with 256 GPUs, which is much higher than Horovod+NCCL.
    Downloads: 0 This Week
  • 13
    Image Super-Resolution (ISR)

    Super-scale your images and run experiments with Residual Dense

    The goal of this project is to upscale and improve the quality of low-resolution images. This project contains Keras implementations of different Residual Dense Networks for Single Image Super-Resolution (ISR) as well as scripts to train these networks using content and adversarial loss components. Docker scripts and Google Colab notebooks are available to carry out training and prediction. Also, we provide scripts to facilitate training on the cloud with AWS and Nvidia-docker with only a few...
    Downloads: 1 This Week
  • 14
    PyTorch-BigGraph

    Generate embeddings from large-scale graph-structured data

    PyTorch-BigGraph (PBG) is a system for learning embeddings on massive graphs—think billions of nodes and edges—using partitioning and distributed training to keep memory and compute tractable. It shards entities into partitions and buckets edges so that each training pass only touches a small slice of parameters, which drastically reduces peak RAM and enables horizontal scaling across machines. PBG supports multi-relation graphs (knowledge graphs) with relation-specific scoring functions, negative sampling strategies, and typed entities, making it suitable for link prediction and retrieval. ...
    Downloads: 0 This Week
  • 15
    Texar

    Toolkit for Machine Learning, Natural Language Processing

    Texar is a toolkit that aims to support a broad set of machine learning tasks, especially natural language processing and text generation. Texar provides a library of easy-to-use ML modules and functionalities for composing arbitrary models and algorithms. The tool is designed for both researchers and practitioners for fast prototyping and experimentation. Texar was originally developed by, and is actively contributed to by, Petuum and CMU in collaboration with other institutes. A mirror of this...
    Downloads: 0 This Week
  • 16
    OpenSeq2Seq

    Toolkit for efficient experimentation with Speech Recognition

    ...Mixed-precision support (float16) is optimized for NVIDIA Volta and Turing GPUs, allowing significant speedups and memory savings without sacrificing model quality. The project comes with configuration-driven training scripts, documentation, and examples that demonstrate how to set up pipelines for tasks.
    Downloads: 0 This Week
  • 17
    Easy Machine Learning

    Easy Machine Learning is a general-purpose dataflow-based system

    Machine learning algorithms have become the key components in many big data applications. However, the full potential of machine learning is still far from being realized because using machine learning algorithms is hard, especially on distributed platforms such as Hadoop and Spark. The key barriers come from not only the implementation of the algorithms themselves but also the processing for applying them to real applications which often involve multiple steps and different algorithms. Our...
    Downloads: 0 This Week
  • 18

    popt4jlib

    Parallel Optimization Library for Java

    ...A fast parallel implementation of the network simplex method and some full-fledged parallel/distributed MIP solvers will be added in the next version. In general, emphasis is placed on improving the efficiency of the algorithms in shared-memory models via Java threads, since multi-core machines are so widespread today.
    Downloads: 0 This Week
  • 19
    P3: The Portable Unix Programming System

    Multi-process homeostatic software agent library

    PUPS/P3 facilitates development of multi-process, multi-host computations by providing tools to emulate colonies of homeostatic organisms. It permits persistent computation, homeostatic resource protection, and asynchronous interprocess communication.
    Downloads: 0 This Week
  • 20
    H2O-3

    H2O is an Open Source, Distributed, Fast & Scalable Machine Learning

    H2O-3 is an open-source machine learning platform designed to build scalable and distributed machine learning models across large datasets. The system operates as an in-memory computing platform that allows data scientists to train models quickly using distributed resources. It supports many machine learning algorithms including generalized linear models, gradient boosting machines, deep learning networks, and ensemble techniques.
    Downloads: 0 This Week
  • 21
    ANts P2P
    ANts P2P realizes a third-generation P2P network. It protects your privacy while you are connected and makes you untraceable by hiding your identity (IP address) and encrypting everything you send to or receive from others.
    Downloads: 6 This Week
  • 22
    XML Processor. A Multi-threaded, Pub/Sub environment for Dynamic programming on an event driven Tickless and Sleeping State Machine with TCP communications, tight flawless memory management, powerful set algebra and a magical database. 100% C++. ezPort.
    Downloads: 0 This Week
  • 23
    Blackboard implements a lightweight, portable tuple space suitable for multi-agent system and distributed component design. Supports implicit invocation via content-filtered asynchronous events, blocking call semantics, and shared memory messaging.
    Downloads: 0 This Week
  • 24
    The PSU Mars Rover Software System is a collection of modules connected via a shared memory space. Together they operate the various subsystems that control the rover in all of its tasks, especially navigation.
    Downloads: 0 This Week
  • 25
    Grok-2.5

    Large-scale xAI model for local inference with SGLang, Grok-2.5

    Grok-2.5 is a large-scale AI model developed and released by xAI in 2024, made available through Hugging Face for research and experimentation. The model is distributed as raw weights that require specialized infrastructure to run, rather than being hosted by inference providers. To use it, users must download over 500 GB of files and set them up locally with the SGLang inference engine. Grok-2.5 supports advanced inference with multi-GPU configurations, requiring at least 8 GPUs with more than 40 GB of memory each for optimal performance. ...
    Downloads: 0 This Week