Showing 13 open source projects for "distributed shared memory"

View related business solutions
  • Custom VMs From 1 to 96 vCPUs With 99.95% Uptime Icon
    Custom VMs From 1 to 96 vCPUs With 99.95% Uptime

    General-purpose, compute-optimized, or GPU/TPU-accelerated. Built to your exact specs.

    Live migration and automatic failover keep workloads online through maintenance. One free e2-micro VM every month.
    Try Free
  • Build Securely on Azure with Proven Frameworks Icon
    Build Securely on Azure with Proven Frameworks

    Lay a foundation for success with Tested Reference Architectures developed by Fortinet’s experts. Learn more in this white paper.

    Moving to the cloud brings new challenges. How can you manage a larger attack surface while ensuring great network performance? Turn to Fortinet’s Tested Reference Architectures, blueprints for designing and securing cloud environments built by cybersecurity experts. Learn more and explore use cases in this white paper.
    Download Now
  • 1
    Qwen

    Qwen

    The official repo of Qwen chat & pretrained large language model

    Qwen is a series of large language models developed by Alibaba Cloud, consisting of various pretrained versions like Qwen-1.8B, Qwen-7B, Qwen-14B, and Qwen-72B. These models, which range from smaller to larger configurations, are designed for a wide range of natural language processing tasks. They are openly available for research and commercial use, with Qwen's code and model weights shared on GitHub. Qwen's capabilities include text generation, comprehension, and conversation, making it a...
    Downloads: 14 This Week
    Last Update:
    See Project
  • 2
    vLLM

    vLLM

    A high-throughput and memory-efficient inference and serving engine

    vLLM is a fast and easy-to-use library for LLM inference and serving. High-throughput serving with various decoding algorithms, including parallel sampling, beam search, and more.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    Mooncake

    Mooncake

    Mooncake is the serving platform for Kimi

    ...Its architecture centers on a high-performance transfer engine that provides unified data transfer across different storage and networking technologies. This engine enables efficient movement of tensors and model data across heterogeneous environments such as GPU memory, system memory, and distributed storage systems. Mooncake also introduces distributed key-value cache storage that allows inference systems to reuse previously computed attention states, significantly improving throughput in large-scale deployments. The system supports advanced networking technologies such as RDMA and NVMe over Fabric, enabling high-speed communication across clusters.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 4
    METATRON

    METATRON

    AI-powered penetration testing assistant using local LLM on linux

    METATRON is a multi-agent AI orchestration framework designed to coordinate complex workflows across multiple intelligent agents. It provides a structured system for task delegation, communication, and collaboration between agents. The framework emphasizes scalability, allowing multiple agents to work together on large or complex problems. It includes mechanisms for managing context, memory, and execution flow across tasks. METATRON is particularly useful for building advanced AI systems...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Our Free Plans just got better! | Auth0 Icon
    Our Free Plans just got better! | Auth0

    With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

    You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.
    Try free now
  • 5
    PowerInfer

    PowerInfer

    High-speed Large Language Model Serving for Local Deployment

    PowerInfer is a high-performance inference engine designed to run large language models efficiently on personal computers equipped with consumer-grade GPUs. The project focuses on improving the performance of local AI inference by optimizing how neural network computations are distributed between CPU and GPU resources. Its architecture exploits the observation that only a subset of neurons in large models are frequently activated, allowing the system to preload frequently used neurons into GPU memory while processing less common activations on the CPU. This hybrid execution strategy significantly reduces memory bottlenecks and improves overall inference speed. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 6
    GPU Hot

    GPU Hot

    Real-time NVIDIA GPU dashboard

    ...The project offers a self-hosted web interface that streams hardware metrics directly from GPU servers, enabling developers, ML engineers, and system administrators to observe GPU utilization and system behavior in real time through a browser. The dashboard collects and displays a wide range of performance metrics including temperature, memory usage, power consumption, clock speeds, fan speed, and active processes. It can scale from monitoring a single GPU workstation to large distributed environments with dozens or even hundreds of GPUs by running lightweight containers on each node and aggregating the data centrally.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 7
    Xtuner

    Xtuner

    A Next-Generation Training Engine Built for Ultra-Large MoE Models

    ...Its architecture incorporates memory-efficient optimizations that allow researchers to train large models even when computational resources are limited. XTuner is also designed to integrate with modern AI ecosystems, supporting multimodal training, reinforcement learning optimization, and instruction tuning pipelines.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 8
    Ludwig AI

    Ludwig AI

    Low-code framework for building custom LLMs, neural networks

    ...Support for multi-task and multi-modality learning. Comprehensive config validation detects invalid parameter combinations and prevents runtime failures. Automatic batch size selection, distributed training (DDP, DeepSpeed), parameter efficient fine-tuning (PEFT), 4-bit quantization (QLoRA), and larger-than-memory datasets. Retain full control of your models down to the activation functions. Support for hyperparameter optimization, explainability, and rich metric visualizations. Experiment with different model architectures, tasks, features, and modalities with just a few parameter changes in the config. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 9
    Extractous

    Extractous

    Fast and efficient unstructured data extraction

    ...Its purpose is to extract text and metadata efficiently from formats such as PDF, Word, HTML, email archives, images, and more, without depending on external APIs or separate parsing servers. The project emphasizes performance and low memory usage, and its maintainers describe it as a local-first alternative to heavier extraction stacks. For broader format support, the system combines its Rust core with ahead-of-time compiled Apache Tika shared libraries, which allows it to extend parsing coverage while still avoiding traditional server-based overhead. It also supports OCR for images and scanned documents through Tesseract, making it useful for document ingestion pipelines that include image-based or scanned inputs.
    Downloads: 0 This Week
    Last Update:
    See Project
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • 10
    super-agent-party

    super-agent-party

    All-in-one AI companion! Desktop girlfriend + virtual streamer

    Super Agent Party is an open-source experimental framework designed to demonstrate collaborative multi-agent AI systems interacting within a shared environment. The project explores how multiple specialized AI agents can coordinate to solve complex tasks by communicating with each other and sharing information. Instead of relying on a single monolithic model, the framework organizes agents with different roles or capabilities that cooperate to achieve goals. Each agent may handle different...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 11
    MobileLLM

    MobileLLM

    MobileLLM Optimizing Sub-billion Parameter Language Models

    MobileLLM is a lightweight large language model (LLM) framework developed by Facebook Research, optimized for on-device deployment where computational and memory efficiency are critical. Introduced in the ICML 2024 paper “MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases”, it focuses on delivering strong reasoning and generalization capabilities in models under one billion parameters. The framework integrates several architectural innovations—SwiGLU...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 12
    Chitu

    Chitu

    High-performance inference framework for large language models

    ...It supports heterogeneous computing environments, including CPUs, GPUs, and various specialized AI accelerators, allowing models to run across a wide range of infrastructure configurations. Chitu is designed to scale from small single-machine deployments to large distributed clusters that handle high volumes of concurrent inference requests. The system also includes performance optimizations for large models, including support for quantized formats and efficient computation operators that reduce memory usage and latency. Its architecture aims to support enterprise adoption by ensuring stable long-term operation under production workloads.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    Punica

    Punica

    Serving multiple LoRA finetuned LLM as one

    Punica is a system designed to efficiently serve multiple LoRA-fine-tuned large language models within a shared GPU environment. LoRA is a parameter-efficient fine-tuning method that allows developers to adapt large pretrained models to specific tasks by adding lightweight adapter layers rather than retraining the entire model. Punica introduces a serving architecture that allows multiple LoRA adapters to share the same base model during inference, significantly reducing memory consumption and computational overhead. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB