Showing 10 open source projects for "distributed shared memory"

  • 1
    vLLM

    A high-throughput and memory-efficient inference and serving engine

    vLLM is a fast and easy-to-use library for LLM inference and serving. High-throughput serving with various decoding algorithms, including parallel sampling, beam search, and more.
    Downloads: 2 This Week
  • 2
    Mooncake

    Mooncake is the serving platform for Kimi

    ...Its architecture centers on a high-performance transfer engine that provides unified data transfer across different storage and networking technologies. This engine enables efficient movement of tensors and model data across heterogeneous environments such as GPU memory, system memory, and distributed storage systems. Mooncake also introduces distributed key-value cache storage that allows inference systems to reuse previously computed attention states, significantly improving throughput in large-scale deployments. The system supports advanced networking technologies such as RDMA and NVMe over Fabric, enabling high-speed communication across clusters.
    Downloads: 0 This Week
  • 3
    PowerInfer

    High-speed Large Language Model Serving for Local Deployment

    PowerInfer is a high-performance inference engine designed to run large language models efficiently on personal computers equipped with consumer-grade GPUs. The project focuses on improving the performance of local AI inference by optimizing how neural network computations are distributed between CPU and GPU resources. Its architecture exploits the observation that only a subset of neurons in large models are frequently activated, allowing the system to preload frequently used neurons into GPU memory while processing less common activations on the CPU. This hybrid execution strategy significantly reduces memory bottlenecks and improves overall inference speed. ...
    Downloads: 0 This Week
  • 4
    GPU Hot

    Real-time NVIDIA GPU dashboard

    ...The project offers a self-hosted web interface that streams hardware metrics directly from GPU servers, enabling developers, ML engineers, and system administrators to observe GPU utilization and system behavior in real time through a browser. The dashboard collects and displays a wide range of performance metrics including temperature, memory usage, power consumption, clock speeds, fan speed, and active processes. It can scale from monitoring a single GPU workstation to large distributed environments with dozens or even hundreds of GPUs by running lightweight containers on each node and aggregating the data centrally.
    Downloads: 1 This Week
  • 5
    Xtuner

    A Next-Generation Training Engine Built for Ultra-Large MoE Models

    ...Its architecture incorporates memory-efficient optimizations that allow researchers to train large models even when computational resources are limited. XTuner is also designed to integrate with modern AI ecosystems, supporting multimodal training, reinforcement learning optimization, and instruction tuning pipelines.
    Downloads: 0 This Week
  • 6
    Ludwig AI

    Low-code framework for building custom LLMs, neural networks

    ...Support for multi-task and multi-modality learning. Comprehensive config validation detects invalid parameter combinations and prevents runtime failures. Automatic batch size selection, distributed training (DDP, DeepSpeed), parameter efficient fine-tuning (PEFT), 4-bit quantization (QLoRA), and larger-than-memory datasets. Retain full control of your models down to the activation functions. Support for hyperparameter optimization, explainability, and rich metric visualizations. Experiment with different model architectures, tasks, features, and modalities with just a few parameter changes in the config. ...
    Downloads: 4 This Week
  • 7
    Extractous

    Fast and efficient unstructured data extraction

    ...Its purpose is to extract text and metadata efficiently from formats such as PDF, Word, HTML, email archives, images, and more, without depending on external APIs or separate parsing servers. The project emphasizes performance and low memory usage, and its maintainers describe it as a local-first alternative to heavier extraction stacks. For broader format support, the system combines its Rust core with ahead-of-time compiled Apache Tika shared libraries, which allows it to extend parsing coverage while still avoiding traditional server-based overhead. It also supports OCR for images and scanned documents through Tesseract, making it useful for document ingestion pipelines that include image-based or scanned inputs.
    Downloads: 0 This Week
  • 8
    super-agent-party

    All-in-one AI companion! Desktop girlfriend + virtual streamer

    Super Agent Party is an open-source experimental framework designed to demonstrate collaborative multi-agent AI systems interacting within a shared environment. The project explores how multiple specialized AI agents can coordinate to solve complex tasks by communicating with each other and sharing information. Instead of relying on a single monolithic model, the framework organizes agents with different roles or capabilities that cooperate to achieve goals. Each agent may handle different...
    Downloads: 1 This Week
  • 9
    Chitu

    High-performance inference framework for large language models

    ...It supports heterogeneous computing environments, including CPUs, GPUs, and various specialized AI accelerators, allowing models to run across a wide range of infrastructure configurations. Chitu is designed to scale from small single-machine deployments to large distributed clusters that handle high volumes of concurrent inference requests. The system also includes performance optimizations for large models, including support for quantized formats and efficient computation operators that reduce memory usage and latency. Its architecture aims to support enterprise adoption by ensuring stable long-term operation under production workloads.
    Downloads: 0 This Week
  • 10
    Punica

    Serving multiple LoRA finetuned LLM as one

    Punica is a system designed to efficiently serve multiple LoRA-fine-tuned large language models within a shared GPU environment. LoRA is a parameter-efficient fine-tuning method that allows developers to adapt large pretrained models to specific tasks by adding lightweight adapter layers rather than retraining the entire model. Punica introduces a serving architecture that allows multiple LoRA adapters to share the same base model during inference, significantly reducing memory consumption and computational overhead. ...
    Downloads: 0 This Week
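
The Punica entry above describes several LoRA adapters sharing one base model during inference. A minimal pure-Python sketch of that idea follows. It is not Punica's API or implementation: the class `SharedBaseLoRALayer`, the helper `matvec`, and the adapter names `task_a` and `task_b` are all hypothetical, and the per-request loop only illustrates the sharing, not how a real serving system would batch the work on a GPU.

```python
# Illustrative sketch only: hypothetical names, not Punica's actual API or kernels.

def matvec(matrix, vector):
    """Multiply a row-major matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class SharedBaseLoRALayer:
    """One linear layer whose base weight is stored once and reused by every adapter."""

    def __init__(self, base_weight):
        self.base_weight = base_weight  # shape d_out x d_in, kept in memory once
        self.adapters = {}              # name -> (A: r x d_in, B: d_out x r)

    def add_adapter(self, name, lora_a, lora_b):
        # Each adapter adds only its two small low-rank matrices.
        self.adapters[name] = (lora_a, lora_b)

    def forward(self, batch):
        """batch is a list of (adapter_name, input_vector) pairs."""
        outputs = []
        for name, x in batch:
            base_out = matvec(self.base_weight, x)       # same base weight for all requests
            lora_a, lora_b = self.adapters[name]
            delta = matvec(lora_b, matvec(lora_a, x))    # low-rank update B(Ax)
            outputs.append([b + d for b, d in zip(base_out, delta)])
        return outputs


layer = SharedBaseLoRALayer(base_weight=[[1.0, 0.0], [0.0, 1.0]])
layer.add_adapter("task_a", lora_a=[[1.0, 1.0]], lora_b=[[0.5], [0.0]])
layer.add_adapter("task_b", lora_a=[[1.0, -1.0]], lora_b=[[0.0], [2.0]])

# Two requests with the same input, each routed through a different adapter.
results = layer.forward([("task_a", [1.0, 2.0]), ("task_b", [1.0, 2.0])])
print(results)
```

Both requests go through the same `base_weight`; only the small `A` and `B` matrices differ per adapter, which is where the memory saving described in the entry would come from.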