Showing 160 open source projects for "gpu hardware"

  • 1
    Lemonade

    Lemonade helps users run local LLMs with the highest performance

    Lemonade is a local LLM runtime that aims to deliver the highest possible performance on your own hardware by auto-configuring state-of-the-art inference engines for both NPUs and GPUs. The project positions itself as a “local LLM server” you can run on laptops and workstations, abstracting away backend differences while giving you a single place to serve and manage models. Its README emphasizes real-world adoption across startups, research groups, and large companies, signaling a focus on...
    Downloads: 3 This Week
  • 2
    mosaicml composer

    Supercharge Your Model Training

    composer is a deep learning training framework built on PyTorch and designed to make large-scale model training more efficient, scalable, and customizable. At the center of the project is a highly optimized Trainer abstraction that simplifies the management of training loops, parallelization, metrics, logging, and data loading. The framework is intended for modern workloads that may span anything from a single GPU to very large distributed training environments, which makes it suitable for...
    Downloads: 0 This Week
  • 3
    OpenVINO

    OpenVINO™ Toolkit repository

    OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference. Boost deep learning performance in computer vision, automatic speech recognition, natural language processing and other common tasks. Use models trained with popular frameworks like TensorFlow, PyTorch and more. Reduce resource demands and efficiently deploy on a range of Intel® platforms from edge to cloud. This open-source version includes several components: namely Model Optimizer, OpenVINO™ Runtime,...
    Downloads: 22 This Week
  • 4
    gpt-oss

    gpt-oss-120b and gpt-oss-20b are two open-weight language models

    gpt-oss is OpenAI’s open-weight family of large language models designed for powerful reasoning, agentic workflows, and versatile developer use cases. The series includes two main models: gpt-oss-120b, a 117-billion parameter model optimized for general-purpose, high-reasoning tasks that can run on a single H100 GPU, and gpt-oss-20b, a lighter 21-billion parameter model suited to low-latency or specialized applications on smaller hardware. Both models use native MXFP4 quantization for efficient memory use and support OpenAI’s Harmony response format, enabling transparent full chain-of-thought reasoning and advanced tool integrations such as function calling, browsing, and Python code execution. ...
    Downloads: 10 This Week
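
The single-GPU claim in the gpt-oss description can be sanity-checked with rough arithmetic. The sketch below is a back-of-the-envelope estimate, not a measurement. It assumes every parameter is stored in MXFP4 (4-bit values plus one shared 8-bit scale per block of 32 values, about 4.25 bits per parameter) and an 80 GB H100. Real checkpoints may keep some tensors at higher precision, and activations and the KV cache need memory on top of the weights. The names `weight_gb` and `H100_MEMORY_GB` are made up for this example.

```python
# Rough weight-memory estimate under the stated assumptions.
MXFP4_BLOCK = 32                      # values sharing one 8-bit scale
BITS_PER_PARAM = 4 + 8 / MXFP4_BLOCK  # 4.25 bits per parameter on average
H100_MEMORY_GB = 80                   # assumption: the 80 GB H100 variant


def weight_gb(num_params: float, bits_per_param: float = BITS_PER_PARAM) -> float:
    """Approximate weight storage in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


for name, params in [("gpt-oss-120b", 117e9), ("gpt-oss-20b", 21e9)]:
    gb = weight_gb(params)
    fits = "fits" if gb < H100_MEMORY_GB else "does not fit"
    print(f"{name}: about {gb:.1f} GB of weights, {fits} in {H100_MEMORY_GB} GB")
```

Under these assumptions the larger model's weights leave some headroom on one 80 GB card, which is consistent with the description above.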
  • 5
    powerMAX

    powerMAX is a CPU and GPU burn-in test

    powerMAX is a CPU and GPU burn-in tool designed to push your hardware to its absolute thermal and power limits. It helps users uncover stability issues, cooling weaknesses, and power delivery problems by applying maximum, sustained stress to both the processor and graphics card. The utility supports dedicated CPU tests—SSE or AVX—and a demanding GPU 3D rendering test, with the option to run both simultaneously for full-system power load evaluation.
    Downloads: 25 This Week
  • 6
    Superposition Benchmark (Unigine)

    GPU benchmark testing graphics performance with realistic 3D scenes.

    ...Widely used by gamers and hardware reviewers, it is proprietary but offers a free edition.
    Downloads: 97 This Week
  • 7
    Unsloth-MLX

    Bringing the Unsloth experience to Mac users via Apple's MLX framework

    ...This project removes traditional barriers that prevent Mac users from prototyping and experimenting with LLM training locally by allowing the same code used in cloud GPU environments to run on M-series hardware, improving workflow continuity and reducing iteration costs. It supports loading and training Hugging Face models with fine-tuning strategies like SFT, DPO, ORPO, and GRPO and even handles exporting models to formats like GGUF for downstream use, although some limitations apply with quantized models. ...
    Downloads: 1 This Week
  • 8
    CUDA-QX

    Accelerated libraries for quantum-classical computing built on CUDA-Q

    CUDA-QX is a collection of accelerated libraries built on top of the CUDA-Q platform, designed to enable rapid development of hybrid quantum-classical applications. It extends the CUDA-Q programming model by providing optimized implementations of domain-specific quantum computing primitives and workflows. The libraries are intended to help researchers and developers leverage GPUs, CPUs, and quantum processing units together in a unified computational model. CUDA-QX focuses on key areas such...
    Downloads: 0 This Week
  • 9
    JAX Toolbox

    Public CI, Docker images for popular JAX libraries

    JAX Toolbox is a development toolkit designed to streamline and optimize the use of JAX for machine learning and high-performance computing on NVIDIA GPUs. It provides prebuilt Docker images, continuous integration pipelines, and optimized example implementations that help developers quickly set up and run JAX workloads without complex configuration. The project supports popular JAX-based frameworks and models, including architectures used for large-scale pretraining such as GPT and LLaMA...
    Downloads: 0 This Week
  • 10
    TensorRT LLM

    TensorRT LLM provides users with an easy-to-use Python API

    TensorRT-LLM is an open-source high-performance inference library specifically designed to optimize and accelerate large language model deployment on NVIDIA GPUs. It provides a Python-based API built on top of PyTorch that allows developers to define, customize, and deploy LLMs efficiently across a variety of hardware configurations, from single GPUs to large multi-node clusters. The library focuses on maximizing throughput and minimizing latency through advanced techniques such as...
    Downloads: 0 This Week
  • 11
    Diffrax

    Numerical differential equation solvers in JAX

    Diffrax is a numerical differential equation solving library built for the JAX ecosystem, with a strong focus on composability, differentiability, and high-performance scientific computing. The project provides tools for solving ordinary differential equations, stochastic differential equations, controlled differential equations, and related systems in a way that fits naturally into modern machine learning and differentiable programming workflows. Because it is written to work closely with...
    Downloads: 0 This Week
  • 12
    CUDA Containers for Edge AI & Robotics

    Machine Learning Containers for NVIDIA Jetson and JetPack-L4T

    CUDA Containers for Edge AI & Robotics is an open-source project that provides a modular container build system designed for running machine learning and AI workloads on NVIDIA Jetson devices. The repository contains container configurations that package the latest AI frameworks and dependencies optimized for Jetson hardware. These containers simplify the deployment of complex machine learning environments by bundling libraries such as CUDA, TensorRT, and deep learning frameworks into reproducible container images. The project is particularly useful for developers building edge AI and robotics systems that rely on GPU-accelerated inference and real-time computer vision. ...
    Downloads: 0 This Week
  • 13
    wllama

    WebAssembly binding for llama.cpp - Enabling on-browser LLM inference

    wllama is a WebAssembly-based library that enables large language model inference directly inside a web browser. Built as a binding for the llama.cpp inference engine, the project allows developers to run LLMs locally without requiring a server backend or dedicated GPU hardware. The library leverages WebAssembly SIMD capabilities to achieve efficient execution within modern browsers while maintaining compatibility across platforms. By running models locally on the user’s device, wllama enables privacy-preserving AI applications that do not require sending data to remote servers. The framework provides both high-level APIs for common tasks such as text generation and embeddings and low-level APIs that expose tokenization, sampling controls, and model state management.
    Downloads: 0 This Week
  • 14
    CUDA Agent

    Large-Scale Agentic RL for High-Performance CUDA Kernel Generation

    CUDA Agent is a research-driven agentic reinforcement learning system designed to automatically generate and optimize high-performance CUDA kernels for GPU workloads. The project addresses the long-standing challenge that efficient CUDA programming typically requires deep hardware expertise by training an autonomous coding agent capable of iterative improvement through execution feedback. Its architecture combines large-scale data synthesis, a skill-augmented CUDA development environment, and long-horizon reinforcement learning to build intrinsic optimization capability rather than relying on simple post-hoc tuning. ...
    Downloads: 0 This Week
  • 15
    EPLB

    Expert Parallelism Load Balancer

    EPLB is DeepSeek’s open implementation of a load balancing algorithm designed for expert parallelism (EP) settings in MoE architectures. In EP, different “experts” are mapped to different GPUs or nodes, so load imbalance becomes a performance bottleneck if certain experts are invoked much more often. EPLB solves this by duplicating heavily used experts (redundancy) and then placing those duplicates across GPUs to even out computational load. It uses policies like hierarchical load balancing...
    Downloads: 0 This Week
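
The idea in the EPLB description (replicate hot experts, then spread the replicas across GPUs) can be illustrated with a small greedy sketch. This is not DeepSeek's implementation and does not use the EPLB API. The function names `replicate_experts` and `place_replicas` are hypothetical, each replica is assumed to receive an equal share of its expert's load, and the sketch ignores constraints a real balancer would enforce, such as a fixed number of slots per GPU or node-level grouping.

```python
import heapq


def replicate_experts(loads, num_slots):
    """Give every expert one replica, then hand each spare slot to the
    expert whose per-replica load is currently the highest."""
    replicas = [1] * len(loads)
    heap = [(-load, e) for e, load in enumerate(loads)]  # max-heap via negation
    heapq.heapify(heap)
    for _ in range(num_slots - len(loads)):
        _, e = heapq.heappop(heap)
        replicas[e] += 1
        heapq.heappush(heap, (-loads[e] / replicas[e], e))
    return replicas


def place_replicas(loads, replicas, num_gpus):
    """Greedy placement: heaviest replica first, onto the least-loaded GPU."""
    shares = sorted(
        ((loads[e] / replicas[e], e)
         for e in range(len(loads))
         for _ in range(replicas[e])),
        reverse=True,
    )
    gpu_load = [0.0] * num_gpus
    placement = [[] for _ in range(num_gpus)]
    for share, e in shares:
        g = min(range(num_gpus), key=gpu_load.__getitem__)
        placement[g].append(e)
        gpu_load[g] += share
    return placement, gpu_load


# Hypothetical token counts per expert; expert 0 is invoked far more often.
loads = [90, 30, 20, 20]
replicas = replicate_experts(loads, num_slots=6)
placement, gpu_load = place_replicas(loads, replicas, num_gpus=2)
print(replicas, placement, gpu_load)
```

Without replication, the GPU hosting expert 0 would carry more than half of the total load however the other experts were placed. Splitting it into several replicas is what makes an even split possible.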
  • 16
    WebLLM

    Bringing large-language models and chat to web browsers

    WebLLM is a modular, customizable JavaScript package that brings language model chats directly to web browsers with hardware acceleration. Everything runs inside the browser with no server support and is accelerated with WebGPU. This opens up opportunities to build AI assistants for everyone and to preserve privacy while enjoying GPU acceleration. WebLLM offers a minimalist and modular interface to access the chatbot in the browser. The WebLLM package itself does not come with a UI and is designed in a modular way to hook into any UI component. ...
    Downloads: 0 This Week
  • 17
    CosyVoice

    Multi-lingual large voice generation model, providing inference

    CosyVoice is a multilingual large voice generation model that offers a full-stack solution for training, inference, and deployment of high-quality TTS systems. The model supports multiple languages, including Chinese, English, Japanese, Korean, and a range of Chinese dialects such as Cantonese, Sichuanese, Shanghainese, Tianjinese, and Wuhanese. It is designed for zero-shot voice cloning and cross-lingual or mix-lingual scenarios, so a single reference voice can be used to synthesize speech...
    Downloads: 2 This Week
  • 18
    MaxText

    A simple, performant and scalable Jax LLM

    MaxText is a high-performance, highly scalable open-source framework designed to train and fine-tune large language models using the JAX ecosystem. The project acts as both a reference implementation and a practical training library that demonstrates best practices for building and scaling transformer-based language models on modern accelerator hardware. It is optimized to run efficiently on Google Cloud TPUs and GPUs, enabling researchers and engineers to train models ranging from small...
    Downloads: 0 This Week
  • 19
    TensorFlow Probability

    Probabilistic reasoning and statistical analysis in TensorFlow

    TensorFlow Probability (TFP) is a Python library built on TensorFlow for probabilistic reasoning and statistical analysis. It makes it easy to combine probabilistic models and deep learning on modern hardware (TPU, GPU). It's for data scientists, statisticians, ML researchers, and practitioners who want to encode domain knowledge to understand data and make predictions. Since TFP inherits the benefits of TensorFlow, you can build, fit, and deploy a model using a single language throughout the lifecycle of model exploration and production. ...
    Downloads: 0 This Week
  • 20
    local-llm

    Run LLMs locally on Cloud Workstations

    local-llm is a development framework that enables developers to run large language models locally within Google Cloud Workstations or standard environments without requiring GPU hardware. It focuses on making generative AI development more accessible by leveraging quantized models and CPU-based execution, eliminating the dependency on expensive GPU infrastructure. The repository includes tools, Docker configurations, and command-line utilities that simplify the process of downloading, running, and interacting with language models directly on local or cloud-based workstations. ...
    Downloads: 1 This Week
  • 21
    MegEngine

    Easy-to-use deep learning framework with 3 key features

    MegEngine is a fast, scalable and easy-to-use deep learning framework with 3 key features. You can represent quantization, dynamic shapes, image pre-processing, and even derivation in one model. After training, put everything into your model and run inference on any platform with ease. Because training and inference share the same core, speed and precision discrepancies between them are avoided. In training, GPU memory usage can drop to one-third at the cost of only one additional line of code, which enables the DTR...
    Downloads: 0 This Week
  • 22
    VCClient

    Software that uses AI to perform real-time voice conversion

    ...It provides both a graphical user interface and API access, making it suitable for casual users as well as developers who want to integrate voice transformation into their own applications. The project also supports GPU acceleration, enabling faster inference and smoother real-time performance on compatible hardware. Additionally, it includes tools for training and managing voice models, giving users the ability to create personalized voice profiles.
    Downloads: 12 This Week
  • 23
    MSI Kombustor

    Advanced OpenGL and Vulkan graphics card stress testing utility

    ...The tool provides MSI users with an exclusive, streamlined interface for testing their hardware safely and effectively. By driving high temperatures and peak loads, it reveals whether a graphics card can sustain extended heavy usage. Kombustor is ideal for anyone looking to test, validate, or tune their GPU setup.
    Downloads: 79 This Week
  • 24
    Bottleneck Calculator

    Check CPU and GPU balance with real-time bottleneck analysis

    PC Bottleneck Calculator is a performance analysis tool that helps PC gamers and builders identify CPU or GPU bottlenecks in their systems. It estimates system balance by comparing hardware data against real-world benchmarks. Users can instantly see how well their CPU and GPU pair together, test different configurations, and understand which component limits their gaming performance. Built with a clean, responsive interface, the tool offers quick, data-driven results without requiring downloads or complex setup. Website: www.pcbottleneckcalculator.io
    Downloads: 0 This Week
  • 25
    Knema - Frame Continuity Engine

    Knema is a lightweight real-time performance & frame continuity engine

    ...
    🔹 Adaptive Frametime Control: continuously analyzes frametime distribution (mean, p95, jitter); prioritizes stable frame pacing over artificial FPS boosting; reduces micro-stutter and sudden frame spikes.
    🔹 GPU-Aware Decision Engine: accurately detects GPU-bound, CPU-bound, and engine-wait scenarios; differentiates real GPU bottlenecks from telemetry glitches; prevents false performance corrections.
    🔹 Intelligent FPS & Power Management: dynamically adjusts FPS caps based on real hardware limits; reduces unnecessary GPU power consumption in stable scenes; avoids aggressive throttling that causes oscillation or jitter.
    🔹 Real-Time Probing System: actively tests GPU headroom instead of relying on assumptions; safely probes performance limits without destabilizing gameplay; automatically backs off when physical limits...
    Downloads: 0 This Week
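
The frametime statistics named in the Knema description (mean, p95, jitter) can be computed over a window of frame times as in the sketch below. It is an illustration only and not Knema's code. The function name `frametime_stats` is hypothetical, p95 uses the nearest-rank method, and jitter is assumed here to mean the average absolute change between consecutive frames, since the listing does not define it.

```python
import statistics


def frametime_stats(frametimes_ms):
    """Return (mean, p95, jitter) in milliseconds for a window of frame times."""
    if not frametimes_ms:
        raise ValueError("need at least one frame time")
    ordered = sorted(frametimes_ms)
    # Nearest-rank p95: smallest sample with at least 95% of samples at or below it.
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    p95 = ordered[rank - 1]
    mean = statistics.fmean(frametimes_ms)
    # Assumed jitter definition: mean absolute change between consecutive frames.
    deltas = [abs(b - a) for a, b in zip(frametimes_ms, frametimes_ms[1:])]
    jitter = statistics.fmean(deltas) if deltas else 0.0
    return mean, p95, jitter


# Hypothetical window: 19 steady frames at about 60 FPS and one dropped frame.
window = [16.7] * 19 + [33.4]
mean, p95, jitter = frametime_stats(window)
print(f"mean={mean:.2f} ms  p95={p95:.2f} ms  jitter={jitter:.2f} ms")
```

A single long frame barely moves the mean but shows up clearly in the jitter figure, which is one reason a pacing-oriented tool would track more than average FPS.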