Showing 25 open source projects for "gpu hardware"

  • 1
    GPU Puzzles

    Solve puzzles. Learn CUDA

    GPU Puzzles is an educational project designed to teach GPU programming concepts through interactive coding exercises and puzzles. Instead of presenting traditional lecture-style explanations, the project immerses learners directly in hands-on programming tasks that demonstrate how GPU computation works. The exercises are implemented using Python with the Numba CUDA interface, which allows Python code to compile into GPU kernels that run on CUDA-enabled hardware.
    Downloads: 0 This Week
  • 2
    FlexLLMGen

    Running large language models on a single GPU

    FlexLLMGen is an open-source inference engine designed to run large language models efficiently on limited hardware resources such as a single GPU. The system focuses on high-throughput generation workloads where large batches of text must be processed quickly, such as large-scale data extraction or document analysis tasks. Instead of requiring expensive multi-GPU systems, the framework uses techniques such as memory offloading, compression, and optimized batching to run large models on commodity hardware.
    Downloads: 0 This Week
  • 3
    HeavyDB

    HeavyDB (formerly MapD/OmniSciDB)

    ...HeavyDB was originally developed as part of the OmniSci platform (formerly MapD) and is commonly used for large-scale analytics and geospatial data processing. The database compiles queries into optimized machine code that executes efficiently on GPU hardware, significantly accelerating analytical workloads. It supports hybrid deployment environments where queries can run on both CPU and GPU architectures depending on the available resources.
    Downloads: 0 This Week
  • 4
    SkyPilot

    SkyPilot: Run AI and batch jobs on any infra

    SkyPilot is a framework for running AI and batch workloads on any infrastructure (Kubernetes or 12+ clouds). It offers unified execution, cost savings, and high GPU availability through a simple interface.
    Downloads: 0 This Week
  • 5
    Humanoid-Gym

    Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real

    Humanoid-Gym is a reinforcement learning framework designed to train locomotion and control policies for humanoid robots using high-performance simulation environments. The system is built on top of NVIDIA Isaac Gym, which allows large-scale parallel simulation of robotic environments directly on GPU hardware. Its primary goal is to enable efficient training of humanoid robots in simulation while enabling policies to transfer effectively to real-world hardware without additional training. The framework emphasizes the concept of zero-shot sim-to-real transfer, meaning that behaviors learned in simulation can be deployed directly on physical robots with minimal adjustment. ...
    Downloads: 1 This Week
  • 6
    Colossal-AI

    Making large AI models cheaper, faster and more accessible

    The Transformer architecture has improved the performance of deep learning models in domains such as computer vision and natural language processing. Better performance has come with larger model sizes, which run up against the memory wall of current accelerator hardware such as GPUs. It is never ideal to train large models such as Vision Transformer, BERT, and GPT on a single GPU or a single machine, so there is an urgent demand to train models in a distributed environment. However, distributed training, especially model parallelism, often requires domain expertise in computer systems and architecture. ...
    Downloads: 0 This Week
  • 7
    shimmy

    Python-free Rust inference server

    ...It supports modern model formats such as GGUF and SafeTensors and can automatically discover models stored locally or in common directories used by other AI tools. Advanced capabilities include CPU offloading for Mixture-of-Experts models and GPU acceleration, enabling large models to run on consumer hardware with limited VRAM.
    Downloads: 3 This Week
  • 8
    Model Zoo

    Please do not feed the models

    ...Each model is organized into its own project folder with pinned package versions, ensuring reproducibility and stability. The examples serve both as educational tools for learning Flux and as practical starting points for building new models. GPU acceleration is supported for most models through CUDA integration, enabling efficient training on compatible hardware. With community contributions encouraged, the Model Zoo acts as a hub for sharing and exploring diverse machine learning applications in Julia.
    Downloads: 0 This Week
  • 9
    YOLOv5

    YOLOv5 is the world's most loved vision AI

    Introducing Ultralytics YOLOv8, the latest version of the acclaimed real-time object detection and image segmentation model. YOLOv8 is built on cutting-edge advancements in deep learning and computer vision, offering unparalleled performance in terms of speed and accuracy. Its streamlined design makes it suitable for various applications and easily adaptable to different hardware platforms, from edge devices to cloud APIs. Explore the YOLOv8 Docs, a comprehensive resource designed to help...
    Downloads: 54 This Week
  • 10
    tvm

    Open deep learning compiler stack for cpu, gpu, etc.

    Apache TVM is an open source machine learning compiler framework for CPUs, GPUs, and machine learning accelerators. It aims to enable machine learning engineers to optimize and run computations efficiently on any hardware backend. The vision of the Apache TVM Project is to host a diverse community of experts and practitioners in machine learning, compilers, and systems architecture to build an accessible, extensible, and automated open-source framework that optimizes current and emerging...
    Downloads: 0 This Week
  • 11
    mosaicml composer

    Supercharge Your Model Training

    composer is a deep learning training framework built on PyTorch and designed to make large-scale model training more efficient, scalable, and customizable. At the center of the project is a highly optimized Trainer abstraction that simplifies the management of training loops, parallelization, metrics, logging, and data loading. The framework is intended for modern workloads that may span anything from a single GPU to very large distributed training environments, which makes it suitable for...
    Downloads: 6 This Week
  • 12
    ANE Training

    Training neural networks on Apple Neural Engine via APIs

    ...It is primarily intended as a research and educational proof of concept rather than a production library, highlighting what is technically possible with undocumented hardware access.
    Downloads: 0 This Week
  • 13
    Intel Extension for PyTorch

    A Python package for extending the official PyTorch

    Intel® Extension for PyTorch* extends PyTorch* with up-to-date feature optimizations for an extra performance boost on Intel hardware. The optimizations take advantage of Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Vector Neural Network Instructions (VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX) on Intel CPUs, as well as Intel Xe Matrix Extensions (XMX) AI engines on Intel discrete GPUs. Moreover, Intel® Extension for PyTorch* provides easy GPU acceleration for Intel discrete GPUs through the PyTorch* xpu device.
    Downloads: 3 This Week
  • 14
    OpenVINO

    OpenVINO™ Toolkit repository

    OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference. It boosts deep learning performance in computer vision, automatic speech recognition, natural language processing, and other common tasks. Use models trained with popular frameworks such as TensorFlow and PyTorch, reduce resource demands, and deploy efficiently on a range of Intel® platforms from edge to cloud. This open-source version includes several components, namely Model Optimizer, OpenVINO™ Runtime,...
    Downloads: 44 This Week
  • 15
    Diffrax

    Numerical differential equation solvers in JAX

    Diffrax is a numerical differential equation solving library built for the JAX ecosystem, with a strong focus on composability, differentiability, and high-performance scientific computing. The project provides tools for solving ordinary differential equations, stochastic differential equations, controlled differential equations, and related systems in a way that fits naturally into modern machine learning and differentiable programming workflows. Because it is written to work closely with...
    Downloads: 0 This Week
  • 16
    CUDA Containers for Edge AI & Robotics

    Machine Learning Containers for NVIDIA Jetson and JetPack-L4T

    CUDA Containers for Edge AI & Robotics is an open-source project that provides a modular container build system designed for running machine learning and AI workloads on NVIDIA Jetson devices. The repository contains container configurations that package the latest AI frameworks and dependencies optimized for Jetson hardware. These containers simplify the deployment of complex machine learning environments by bundling libraries such as CUDA, TensorRT, and deep learning frameworks into reproducible container images. The project is particularly useful for developers building edge AI and robotics systems that rely on GPU-accelerated inference and real-time computer vision. ...
    Downloads: 0 This Week
  • 17
    MegEngine

    Easy-to-use deep learning framework with 3 key features

    MegEngine is a fast, scalable, and easy-to-use deep learning framework with 3 key features. You can represent quantization, dynamic shapes, image pre-processing, and even derivation in one model. After training, put everything into your model and run inference on any platform with ease. Because training and inference share the same core, speed and precision problems no longer arise. In training, GPU memory usage can drop to one-third at the cost of only one additional line of code, which enables the DTR...
    Downloads: 4 This Week
  • 18
    TensorFlow Probability

    Probabilistic reasoning and statistical analysis in TensorFlow

    TensorFlow Probability (TFP) is a Python library built on TensorFlow for probabilistic reasoning and statistical analysis. It makes it easy to combine probabilistic models and deep learning on modern hardware (TPU, GPU). It's for data scientists, statisticians, ML researchers, and practitioners who want to encode domain knowledge to understand data and make predictions. Since TFP inherits the benefits of TensorFlow, you can build, fit, and deploy a model using a single language throughout the lifecycle of model exploration and production. ...
    Downloads: 0 This Week
  • 19
    Deep Java Library (DJL)

    An engine-agnostic deep learning framework in Java

    ...Because DJL is deep learning engine agnostic, you don't have to make a choice between engines when creating your projects. You can switch engines at any point. To ensure the best performance, DJL also provides automatic CPU/GPU choice based on hardware configuration.
    Downloads: 1 This Week
  • 20
    Bullet Physics SDK

    Real-time collision detection and multi-physics simulation for VR

    This is the official C++ source code repository of the Bullet Physics SDK: real-time collision detection and multi-physics simulation for VR, games, visual effects, robotics, machine learning etc. We are developing a new differentiable simulator for robotics learning, called Tiny Differentiable Simulator, or TDS. The simulator allows for hybrid simulation with neural networks. It allows different automatic differentiation backends, for forward and reverse mode gradients. TDS can be trained...
    Downloads: 17 This Week
  • 21
    YOLOv4-large

    Scaled-YOLOv4: Scaling Cross Stage Partial Network

    YOLOv4-large is an open-source implementation of the Scaled-YOLOv4 object detection architecture, designed to improve both the accuracy and scalability of real-time computer vision models. The project provides a PyTorch implementation of the Scaled-YOLOv4 framework, which extends the original YOLOv4 architecture using Cross Stage Partial (CSP) networks and new scaling techniques. Unlike earlier object detection systems that only scale depth or width, this architecture scales multiple aspects...
    Downloads: 0 This Week
  • 22
    TensorLayer

    Deep learning and reinforcement learning library for scientists

    ...This project can also be found at OpenI and Gitee. Version 3.0.0 has been pre-released; the current version supports TensorFlow, MindSpore, and PaddlePaddle (partial) as backends, allowing users to run the code on different hardware such as NVIDIA GPUs and Huawei Ascend. In the future, it will support TensorFlow, MindSpore, PaddlePaddle, PyTorch, and other backends. TensorLayer has a high-level layer/model abstraction that is effortless to learn. You can learn how deep learning can benefit your AI tasks in minutes through its extensive examples.
    Downloads: 0 This Week
  • 23
    BytePS

    A high performance and generic framework for distributed DNN training

    ...We show our experiment on BERT-large training, which is based on the GluonNLP toolkit. The model uses mixed precision. We use Tesla V100 32GB GPUs and set the batch size to 64 per GPU. Each machine has 8 V100 GPUs (32GB memory) with NVLink enabled. Machines are interconnected by a 100 Gbps RDMA network. This is the same hardware setup you can get on AWS.
    Downloads: 0 This Week
  • 24
    Intel neon

    Intel® Nervana™ reference deep learning framework

    ...The gpu backend is selected by default, so the above command is equivalent to if a compatible GPU resource is found on the system. The Intel Math Kernel Library takes advantage of the parallelization and vectorization capabilities of Intel Xeon and Xeon Phi systems. When hyperthreading is enabled on the system, we recommend the following KMP_AFFINITY setting to make sure parallel threads are mapped 1:1 to the available physical cores.
    Downloads: 0 This Week
  • 25
    NNVM

    Open deep learning compiler stack for cpu, gpu

    The vision of the Apache NNVM Project is to host a diverse community of experts and practitioners in machine learning, compilers, and systems architecture to build an accessible, extensible, and automated open-source framework that optimizes current and emerging machine learning models for any hardware platform. It compiles deep learning models into minimum deployable modules and provides infrastructure to automatically generate and optimize models on more backends with better performance....
    Downloads: 0 This Week