Showing 229 open source projects for "gpu"

  • 1
    KServe

    Standardized Serverless ML Inference Platform on Kubernetes

    ...It aims to solve production model serving use cases by providing performant, high abstraction interfaces for common ML frameworks like Tensorflow, XGBoost, ScikitLearn, PyTorch, and ONNX. It encapsulates the complexity of autoscaling, networking, health checking, and server configuration to bring cutting edge serving features like GPU Autoscaling, Scale to Zero, and Canary Rollouts to your ML deployments. It enables a simple, pluggable, and complete story for Production ML Serving including prediction, pre-processing, post-processing and explainability. KServe is being used across various organizations.
    Downloads: 4 This Week
  • 2
    ComfyUI

    The most powerful and modular diffusion model GUI, api and backend

    The most powerful and modular diffusion model GUI and backend. This UI will let you design and execute advanced stable diffusion pipelines using a graph/nodes/flowchart-based interface. We are a team dedicated to iterating and improving ComfyUI, supporting the ComfyUI ecosystem with tools like node manager, node registry, cli, automated testing, and public documentation. Open source AI models will win in the long run against closed models and we are only at the beginning. Our core mission...
    Downloads: 286 This Week
  • 3
    webgl-plot

    A high-performance real-time 2D plotting library based on native WebGL

    ...Unlike traditional canvas or SVG-based charting libraries, webgl-plot is optimized for streaming and dynamic updates, making it ideal for oscilloscope-style data, biomedical signals, or any application where data updates hundreds of times per second. Its minimal memory footprint and GPU acceleration ensure excellent performance even with tens of thousands of data points, and its simple API allows developers to get started quickly.
    Downloads: 2 This Week
  • 4
    mpv.net

    mpv.net is a modern media player for Windows that works just like mpv

    mpv.net is a modern desktop media player for Windows based on the popular mpv player. mpv.net is designed to be mpv compatible: almost all mpv features are available because they are all contained in libmpv, which means the official mpv manual applies to mpv.net. mpv focuses on the usage of the command line and the terminal; mpv.net retains the ability to be used from the command line and the terminal and adds a modern Windows GUI on top of it. Video output that is capable of many features...
    Downloads: 84 This Week
  • 5
    TensorRT Node for ComfyUI

    Enables the best performance on NVIDIA RTX Graphics Cards

    ...The repo typically includes instructions for converting models to TensorRT engines and for wiring those engines into ComfyUI nodes. This is particularly attractive for power users who run many generations or who host ComfyUI on dedicated hardware and want to squeeze out every bit of GPU performance. In short, it’s about taking ComfyUI from “it runs” to “it runs fast” on NVIDIA GPUs.
    Downloads: 1 This Week
  • 6
    Meridian

    Meridian is an MMM framework

    ...The framework provides a robust foundation for constructing in-house MMM pipelines capable of handling both national and geo-level data, with built-in support for calibration using experimental data or prior knowledge. Meridian uses the No-U-Turn Sampler (NUTS) for Markov Chain Monte Carlo (MCMC) sampling to produce statistically rigorous results, and it includes GPU acceleration to significantly reduce computation time.
    Downloads: 1 This Week
  • 7
    Skiko

    Kotlin Multiplatform bindings to Skia

    Skiko is an open-source graphics library from JetBrains that provides lightweight, cross-platform bindings for the Skia graphics engine tailored specifically for Kotlin Multiplatform and Compose applications. It serves as the low-level rendering backbone for Kotlin UI frameworks like Compose for Desktop and Compose for Web, enabling smooth, GPU-accelerated 2D graphics across Windows, macOS, Linux, and other supported targets without writing native code. Skiko abstracts away platform-specific rendering details while exposing Skia’s powerful features such as high-quality text shaping, image filters, path operations, and hardware accelerated canvases, making it ideal for building rich UI components, animations, games, or custom drawing surfaces. ...
    Downloads: 2 This Week
  • 8
    DearPyGui

    Graphical User Interface Toolkit for Python with minimal dependencies

    Dear PyGui is an easy-to-use, dynamic, GPU-accelerated, cross-platform graphical user interface (GUI) toolkit for Python. It is “built with” Dear ImGui. Features include traditional GUI elements such as buttons, radio buttons, menus, and various methods to create a functional layout. Additionally, DPG has an incredible assortment of dynamic plots, tables, drawings, debuggers, and multiple resource viewers.
    Downloads: 2 This Week
  • 9
    Images.jl

    An image library for Julia

    ...Julia is well-suited to image processing because it is a modern and elegant high-level language that is a pleasure to use, while also allowing you to write "inner loops" that compile to efficient machine code (i.e., it is as fast as C). Julia supports multithreading and, through add-on packages, GPU processing. JuliaImages is a collection of packages specifically focused on image processing. It is not yet as complete as some toolkits for other programming languages, but it has many useful algorithms. It is focused on clean architecture and is designed to unify "machine vision" and "biomedical 3d image processing" communities.
    Downloads: 0 This Week
  • 10
    UIforETW

    User interface for recording and managing ETW traces

    ...It standardizes trace collection profiles, launches WPR/xperf with the right providers, and organizes the resulting .etl files for repeatable investigations. The tool streamlines the entire loop—record, annotate, open in WPA/XperfView—so engineers can focus on finding scheduling stalls, I/O bottlenecks, GC pauses, or GPU hitches instead of memorizing command-line incantations. It also manages symbol settings and capture templates, making it much easier to get actionable call stacks on developer machines and CI bots alike. Built-in quality-of-life options hide advanced complexity until you need it while preserving full access to the underlying ETW power. ...
    Downloads: 2 This Week
  • 11
    FlashMLA

    FlashMLA: Efficient Multi-head Latent Attention Kernels

    FlashMLA is a high-performance decoding kernel library designed especially for Multi-Head Latent Attention (MLA) workloads, targeting NVIDIA Hopper GPU architectures. It provides optimized kernels for MLA decoding, including support for variable-length sequences, helping reduce latency and increase throughput in model inference systems using that attention style. The library supports both BF16 and FP16 data types, and includes a paged KV cache implementation with a block size of 64 to efficiently manage memory during decoding. ...
    Downloads: 2 This Week
  • 12
    AWS Deep Learning Containers

    A set of Docker images for training and serving models in TensorFlow

    AWS Deep Learning Containers (DLCs) are a set of Docker images for training and serving models in TensorFlow, TensorFlow 2, PyTorch, and MXNet. Deep Learning Containers provide optimized environments with TensorFlow and MXNet, NVIDIA CUDA (for GPU instances), and Intel MKL (for CPU instances) libraries and are available in the Amazon Elastic Container Registry (Amazon ECR). The AWS DLCs are used in Amazon SageMaker as the default vehicles for your SageMaker jobs such as training, inference, and transforms. They've been tested for machine learning workloads on Amazon EC2, Amazon ECS and Amazon EKS services as well. ...
    Downloads: 3 This Week
  • 13
    DirectX-Graphics-Samples

    Samples that demonstrate how to build graphics intensive applications

    This repo contains the DirectX 12 Graphics samples that demonstrate how to build graphics-intensive applications for Windows 10. In the Samples directory, you will find samples that attempt to break off specific features and specific usage scenarios into bite-sized chunks. For example, the ExecuteIndirect sample will show you just enough about execute indirect to get started with that feature without diving too deep into multiengine whereas the nBodyGravity sample will delve into multiengine...
    Downloads: 53 This Week
  • 14
    Kintsugi

    A tool to automatically resolve Git conflicts

    ...Named after the Japanese art of repair and beauty, Kintsugi embraces imperfect captures and enhances them intelligently, preserving natural detail while reducing noise and artifacts in ways that align with human visual preferences. The toolkit includes both CPU and GPU paths, allowing it to scale from mobile devices to powerful workstations while maintaining real-time or near-real-time responsiveness for interactive editing contexts. Its algorithmic suite is designed to be modular as well, so developers can pick and combine components for tasks like RAW image enhancement, HDR tone management, or aesthetic adjustments with perceptual fidelity.
    Downloads: 0 This Week
  • 15
    DualPipe

    A bidirectional pipeline parallelism algorithm

    DualPipe is a bidirectional pipeline parallelism algorithm open-sourced by DeepSeek, introduced in their DeepSeek-V3 technical framework. The main goal of DualPipe is to maximize overlap between computation and communication phases during distributed training, thus reducing idle GPU time (i.e. “pipeline bubbles”) and improving cluster efficiency. Traditional pipeline parallelism methods (e.g. 1F1B or staggered pipelining) leave gaps because forward and backward phases can’t fully overlap with communication. DualPipe addresses that by scheduling micro-batches from both ends of the pipeline in a bidirectional fashion—i.e. some micro-batches flow forward while others flow backward—so that computation on one partition can coincide with communication for another.
    Downloads: 0 This Week
  • 16
    wgpu

    Safe and portable GPU abstraction in Rust, implementing WebGPU API

    wgpu is a safe and portable graphics library for Rust based on the WebGPU API. It is suitable for general purpose graphics and compute on the GPU. Applications using wgpu run natively on Vulkan, Metal, DirectX 11/12, and OpenGL ES; and browsers via WebAssembly on WebGPU and WebGL2. Angle is a translation layer from GLES to other backends, developed by Google. We support running our GLES3 backend over it in order to reach platforms with GLES2 or DX11 support, which aren't accessible otherwise. ...
    Downloads: 0 This Week
  • 17
    MNN

    MNN is a blazing fast, lightweight deep learning framework

    ...On the Android platform, the core .so library is about 400KB, the OpenCL .so is about 400KB, and the Vulkan .so is about 400KB. Supports hybrid computing on multiple devices. Currently supports CPU and GPU.
    Downloads: 6 This Week
  • 18
    RLax

    Library of JAX-based building blocks for reinforcement learning agents

    ...It supports both on-policy and off-policy learning, as well as value-based, policy-based, and model-based approaches. RLax is fully JIT-compilable with JAX, enabling high-performance execution across CPU, GPU, and TPU backends. The library implements tools for Bellman equations, return distributions, general value functions, and policy optimization in both continuous and discrete action spaces. It integrates seamlessly with DeepMind’s Haiku (for neural network definition) and Optax (for optimization), making it a key component in modular RL pipelines.
    Downloads: 0 This Week
  • 19
    Theseus

    A library for differentiable nonlinear optimization

    ...Because solves are differentiable, you can backpropagate through optimization to learn cost weights, feature extractors, or initialization networks end-to-end. The implementation supports batched optimization on GPU, robust losses, damping strategies, and custom factors, making it practical for real-time systems. Helper packages provide geometry primitives and utilities for composing priors, relative constraints, and measurement models. Theseus bridges the gap between classical optimization and deep learning, enabling hybrid systems that learn components.
    Downloads: 0 This Week
  • 20
    The Futhark Programming Language

    A data-parallel functional programming language

    ...It is a statically typed, data-parallel, and purely functional array language in the ML family, and comes with a heavily optimizing ahead-of-time compiler that presently generates either GPU code via CUDA and OpenCL, or multi-threaded CPU code. Futhark is not designed for graphics programming, but can instead use the compute power of the GPU to accelerate data-parallel array computations. The language supports regular nested data-parallelism, as well as a form of imperative-style in-place modification of arrays, while still preserving the purity of the language via the use of a uniqueness type system. ...
    Downloads: 0 This Week
  • 21
    Halide

    A language for fast, portable data-parallel computation

    ...It was designed to make writing high-performance image and array processing code much easier on modern machines. It works on all major operating systems and with several CPU architectures (X86, ARM, MIPS, Hexagon, PowerPC) and GPU Compute APIs (CUDA, OpenCL, OpenGL, among others). It isn't a standalone programming language, however; rather, it is embedded in C++, which means that you write C++ code that builds an in-memory representation of a Halide pipeline using Halide's C++ API. This representation can then be compiled to an object file, or JIT-compiled and run in the same process. ...
    Downloads: 0 This Week
  • 22
    Tracy Profiler

    Frame profiler

    ...Tracy supports profiling CPU (direct support is provided for C, C++, Lua and Python integration; third-party bindings to many other languages, such as Rust, Zig, C#, OCaml, and Odin, also exist), GPU (all major graphics APIs: OpenGL, Vulkan, Direct3D 11/12, OpenCL), memory allocations, locks, and context switches. It can also automatically attribute screenshots to captured frames, and much more.
    Downloads: 8 This Week
  • 23
    FairChem

    FAIR Chemistry's library of machine learning methods for chemistry

    ...Tasks span heterogeneous domains—catalysis (OC20-style), inorganic materials (OMat), molecules (OMol), MOFs (ODAC), and molecular crystals (OMC)—allowing one model family to serve many simulations. The README provides quick paths for pulling models (e.g., via Hugging Face access), then running energy/force predictions on GPU or CPU.
    Downloads: 1 This Week
  • 24
    Codon

    A high-performance, zero-overhead, extensible Python compiler

    Codon is a high-performance Python compiler that compiles Python code to native machine code without any runtime overhead. Typical speedups over Python are on the order of 100x or more, on a single thread. Codon supports native multithreading which can lead to speedups many times higher still. The Codon framework is fully modular and extensible, allowing for the seamless integration of new modules, compiler optimizations, domain-specific languages and so on. We actively develop Codon...
    Downloads: 6 This Week
  • 25
    GPUPixel

    Real-time image and video processing library similar to GPUImage

    GPUPixel is a real-time image and video processing library written in C++11, based on OpenGL/ES. It offers functionalities similar to GPUImage, including built-in beauty filters, enabling efficient processing and rendering of visual effects on images and videos.
    Downloads: 4 This Week