Showing 101 open source projects for "gpu max performance"

  • 1
    Zoo Design Studio

    The Zoo Design Studio app

    ...Users can interact with the system through a familiar point-and-click interface, but every action is translated into code in the underlying modeling language, ensuring consistency between visual and programmatic representations. The application is powered by a GPU-first geometry engine that streams rendered output as video frames, enabling high-performance modeling even when heavy computation is offloaded to remote infrastructure. It uses WebSockets for real-time communication between the client and the modeling engine, allowing immediate feedback and interactive design updates.
    Downloads: 7 This Week
  • 2
    DALI

    A GPU-accelerated library containing highly optimized building blocks

    ...Deep learning applications require complex, multi-stage data processing pipelines that include loading, decoding, cropping, resizing, and many other augmentations. These data processing pipelines, which are currently executed on the CPU, have become a bottleneck, limiting the performance and scalability of training and inference. DALI addresses the problem of the CPU bottleneck by offloading data preprocessing to the GPU. Additionally, DALI relies on its own execution engine, built to maximize the throughput of the input pipeline.
    Downloads: 1 This Week
  • 3
    HLSL++

    Math library using HLSL syntax with multiplatform SIMD support

    HLSL++ is a header-only C++ math library designed to replicate the syntax and functionality of the HLSL shading language, making it easier for developers to write CPU-side code that mirrors GPU shader logic. It provides vector, matrix, and math operations with a syntax identical or very similar to HLSL, allowing seamless transition between shader code and application code. The library is optimized for performance and supports SIMD instructions across multiple architectures, including SSE, AVX, AVX2, AVX512, and ARM NEON, ensuring high efficiency on modern hardware. ...
    Downloads: 2 This Week
  • 4
    Floem

    A native Rust UI library with fine-grained reactivity

    Floem is a cross-platform GUI framework for Rust. It aims to be extremely performant while providing world-class developer ergonomics. Supporting both GPU and CPU rendering, Floem gives you performance that is close to bare metal. It also provides primitives that help developers write performant UI code without too much effort.
    Downloads: 0 This Week
  • 5
    Arcan

    Powerful development framework for creating virtually anything

    ...At its heart lies a robust and portable multimedia engine, with a well-tested and well-documented Lua scripting interface. Development emphasizes security, debuggability, and performance, guided by a principle of least surprise in API design. The main engine has been substantially refactored to reduce input latency; better accommodate variable-refresh-rate displays; prepare for asymmetric, uncooperative multi-GPU setups and GPU handover; and support explicit synchronization and runtime transitions among low (16-bit), standard (32-bit), and high-definition (10-bit + fp16/fp32) rendering.
    Downloads: 0 This Week
  • 6
    ChartGPU

    Beautiful, open source, WebGPU-based charting library

    The ChartGPU repository is an open-source, WebGPU-based charting library written in TypeScript that enables developers to visualize large datasets with high performance and smooth interactivity even when handling millions of data points. By leveraging WebGPU — the next-generation graphics API for the web — ChartGPU offloads rendering work to the GPU, allowing for fast panning, zooming, and real-time updates with minimal latency. This makes the library particularly valuable for data-intensive dashboards, scientific visualizations, and financial charting where performance bottlenecks of traditional canvas or SVG approaches become apparent. ...
    Downloads: 0 This Week
  • 7
    FurMark

    GPU stress test and OpenGL/Vulkan graphics benchmark for Windows and Linux

    FurMark is an intensive benchmarking tool designed to evaluate the performance of graphics cards using fur rendering algorithms. This tool is particularly effective in generating high workloads that can significantly increase the temperature of the GPU, making it a useful utility for testing the stability and stress tolerance of graphics cards. By simulating demanding rendering tasks, FurMark serves as a comprehensive test for assessing the robustness and thermal performance of GPUs under extreme conditions. ...
    Downloads: 311 This Week
  • 8
    Face Alignment

    2D and 3D face alignment library built using PyTorch

    ...By default, the package uses the SFD face detector. However, users can alternatively use dlib, BlazeFace, or pre-existing ground truth bounding boxes. While not required, for optimal performance (especially for the detector) it is highly recommended to run the code on a CUDA-enabled GPU. The work is presented here as a black box; if you want to know more about the intricacies of the method, please check the original paper, either on arXiv or on my webpage.
    Downloads: 2 This Week
  • 9
    JAX Toolbox

    Public CI, Docker images for popular JAX libraries

    JAX Toolbox is a development toolkit designed to streamline and optimize the use of JAX for machine learning and high-performance computing on NVIDIA GPUs. It provides prebuilt Docker images, continuous integration pipelines, and optimized example implementations that help developers quickly set up and run JAX workloads without complex configuration. The project supports popular JAX-based frameworks and models, including architectures used for large-scale pretraining such as GPT and LLaMA...
    Downloads: 2 This Week
  • 10
    ncnn

    High-performance neural network inference framework for mobile

    ncnn is a high-performance neural network inference framework designed specifically for mobile platforms. It has no third-party dependencies and, according to the project, runs faster than all other known open source frameworks on mobile phone CPUs. ncnn allows developers to easily deploy deep learning models to mobile platforms and build intelligent apps. It is cross-platform and supports most commonly used CNN networks, including...
    Downloads: 18 This Week
  • 11
    ArrayFire

    ArrayFire, a general purpose GPU library

    ArrayFire is a general-purpose tensor library that simplifies the process of software development for the parallel architectures found in CPUs, GPUs, and other hardware acceleration devices. The library serves users in every technical computing market. Data structures in ArrayFire are smartly managed to avoid costly memory transfers and to take advantage of each performance feature provided by the underlying hardware. The community of ArrayFire developers invites you to build with us if...
    Downloads: 1 This Week
  • 12
    RLax

    Library of JAX-based building blocks for reinforcement learning agents

    ...It supports both on-policy and off-policy learning, as well as value-based, policy-based, and model-based approaches. RLax is fully JIT-compilable with JAX, enabling high-performance execution across CPU, GPU, and TPU backends. The library implements tools for Bellman equations, return distributions, general value functions, and policy optimization in both continuous and discrete action spaces. It integrates seamlessly with DeepMind’s Haiku (for neural network definition) and Optax (for optimization), making it a key component in modular RL pipelines.
    Downloads: 0 This Week
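
The Bellman-equation tooling mentioned in the RLax entry centers on quantities like the one-step temporal-difference (TD) error. The sketch below is plain Python, not RLax code; the function names `td_error` and `td_update` and the step size are illustrative assumptions, chosen only to show the arithmetic involved.

```python
def td_error(v_prev, reward, discount, v_next):
    """One-step TD error: the Bellman target minus the current value estimate."""
    target = reward + discount * v_next  # bootstrapped Bellman target
    return target - v_prev


def td_update(v_prev, reward, discount, v_next, step_size=0.1):
    """Move the current estimate a small step toward the Bellman target."""
    return v_prev + step_size * td_error(v_prev, reward, discount, v_next)


# Hypothetical transition: current estimate 1.0, reward 0.5,
# discount 0.9, next-state estimate 2.0.
delta = td_error(1.0, 0.5, 0.9, 2.0)
new_value = td_update(1.0, 0.5, 0.9, 2.0)
print(delta, new_value)
```

A JAX-based library would typically express the same computation on arrays so that it can be JIT-compiled and batched; the scalar version here only shows the update rule.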
  • 13
    Halide

    A language for fast, portable data-parallel computation

    Halide is a programming language for fast, portable data-parallel computation. It was designed to make writing high-performance image and array processing code much easier on modern machines. It works on all major operating systems and with several CPU architectures (X86, ARM, MIPS, Hexagon, PowerPC) and GPU compute APIs (CUDA, OpenCL, OpenGL, among others). It is not a standalone programming language, however; it is embedded in C++, which means that you write C++ code that builds an in-memory representation of a Halide pipeline using Halide's C++ API. ...
    Downloads: 0 This Week
  • 14
    GPUPixel

    Real-time image and video processing library similar to GPUImage

    GPUPixel is a real-time image and video processing library written in C++11, based on OpenGL/ES. It offers functionalities similar to GPUImage, including built-in beauty filters, enabling efficient processing and rendering of visual effects on images and videos.
    Downloads: 1 This Week
  • 15
    ngx-toastr

    Angular Toastr

    Toast Component Injection without being passed ViewContainerRef. No use of ngFor. Fewer dirty checks and higher performance. AoT compilation and lazy loading compatible. Component inheritance for custom toasts. SystemJS/UMD rollup bundle. Animations using Angular's Web Animations API. Output toasts to an optional target directive. Put toasts in a specific div inside your application. This should probably be somewhere that doesn't get deleted. Add ToastContainerModule to the ngModule where...
    Downloads: 0 This Week
  • 16
    GitHub Actions for DigitalOcean

    GitHub Actions for DigitalOcean - doctl

    ...Powerful and production-ready, our cloud platform has the solutions that devs like you need to succeed, whether you're building world-changing AI apps, running a side project, or building a business. GPU solutions for everyone—novice to expert. Run training and inference, process large data sets and complex neural networks, and deploy high-performance computing clusters.
    Downloads: 0 This Week
  • 17
    PyOpenCL

    OpenCL integration for Python, plus shiny features

    PyOpenCL is a Python wrapper for the OpenCL framework, providing seamless access to parallel computing on CPUs, GPUs, and other accelerators. It enables developers to harness the full power of heterogeneous computing directly from Python, combining Python’s ease of use with the performance benefits of OpenCL. PyOpenCL also includes convenient features for managing memory, compiling kernels, and interfacing with NumPy, making it a preferred choice in scientific computing, data analysis, and...
    Downloads: 1 This Week
  • 18
    Bend

    A massively parallel, high-level programming language

    Bend is a massively parallel, high-level programming language. It offers the expressiveness of languages such as Python and Haskell, including higher-order functions with closures and unrestricted recursion, while targeting massively parallel hardware such as GPUs without requiring explicit thread or lock management. Bend runs on the HVM2 runtime.
    Downloads: 0 This Week
  • 19
    oneDNN

    oneAPI Deep Neural Network Library (oneDNN)

    This software was previously known as Intel(R) Math Kernel Library for Deep Neural Networks (Intel(R) MKL-DNN) and Deep Neural Network Library (DNNL). oneAPI Deep Neural Network Library (oneDNN) is an open-source cross-platform performance library of basic building blocks for deep learning applications. oneDNN is part of oneAPI. The library is optimized for Intel(R) Architecture Processors, Intel Processor Graphics and Xe Architecture graphics. oneDNN has experimental support for the following architectures: Arm* 64-bit Architecture (AArch64), NVIDIA* GPU, OpenPOWER* Power ISA (PPC64), IBMz* (s390x), and RISC-V. oneDNN is intended for deep learning applications and framework developers interested in improving application performance on Intel CPUs and GPUs. ...
    Downloads: 0 This Week
  • 20
    VK-GL-CTS

    Khronos Vulkan, OpenGL, and OpenGL ES Conformance Tests

    ...These tests are essential for vendors seeking certification, as they rigorously check the correctness and completeness of driver implementations against standardized behavior. The suite contains thousands of automated tests that assess rendering accuracy, API behavior, memory usage, and performance consistency. It is widely used by GPU vendors and developers to ensure compatibility, stability, and reliability across platforms and hardware.
    Downloads: 1 This Week
  • 21
    bitnet.cpp

    Official inference framework for 1-bit LLMs

    bitnet.cpp is the official open-source inference framework and ecosystem designed to enable ultra-efficient execution of 1-bit large language models (LLMs), which quantize most model parameters to ternary values (-1, 0, +1) while maintaining competitive performance with full-precision counterparts. At its core is bitnet.cpp, a highly optimized C++ backend that supports fast, low-memory inference on both CPUs and GPUs, enabling models such as BitNet b1.58 to run without requiring enormous...
    Downloads: 2 This Week
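
To make the ternary quantization mentioned in the bitnet.cpp entry concrete, here is a minimal pure-Python sketch that maps weights to (-1, 0, +1) using a single mean-absolute-value scale. This is an illustration under stated assumptions, not bitnet.cpp's kernels or API; the function names `ternary_quantize` and `dequantize` and the sample weights are hypothetical.

```python
def ternary_quantize(weights, eps=1e-8):
    """Quantize floats to {-1, 0, +1} using one shared scale (mean absolute value)."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Round each scaled weight to the nearest integer, then clip to [-1, 1].
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Approximate reconstruction of the original weights."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [0.9, -0.05, -1.4, 0.3, 0.0, 2.1]
q, s = ternary_quantize(weights)
print(q)  # [1, 0, -1, 0, 0, 1]
print(dequantize(q, s))
```

Each quantized weight needs fewer than two bits of storage, and multiplying by -1, 0, or +1 reduces to negation, skipping, or addition, which is where the memory and speed savings come from.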
  • 22
    Isaac ROS Visual SLAM

    Visual SLAM/odometry package based on NVIDIA-accelerated cuVSLAM

    Discover a faster, easier way to build advanced AI robotics applications with the NVIDIA Isaac™ ROS collection of accelerated computing packages and AI models, bringing NVIDIA acceleration to ROS developers everywhere. Isaac ROS Visual SLAM provides a high-performance, best-in-class ROS 2 package for VSLAM (visual simultaneous localization and mapping). This package uses one or more stereo cameras and optionally an IMU to estimate odometry as an input to navigation. It is GPU-accelerated to provide real-time, low-latency results in a robotics application. VSLAM provides an additional odometry source for mobile robots (ground-based) and can be the primary odometry source for drones. ...
    Downloads: 2 This Week
  • 23
    Recursive Language Models

    General plug-and-play inference library for Recursive Language Models

    RLM (short for Recursive Language Models) is a plug-and-play inference library for recursive language models, an inference approach in which a language model decomposes its input and recursively calls itself or other models over parts of the context instead of processing the whole prompt in a single pass...
    Downloads: 0 This Week
  • 24
    BentoML

    Unified Model Serving Framework

    ...Parallelize compute-intensive model inference workloads so that they scale separately from the serving logic. Adaptive batching dynamically groups inference requests for optimal performance. Orchestrate a distributed inference graph with multiple models via Yatai on Kubernetes. Easily configure CUDA dependencies for running inference on GPUs. Automatically generate Docker images for production deployment.
    Downloads: 0 This Week
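
The request-grouping idea behind the batching described in the BentoML entry can be sketched in a few lines. The version below is a simplified fixed-limit micro-batcher in plain Python, not BentoML's dispatcher: it does not adapt its limits at runtime, and the function name `group_into_batches` and its parameters `max_batch_size` and `max_wait` are assumptions made for illustration.

```python
def group_into_batches(requests, max_batch_size=4, max_wait=0.010):
    """Group (arrival_time_seconds, payload) pairs, sorted by arrival, into batches.

    A batch is closed when it is full or when the next request arrives more
    than `max_wait` seconds after the first request in the batch.
    """
    batches = []
    current = []
    window_start = None
    for arrival, payload in requests:
        batch_full = len(current) >= max_batch_size
        waited_too_long = bool(current) and (arrival - window_start > max_wait)
        if current and (batch_full or waited_too_long):
            batches.append(current)
            current = []
        if not current:
            window_start = arrival  # the first request opens a new window
        current.append(payload)
    if current:
        batches.append(current)
    return batches


# Hypothetical request trace: three requests close together, then two later.
requests = [(0.000, "a"), (0.002, "b"), (0.004, "c"), (0.030, "d"), (0.031, "e")]
batches = group_into_batches(requests, max_batch_size=2, max_wait=0.010)
print(batches)  # [['a', 'b'], ['c'], ['d', 'e']]
```

An adaptive scheme would additionally tune the size and wait limits from observed latency; this sketch keeps them fixed so the grouping logic stays visible.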
  • 25
    Numba

    NumPy aware dynamic Python compiler using LLVM

    Numba is an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code. Numba translates Python functions to optimized machine code at runtime using the industry-standard LLVM compiler library. Numba-compiled numerical algorithms in Python can approach the speeds of C or FORTRAN. You don't need to replace the Python interpreter, run a separate compilation step, or even have a C/C++ compiler installed. Just apply one of the Numba decorators to your...
    Downloads: 0 This Week
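
The decorator workflow described in the Numba entry looks roughly like the sketch below. It uses Numba's `njit` decorator when the package is available and falls back to an uncompiled no-op decorator otherwise; the fallback and the function name `sum_of_squares` are assumptions added so the example runs either way.

```python
try:
    from numba import njit  # compiles the decorated function on first call
except ImportError:
    def njit(func):
        # Fallback (assumption): run as ordinary Python when Numba is not installed.
        return func


@njit
def sum_of_squares(n):
    """Sum i*i for i in [0, n): the kind of tight numeric loop a JIT helps with."""
    total = 0.0
    for i in range(n):
        total += i * i
    return total


result = sum_of_squares(1000)
print(result)  # 332833500.0
```

With Numba installed, the first call includes compilation time, so any timing comparison should measure a second call.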