gpu max performance free download

Showing 114 open source projects for "gpu max performance"

1

nviwatch

A blazingly fast Rust-based TUI for managing and monitoring NVIDIA GPUs

NviWatch is an interactive terminal user interface (TUI) application for monitoring NVIDIA GPU devices and processes. Built with Rust, it provides real-time insights into GPU performance metrics, including temperature, utilization, memory usage, and power consumption.

Downloads: 0 This Week

Last Update: 2025-08-21
See Project
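
The four metrics named in the NviWatch description (temperature, utilization, memory usage, power consumption) can also be read from a script. The sketch below is not how NviWatch itself works; it assumes the `nvidia-smi` command-line tool is installed and uses its `--query-gpu` CSV output. The function names and dictionary keys are hypothetical, chosen for illustration.

```python
import subprocess

# Fields requested from nvidia-smi, in the order they appear in each CSV row.
QUERY_FIELDS = ["temperature.gpu", "utilization.gpu", "memory.used", "power.draw"]
# Hypothetical key names for the parsed result, one per field above.
KEYS = ["temperature_c", "utilization_pct", "memory_used_mib", "power_draw_w"]


def _to_float(text):
    """Return the value as a float, or None when the driver reports e.g. '[N/A]'."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_gpu_line(line):
    """Parse one 'csv,noheader,nounits' row into a dict of metrics."""
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(QUERY_FIELDS):
        raise ValueError(f"expected {len(QUERY_FIELDS)} fields, got {len(values)}")
    return {key: _to_float(value) for key, value in zip(KEYS, values)}


def read_gpu_metrics():
    """Run nvidia-smi once and return one dict per GPU (requires an NVIDIA driver)."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return [parse_gpu_line(row) for row in result.stdout.splitlines() if row.strip()]
```

Calling `read_gpu_metrics()` in a loop with a short sleep gives a crude poller; a TUI such as NviWatch adds per-process detail and interactive controls on top of readings like these.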
2

CubeCL

Multi-platform high-performance compute language extension for Rust

CubeCL is a low-level compute language and compiler framework designed to simplify and optimize GPU programming for high-performance workloads, particularly in machine learning and numerical computing. It provides an abstraction layer that allows developers to write portable, hardware-efficient compute kernels without directly dealing with complex GPU APIs such as CUDA or OpenCL. CubeCL focuses on delivering predictable performance and composability by exposing explicit control over memory layouts, parallelism, and execution patterns while still maintaining a developer-friendly syntax. ...

Downloads: 0 This Week

Last Update: 2026-03-18
See Project
3

CUDA Core Compute Libraries (CCCL)

CUDA Core Compute Libraries

...By unifying these components, CCCL reduces duplication and improves developer productivity while maintaining performance across different GPU architectures.

Downloads: 1 This Week

Last Update: 2 days ago
See Project
4

Numba CUDA Target

The CUDA target for Numba

Numba CUDA Target is NVIDIA’s maintained CUDA backend for the Numba JIT compiler, enabling developers to write GPU-accelerated code directly in Python. It allows users to define CUDA kernels using Python syntax, which are then compiled into efficient GPU code at runtime using LLVM-based toolchains. This approach significantly lowers the barrier to entry for GPU programming by eliminating the need to write CUDA C++ while still delivering high performance.

Downloads: 2 This Week

Last Update: 7 days ago
See Project
5

Starling Framework

2D GPU-accelerated framework for ActionScript developers

Starling is an open-source 2D framework for ActionScript developers that leverages GPU acceleration via Adobe's Stage3D API to create smooth, high-performance games and applications across desktop and mobile platforms. It mimics the traditional Flash display list while dramatically improving performance, making it a popular choice for Flash developers transitioning into more efficient, hardware-accelerated environments.

Downloads: 0 This Week

Last Update: 2026-01-02
See Project
6

CUDA Python

Performance meets Productivity

CUDA Python is a unified Python interface for accessing and working with the NVIDIA CUDA platform, enabling developers to build GPU-accelerated applications entirely in Python. It acts as a metapackage composed of multiple submodules that provide both high-level and low-level access to CUDA functionality, including runtime APIs, driver APIs, and JIT compilation tools. The project is designed to simplify GPU programming by offering Pythonic abstractions while still exposing the full power of...

Downloads: 2 This Week

Last Update: 2026-04-27
See Project
7

NVIDIA Warp

A Python framework for accelerated simulation, data generation

NVIDIA Warp is a high-performance Python framework developed by NVIDIA for building and accelerating simulation, graphics, and physics-based workloads using GPU computing. It enables developers to write kernel-level code in Python that is automatically compiled into efficient CUDA kernels, combining ease of use with near-native performance. The framework is designed for applications such as robotics, reinforcement learning, physical simulation, and differentiable computing, where performance and flexibility are critical. ...

Downloads: 1 This Week

Last Update: 3 days ago
See Project
8

Butterchurn

Butterchurn is a WebGL implementation of the Milkdrop Visualizer

...The project emphasizes both artistic expression and technical performance, offering a balance between visual complexity and efficiency.

Downloads: 5 This Week

Last Update: 2026-04-20
See Project
9

Triton

Development repository for the Triton language and compiler

...The project leverages LLVM and MLIR to compile code into efficient GPU instructions, supporting both NVIDIA and AMD hardware. It is widely used in research and production environments where custom tensor operations are required, offering both high performance and developer-friendly syntax.

Downloads: 0 This Week

Last Update: 2026-03-20
See Project
10

Meridian

Meridian is an MMM framework

...Meridian uses the No-U-Turn Sampler (NUTS) for Markov Chain Monte Carlo (MCMC) sampling to produce statistically rigorous results, and it includes GPU acceleration to significantly reduce computation time.

Downloads: 9 This Week

Last Update: 4 hours ago
See Project
11

webgl-plot

A high-performance real-time 2D plotting library based on native WebGL

...Its minimal memory footprint and GPU acceleration ensure excellent performance even with tens of thousands of data points, and its simple API allows developers to get started quickly.

Downloads: 1 This Week

Last Update: 2025-03-26
See Project
12

Codon

A high-performance, zero-overhead, extensible Python compiler

Codon is a high-performance Python compiler that compiles Python code to native machine code without any runtime overhead. Typical speedups over Python are on the order of 100x or more, on a single thread. Codon supports native multithreading which can lead to speedups many times higher still. The Codon framework is fully modular and extensible, allowing for the seamless integration of new modules, compiler optimizations, domain-specific languages and so on. We actively develop Codon...

Downloads: 12 This Week

Last Update: 2026-03-04
See Project
13

lru-cache

A fast cache that automatically deletes the least recently used items

...It offers flexible configuration options such as max size limits, time based expiration, and custom disposal logic. Developers can use it to cache expensive computations, API responses, or frequently accessed data. The implementation focuses on correctness, speed, and compatibility with modern Node.js environments. Overall, node-lru-cache provides a reliable building block for performance optimization in JavaScript backends.

Downloads: 2 This Week

Last Update: 2 days ago
See Project
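
The eviction behavior described for lru-cache (a maximum size plus custom disposal logic) can be illustrated with a minimal sketch. This is not the lru-cache API, which is a JavaScript library; it is a small Python stand-in built on `collections.OrderedDict`, and the names `LRUCache`, `max_size`, and `on_dispose` are hypothetical. Time-based expiration is left out to keep the example short.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny least-recently-used cache with a fixed maximum number of entries."""

    def __init__(self, max_size, on_dispose=None):
        if max_size <= 0:
            raise ValueError("max_size must be positive")
        self.max_size = max_size
        # Hypothetical hook, called as on_dispose(key, value) when an entry is evicted.
        self.on_dispose = on_dispose
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)  # refresh recency on overwrite
        self._items[key] = value
        if len(self._items) > self.max_size:
            # The first entry in the OrderedDict is the least recently used one.
            old_key, old_value = self._items.popitem(last=False)
            if self.on_dispose is not None:
                self.on_dispose(old_key, old_value)

    def __len__(self):
        return len(self._items)


if __name__ == "__main__":
    cache = LRUCache(2, on_dispose=lambda k, v: print("evicted", k))
    cache.set("a", 1)
    cache.set("b", 2)
    cache.get("a")     # "a" becomes the most recently used entry
    cache.set("c", 3)  # evicts "b", the least recently used entry
```

Reading an entry counts as a use, so in the example "b" is evicted rather than "a" even though "a" was inserted first.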
14

libplacebo

Official mirror of libplacebo

libplacebo is a flexible, high-performance graphics library built on top of Vulkan, designed to provide reusable GPU-accelerated components for media applications. It originated as a core part of the rendering pipeline for the mpv media player and has since grown into a standalone library used for tone mapping, dithering, color space conversion, and more. libplacebo is ideal for developers looking to integrate sophisticated video rendering and post-processing into their own applications with full control over shaders and rendering stages.

Downloads: 1 This Week

Last Update: 2026-03-13
See Project
15

TensorRT Node for ComfyUI

Enables the best performance on NVIDIA RTX Graphics Cards

...The repo typically includes instructions for converting models to TensorRT engines and for wiring those engines into ComfyUI nodes. This is particularly attractive for power users who run many generations or who host ComfyUI on dedicated hardware and want to squeeze out every bit of GPU performance. In short, it’s about taking ComfyUI from “it runs” to “it runs fast” on NVIDIA GPUs.

Downloads: 3 This Week

Last Update: 2025-10-30
See Project
16

Zed

High-performance, multiplayer code editor from the creators of Atom

Zed is a next-generation code editor designed for high-performance collaboration with humans and AI. It is written from scratch in Rust to efficiently leverage multiple CPU cores and your GPU. Integrate upcoming LLMs into your workflow to generate, transform, and analyze code. Chat with teammates, write notes together, and share your screen and project. Multibuffers compose excerpts from across the codebase in one editable surface.

Downloads: 27 This Week

Last Update: 9 hours ago
See Project
17

XFrames

GPU-accelerated GUI development for Node.js and the browser

xframes is a high-performance library that empowers developers to build native desktop applications using familiar web technologies, specifically Node.js and React, without the overhead of the DOM. xframes serves as a streamlined alternative to Electron, designed for developers looking to maximize performance and efficiency.

Downloads: 0 This Week

Last Update: 2024-12-07
See Project
18

Shumai

Fast Differentiable Tensor Library in JavaScript & TypeScript with Bun

Shumai is an experimental differentiable tensor library for TypeScript and JavaScript, developed by Facebook Research. It provides a high-performance framework for numerical computing and machine learning within modern JavaScript runtimes. Built on Bun and Flashlight, with ArrayFire as its numerical backend, Shumai brings GPU-accelerated tensor operations, automatic differentiation, and scientific computing tools directly to JavaScript developers. It allows seamless integration of machine learning, deep learning, and custom differentiable programs into web-based or server-side environments without relying on Python frameworks. ...

Downloads: 0 This Week

Last Update: 4 days ago
See Project
19

TensorRT

C++ library for high performance inference on NVIDIA GPUs

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. TensorRT-based applications perform up to 40X faster than CPU-only platforms during inference. With TensorRT, you can optimize neural network models trained in all major frameworks, calibrate for lower precision with high accuracy, and deploy to hyperscale data centers,...

Downloads: 13 This Week

Last Update: 2026-03-25
See Project
20

UIforETW

User interface for recording and managing ETW traces

UIforETW is a Windows performance tracing companion that wraps the Event Tracing for Windows (ETW) toolchain in an approachable GUI. It standardizes trace collection profiles, launches WPR/xperf with the right providers, and organizes the resulting .etl files for repeatable investigations. The tool streamlines the entire loop—record, annotate, open in WPA/XperfView—so engineers can focus on finding scheduling stalls, I/O bottlenecks, GC pauses, or GPU hitches instead of memorizing command-line incantations. ...

Downloads: 0 This Week

Last Update: 2025-10-10
See Project
21

FlashMLA

FlashMLA: Efficient Multi-head Latent Attention Kernels

FlashMLA is a high-performance decoding kernel library designed especially for Multi-Head Latent Attention (MLA) workloads, targeting NVIDIA Hopper GPU architectures. It provides optimized kernels for MLA decoding, including support for variable-length sequences, helping reduce latency and increase throughput in model inference systems using that attention style.

Downloads: 0 This Week

Last Update: 2026-04-29
See Project
22

three-d

Makes it simple to draw stuff across platforms (including web)

three-d is a lightweight and modern 3D rendering library written in Rust that targets both native and WebAssembly environments, providing a simple yet powerful abstraction over GPU-based graphics APIs. It is designed to make 3D graphics programming accessible while still offering fine-grained control over rendering pipelines, materials, lighting, and camera systems. The library leverages modern graphics standards such as OpenGL and WebGL to deliver high-performance rendering across platforms, including browsers and desktop applications. ...

Downloads: 0 This Week

Last Update: 2026-04-17
See Project
23

Zoo Design Studio

The Zoo Design Studio app

...Users can interact with the system through a familiar point-and-click interface, but every action is translated into code in the underlying modeling language, ensuring consistency between visual and programmatic representations. The application is powered by a GPU-first geometry engine that streams rendered output as video frames, enabling high-performance modeling even when heavy computation is offloaded to remote infrastructure. It uses WebSockets for real-time communication between the client and the modeling engine, allowing immediate feedback and interactive design updates.

Downloads: 8 This Week

Last Update: 10 hours ago
See Project
24

DALI

A GPU-accelerated library containing highly optimized building blocks

...Deep learning applications require complex, multi-stage data processing pipelines that include loading, decoding, cropping, resizing, and many other augmentations. These data processing pipelines, which are currently executed on the CPU, have become a bottleneck, limiting the performance and scalability of training and inference. DALI addresses the problem of the CPU bottleneck by offloading data preprocessing to the GPU. Additionally, DALI relies on its own execution engine, built to maximize the throughput of the input pipeline.

Downloads: 1 This Week

Last Update: 2026-04-16
See Project
25

Nuclio

High-Performance Serverless event and data processing platform

Nuclio is an open source and managed serverless platform used to minimize development and maintenance overhead and automate the deployment of data-science-based applications. It offers real-time performance, running up to 400,000 function invocations per second. It is portable across low-power devices, laptops, edge, on-prem and multi-cloud deployments. It is the first serverless platform supporting GPUs for optimized utilization and sharing. It provides automated deployment to production in a few clicks from a Jupyter notebook. Deploy one of...

Downloads: 2 This Week

Last Update: 2026-04-16
See Project