Search Results for "gpu max performance" - Page 2

Showing 457 open source projects for "gpu max performance"

1

Xenia Canary

Xbox 360 Emulator Research Project

Xenia Canary is an experimental fork of the Xenia Xbox 360 emulator that moves faster than the mainline project to trial bleeding-edge improvements. It focuses on game compatibility and performance by iterating quickly on GPU and CPU emulation paths, shader translation, and timing correctness. Canary builds are where risky optimizations, new backends, and rewrites land first so they can be tested by a wider community before stabilizing. The project emphasizes pragmatism: make more titles boot and run with fewer glitches, even if it means carrying experiments that later get refined or rolled back. ...

Downloads: 104 This Week

Last Update: 2 days ago
2

Beta9

Run serverless GPU workloads with fast cold starts on bare-metal

beta9 is a platform for running serverless GPU workloads with fast cold starts on bare-metal servers worldwide. Developers can deploy and scale GPU-accelerated applications without managing the underlying infrastructure, which makes it a flexible and efficient option for AI and high-performance computing tasks. beta9 supports various frameworks and provides tools for monitoring and managing deployments.

Downloads: 1 This Week

Last Update: 2026-03-11
3

FlashAttention

Fast and memory-efficient exact attention

FlashAttention is a high-performance deep learning optimization library that reimplements the attention mechanism used in transformer models to be significantly faster and more memory-efficient than standard implementations. It achieves this with IO-aware algorithms that tile the computation to minimize reads and writes between GPU high-bandwidth memory and fast on-chip memory, and it never materializes the full attention matrix, so memory use grows linearly rather than quadratically with sequence length.
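To make the IO-aware idea concrete, here is a minimal pure-Python sketch of the online-softmax technique that this kind of tiling relies on, for a single query vector: keys and values are consumed one block at a time, and the partial results are rescaled whenever a larger score appears, so the full score vector is never held at once. It is an illustration under simplifying assumptions (one query, Python lists instead of GPU tiles); the function names are hypothetical and this is not the library's API or kernel.

```python
import math
from typing import List

Vector = List[float]


def _scores(q: Vector, keys: List[Vector]) -> List[float]:
    """Scaled dot-product scores q.k / sqrt(d) for a list of key vectors."""
    scale = 1.0 / math.sqrt(len(q))
    return [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]


def naive_attention(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Reference softmax attention that materializes every score at once."""
    s = _scores(q, keys)
    m = max(s)
    w = [math.exp(x - m) for x in s]
    total = sum(w)
    return [sum(wi * v[j] for wi, v in zip(w, values)) / total
            for j in range(len(values[0]))]


def tiled_attention(q: Vector, keys: List[Vector], values: List[Vector],
                    block_size: int = 2) -> Vector:
    """Online-softmax attention: visits keys/values one block at a time and keeps
    only a running max (m), running normalizer (l) and running weighted sum (acc)."""
    m = -math.inf
    l = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), block_size):
        s_blk = _scores(q, keys[start:start + block_size])
        v_blk = values[start:start + block_size]
        m_new = max(m, max(s_blk))
        # Rescale everything accumulated under the old max to the new max
        # (exp(-inf) == 0.0 on the first block, so the empty state stays empty).
        correction = math.exp(m - m_new)
        l *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(s_blk, v_blk):
            w = math.exp(s - m_new)
            l += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        m = m_new
    return [a / l for a in acc]
```

Because the rescaling is exact, `tiled_attention` matches `naive_attention` up to floating-point rounding for any `block_size`; on a GPU the blocks would instead be sized to fit in on-chip memory.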

Downloads: 70 This Week

Last Update: 2026-03-18
4

PowerInfer

High-speed Large Language Model Serving for Local Deployment

PowerInfer is a high-performance inference engine designed to run large language models efficiently on personal computers equipped with consumer-grade GPUs. The project improves local AI inference by optimizing how neural network computations are distributed between CPU and GPU resources. Its architecture exploits the observation that only a subset of neurons in a large model is activated frequently: these frequently activated ("hot") neurons are preloaded into GPU memory, while the less frequently activated ("cold") neurons are computed on the CPU. ...
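The hot/cold split can be sketched as follows, assuming per-neuron activation counts have already been collected by profiling and that GPU memory can hold a fixed number of neurons (`gpu_capacity`). The helper names are hypothetical; PowerInfer's actual predictors, kernels, and memory layout are not modeled here.

```python
from typing import List, Sequence, Set, Tuple


def split_hot_cold(activation_counts: Sequence[int],
                   gpu_capacity: int) -> Tuple[Set[int], Set[int]]:
    """Partition neuron indices: the `gpu_capacity` most frequently activated neurons
    are 'hot' (kept in GPU memory); all others are 'cold' (computed on the CPU)."""
    order = sorted(range(len(activation_counts)),
                   key=lambda i: activation_counts[i], reverse=True)
    return set(order[:gpu_capacity]), set(order[gpu_capacity:])


def dispatch(active: Sequence[int], hot: Set[int]) -> Tuple[List[int], List[int]]:
    """Route the neurons active for the current token to GPU or CPU work lists."""
    gpu_work = [n for n in active if n in hot]
    cpu_work = [n for n in active if n not in hot]
    return gpu_work, cpu_work


def gpu_coverage(activation_counts: Sequence[int], hot: Set[int]) -> float:
    """Fraction of all profiled activations that would be served from GPU memory."""
    return sum(activation_counts[i] for i in hot) / sum(activation_counts)
```

With skewed counts such as `[50, 2, 30, 1, 45, 3]`, keeping the top three neurons on the GPU already covers about 95% of the profiled activations, which is the kind of skew the design depends on.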

Downloads: 0 This Week

Last Update: 2026-03-04
5

NVIDIA Warp

A Python framework for accelerated simulation, data generation

NVIDIA Warp is a high-performance Python framework developed by NVIDIA for building and accelerating simulation, graphics, and physics-based workloads using GPU computing. It enables developers to write kernel-level code in Python that is automatically compiled into efficient CUDA kernels, combining ease of use with near-native performance. The framework is designed for applications such as robotics, reinforcement learning, physical simulation, and differentiable computing, where performance and flexibility are critical. ...

Downloads: 2 This Week

Last Update: 4 days ago
6

Numba CUDA Target

The CUDA target for Numba

Numba CUDA Target is the NVIDIA-maintained CUDA backend for the Numba JIT compiler, enabling developers to write GPU-accelerated code directly in Python. Users define CUDA kernels in Python syntax, and these are compiled into efficient GPU code at runtime by an LLVM-based toolchain. This approach significantly lowers the barrier to entry for GPU programming by removing the need to write CUDA C++ while still delivering high performance.

Downloads: 2 This Week

Last Update: 2026-04-30
7

Starling Framework

2D GPU-accelerated framework for ActionScript developers

Starling is an open-source 2D framework for ActionScript developers that leverages GPU acceleration via Adobe's Stage3D API to create smooth, high-performance games and applications across desktop and mobile platforms. It mimics the traditional Flash display list while dramatically improving performance, making it a popular choice for Flash developers moving to more efficient, hardware-accelerated environments.

Downloads: 0 This Week

Last Update: 2026-01-02
8

CUDA Python

Performance meets Productivity

CUDA Python is a unified Python interface for accessing and working with the NVIDIA CUDA platform, enabling developers to build GPU-accelerated applications entirely in Python. It acts as a metapackage composed of multiple submodules that provide both high-level and low-level access to CUDA functionality, including runtime APIs, driver APIs, and JIT compilation tools. The project is designed to simplify GPU programming by offering Pythonic abstractions while still exposing the full power of...

Downloads: 2 This Week

Last Update: 2026-04-27
9

ChefKiss Inferno

Emulating Apple Silicon devices

Inferno by ChefKissInc is a low-level systems project focused on enabling hardware acceleration and advanced graphics compatibility on Apple Silicon devices, particularly within unsupported or experimental environments. It is designed to bridge gaps between macOS hardware capabilities and software ecosystems that traditionally rely on different GPU architectures, such as those found in Linux or Windows environments. The project typically operates at the intersection of kernel extensions, GPU drivers, and virtualization layers, aiming to unlock performance features that are otherwise restricted or unavailable. Inferno is especially relevant for developers working on emulation, virtualization, or cross-platform graphics stacks, as it attempts to expose native GPU functionality in unconventional contexts. ...

Downloads: 4 This Week

Last Update: 2026-04-29
10

Flash-MoE

Running a big model on a small laptop

...It likely includes support for GPU acceleration and parallel processing, enabling it to handle large-scale workloads effectively. The architecture emphasizes speed and efficiency, making it suitable for both research and production environments where performance is critical. It may also provide tools for benchmarking and tuning model behavior. Overall, flash-moe represents a technical advancement in making MoE models more practical and deployable.

Downloads: 0 This Week

Last Update: 2026-04-02
11

OptiScaler

OptiScaler bridges upscaling/frame gen across GPUs

...The tool effectively acts as a compatibility layer between the game engine and multiple upscaling frameworks, enabling cross-GPU access to features that might otherwise be restricted to specific hardware ecosystems. In addition to replacing upscalers, OptiScaler can enable frame generation features in titles that do not officially support them, improving frame rates and perceived smoothness during gameplay.

Downloads: 219 This Week

Last Update: 2026-04-27
12

uzu

A high-performance inference engine for AI models

...The engine implements a hybrid architecture in which model layers can be executed either as custom GPU kernels or through Apple’s MPSGraph API, allowing it to balance performance and compatibility depending on the workload. By utilizing Apple’s unified memory architecture, uzu reduces memory copying overhead and improves inference throughput for local AI workloads. The system includes a simple high-level API that enables developers to run models, create inference sessions, and generate outputs with minimal configuration.

Downloads: 1 This Week

Last Update: 1 day ago
13

Butterchurn

Butterchurn is a WebGL implementation of the Milkdrop Visualizer

...The project emphasizes both artistic expression and technical performance, offering a balance between visual complexity and efficiency.

Downloads: 5 This Week

Last Update: 2026-04-20
14

autoresearch-macos

AI agents running research on single-GPU nanochat training

autoresearch-macos is a macOS-focused adaptation of autonomous research loop systems inspired by the autoresearch paradigm, enabling AI agents to iteratively improve machine learning models through self-directed experimentation. The system follows a structured loop in which an agent modifies a training script, executes a fixed-duration experiment, evaluates performance metrics, and decides whether to keep or revert changes. It is designed to operate efficiently within macOS environments, making it accessible for developers working outside traditional high-performance GPU clusters. The project typically includes components such as data preparation scripts, a training loop, and an instruction file that guides the agent’s behavior. ...
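The keep-or-revert loop described above can be sketched as a greedy search. This is a minimal illustration, assuming the agent and the training run are supplied as plain callables and that a lower metric (for example validation loss) is better; `propose_edit`, `run_experiment`, and the toy stand-ins are hypothetical, not names from the project.

```python
from typing import Callable, Tuple


def research_loop(script: str,
                  propose_edit: Callable[[str], str],
                  run_experiment: Callable[[str], float],
                  iterations: int) -> Tuple[str, float]:
    """Greedy keep-or-revert loop; assumes a lower metric is better."""
    best_script, best_metric = script, run_experiment(script)
    for _ in range(iterations):
        candidate = propose_edit(best_script)   # agent edits the training script
        metric = run_experiment(candidate)      # fixed-budget run, returns the metric
        if metric < best_metric:
            best_script, best_metric = candidate, metric   # keep the change
        # otherwise the candidate is discarded, i.e. the change is reverted
    return best_script, best_metric


# Toy stand-ins: the "script" is only a learning-rate setting, and the
# "experiment" pretends that the best learning rate is 0.03.
def halve_lr(script: str) -> str:
    return f"lr={float(script.split('=')[1]) / 2}"


def toy_experiment(script: str) -> float:
    return (float(script.split('=')[1]) - 0.03) ** 2
```

Starting from `"lr=0.4"`, the toy agent keeps each halving while the metric improves and reverts once it stops improving, ending at `lr=0.025`; a real agent would propose far more varied edits than this fixed rule.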

Downloads: 0 This Week

Last Update: 2026-03-30
15

Newton

An open-source, GPU-accelerated physics simulation engine

Newton is a high-performance, GPU-accelerated physics simulation engine designed primarily for robotics research, machine learning, and advanced simulation workflows. Built on top of NVIDIA Warp, it leverages GPU parallelism to deliver scalable and efficient simulation environments that support rapid iteration and experimentation. The engine extends previous simulation frameworks by introducing differentiable physics capabilities, allowing it to integrate seamlessly with machine learning models and optimization pipelines. ...
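To illustrate what differentiable physics means in practice, the sketch below differentiates the outcome of a tiny simulation with respect to one of its inputs and feeds that gradient into an optimization loop. It swaps in forward-mode automatic differentiation with dual numbers for brevity; it is not Newton's or Warp's API, and the projectile model, step size, and function names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    """Forward-mode AD number val + der*eps, with eps**2 == 0."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        o = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + o.val, self.der + o.der)

    __radd__ = __add__

    def __mul__(self, other):
        o = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)

    __rmul__ = __mul__


def simulate_height(v0: Dual, steps: int = 100, dt: float = 0.01,
                    g: float = -9.81) -> Dual:
    """Semi-implicit Euler integration of a ball thrown straight up; returns final height."""
    y, v = Dual(0.0), v0
    for _ in range(steps):
        v = v + g * dt      # velocity update
        y = y + v * dt      # position update using the new velocity
    return y


def fit_initial_velocity(target: float, v0: float = 5.0, lr: float = 0.25,
                         iters: int = 50) -> float:
    """Gradient descent on (height - target)**2, differentiating through the simulation."""
    for _ in range(iters):
        y = simulate_height(Dual(v0, 1.0))   # seed d(v0)/d(v0) = 1
        grad = 2.0 * (y.val - target) * y.der
        v0 -= lr * grad
    return v0
```

Here the derivative of the final height with respect to the launch velocity is `steps * dt` (1.0 with the defaults), so gradient descent on the squared error converges quickly; a full engine would differentiate through contacts, joints, and many more state variables.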

Downloads: 0 This Week

Last Update: 2026-04-13
16

NVIDIA cuOpt

GPU accelerated decision optimization

...The platform provides multiple interfaces, including C, Python, and server APIs, allowing developers to integrate optimization capabilities into applications and services. cuOpt is designed for high-performance environments and can be deployed on cloud, hybrid, or on-premises infrastructure. By combining GPU acceleration with scalable APIs, cuOpt enables organizations to solve large optimization challenges in logistics, operations research, and decision-making systems.

Downloads: 0 This Week

Last Update: 2026-04-09
17

Flux.jl

Relax! Flux is the ML library that doesn't make you tensor

Flux is an elegant approach to machine learning. It's a 100% pure Julia stack and provides lightweight abstractions on top of Julia's native GPU and AD support. Flux makes the easy things easy while remaining fully hackable. Flux provides a single, intuitive way to define models, just like mathematical notation. Julia transparently compiles your code, optimizing and fusing kernels for the GPU, for the best performance. Existing Julia libraries are differentiable and can be incorporated directly into Flux models. ...

Downloads: 0 This Week

Last Update: 2026-04-17
18

Triton

Development repository for the Triton language and compiler

...The project leverages LLVM and MLIR to compile code into efficient GPU instructions, supporting both NVIDIA and AMD hardware. It is widely used in research and production environments where custom tensor operations are required, offering both high performance and developer-friendly syntax.

Downloads: 0 This Week

Last Update: 2026-03-20
19

luma.gl

High-performance Toolkit for WebGL-based data visualization

luma.gl is a GPU toolkit for the Web, focused primarily on data visualization use cases. luma.gl aims to support GPU programmers who need to work directly with shaders and want a low-abstraction API that remains conceptually close to the WebGPU and WebGL APIs. Unlike with other common WebGL APIs, developers can choose to use only the parts of luma.gl that support their use case and leave the rest behind. While generic enough to be used for general 3D rendering, luma.gl's mandate is...

Downloads: 0 This Week

Last Update: 2026-04-21
20

LTX-2

Python inference and LoRA trainer package for the LTX-2 audio–video

LTX-2 is a powerful, open-source toolkit developed by Lightricks that provides a modular, high-performance base for building real-time graphics and visual effects applications. It is architected to give developers low-level control over rendering pipelines, GPU resource management, shader orchestration, and cross-platform abstractions so they can craft visually compelling experiences without starting from scratch. Beyond basic rendering scaffolding, LTX-2 includes optimized math libraries, resource loaders, utilities for texture and buffer handling, and integration points for native event loops and input systems. ...

Downloads: 41 This Week

Last Update: 2026-04-23
21

Meridian

Meridian is a marketing mix modeling (MMM) framework

...Meridian uses the No-U-Turn Sampler (NUTS) for Markov chain Monte Carlo (MCMC) sampling to produce statistically rigorous results, and it includes GPU acceleration to significantly reduce computation time.

Downloads: 10 This Week

Last Update: 3 hours ago
22

GPUStack

Performance-optimized AI inference on your GPUs

GPUStack is an open-source GPU cluster management platform designed to simplify the deployment and operation of artificial intelligence models across heterogeneous hardware environments. The system aggregates GPU resources from multiple machines into a unified cluster so developers and administrators can run large language models and other AI workloads efficiently across distributed infrastructure. Instead of requiring complex orchestration systems such as Kubernetes, GPUStack provides a...
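A cluster manager of this kind has to decide which machine's GPU should host a given model. The following is a deliberately simplified, hypothetical best-fit placement policy over a pooled set of GPUs, assuming each model fits on a single GPU and that free memory is the only constraint; it does not reflect GPUStack's actual scheduler or API, and the `GPU` and `place_model` names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class GPU:
    worker: str     # machine that hosts the GPU
    index: int      # device index on that machine
    free_mib: int   # memory currently available, in MiB


def place_model(pool: List[GPU], required_mib: int) -> Optional[Tuple[str, int]]:
    """Best-fit placement: pick the GPU (on any worker) that leaves the least memory
    unused after loading the model, reserve that memory, and return (worker, index).
    Returns None when no single GPU in the pool has enough free memory."""
    candidates = [g for g in pool if g.free_mib >= required_mib]
    if not candidates:
        return None
    best = min(candidates, key=lambda g: g.free_mib - required_mib)
    best.free_mib -= required_mib   # reserve memory for the placed model
    return best.worker, best.index
```

Best-fit keeps large free blocks available for later, larger models; a production scheduler would also weigh compute load, GPU type, and the option of splitting a model across several GPUs or workers.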

Downloads: 1 This Week

Last Update: 2026-04-21
23

lru-cache

A fast cache that automatically deletes the least recently used items

...It offers flexible configuration options such as maximum size limits, time-based expiration, and custom disposal logic. Developers can use it to cache expensive computations, API responses, or frequently accessed data. The implementation focuses on correctness, speed, and compatibility with modern Node.js environments. Overall, node-lru-cache provides a reliable building block for performance optimization in JavaScript backends.
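The eviction behavior described above can be sketched in Python with an ordered dictionary that keeps keys in recency order. This is a minimal illustration of the technique, not node-lru-cache's JavaScript API; the class and parameter names (`max_size`, `ttl`, `dispose`, `clock`) are hypothetical stand-ins for the size limit, time-based expiration, and disposal hook mentioned in the description.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class LRUCache:
    """Size-bounded LRU cache with optional time-based expiry and a dispose hook."""

    def __init__(self, max_size: int, ttl: Optional[float] = None,
                 dispose: Optional[Callable[[Hashable, Any], None]] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_size = max_size
        self.ttl = ttl            # seconds an entry stays valid; None = never expires
        self.dispose = dispose    # called with (key, value) when an entry is evicted or expires
        self.clock = clock
        self._entries: OrderedDict = OrderedDict()   # key -> (value, stored_at), oldest first

    def _drop(self, key: Hashable) -> None:
        value, _ = self._entries.pop(key)
        if self.dispose is not None:
            self.dispose(key, value)

    def get(self, key: Hashable, default: Any = None) -> Any:
        entry = self._entries.get(key)
        if entry is None:
            return default
        value, stored_at = entry
        if self.ttl is not None and self.clock() - stored_at > self.ttl:
            self._drop(key)               # expired entries are removed lazily on access
            return default
        self._entries.move_to_end(key)    # mark as most recently used
        return value

    def set(self, key: Hashable, value: Any) -> None:
        if key in self._entries:
            self._entries.move_to_end(key)   # overwriting does not call dispose here
        self._entries[key] = (value, self.clock())
        while len(self._entries) > self.max_size:
            oldest = next(iter(self._entries))   # least recently used key
            self._drop(oldest)
```

Injecting `clock` makes expiry testable without sleeping; in this sketch expired entries are dropped on the next `get` rather than by a background timer.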

Downloads: 2 This Week

Last Update: 3 days ago
24

OpenFang

Open-source Agent Operating System

OpenFang is an open-source agent operating system designed to orchestrate autonomous AI agents and workflows in a structured, production-oriented environment. Written primarily in Rust, the project focuses on building a high-performance runtime where multiple specialized agents can collaborate to complete complex computational or development tasks. It aims to move beyond simple chat-based agents by providing infrastructure for persistent agent memory, task coordination, and scalable execution. The system is positioned as a foundation for building advanced AI tooling, particularly in environments that require tight integration with GPU workflows and modern AI pipelines. ...

Downloads: 8 This Week

Last Update: 6 days ago
25

CUDA.jl

CUDA programming in Julia

High-performance GPU programming in a high-level language. JuliaGPU is a GitHub organization created to unify the many packages for programming GPUs in Julia. With its high-level syntax and flexible compiler, Julia is well-positioned to productively program hardware accelerators like GPUs without sacrificing performance. The latest development version of CUDA.jl requires Julia 1.8 or higher.

Downloads: 1 This Week

Last Update: 2026-04-30