Search Results for "gpu max performance" - Page 3

Showing 458 open source projects for "gpu max performance"

1

lru-cache

A fast cache that automatically deletes the least recently used items

...It offers flexible configuration options such as max size limits, time based expiration, and custom disposal logic. Developers can use it to cache expensive computations, API responses, or frequently accessed data. The implementation focuses on correctness, speed, and compatibility with modern Node.js environments. Overall, node-lru-cache provides a reliable building block for performance optimization in JavaScript backends.

Downloads: 2 This Week

Last Update: 5 days ago
See Project
2

OpenFang

Open-source Agent Operating System

OpenFang is an open-source agent operating system designed to orchestrate autonomous AI agents and workflows in a structured, production-oriented environment. Written primarily in Rust, the project focuses on building a high-performance runtime where multiple specialized agents can collaborate to complete complex computational or development tasks. It aims to move beyond simple chat-based agents by providing infrastructure for persistent agent memory, task coordination, and scalable execution. The system is positioned as a foundation for building advanced AI tooling, particularly in environments that require tight integration with GPU workflows and modern AI pipelines. ...

Downloads: 8 This Week

Last Update: 2026-05-01
See Project
3

webgl-plot

A high-performance real-time 2D plotting library based on native WebGL

...Its minimal memory footprint and GPU acceleration ensure excellent performance even with tens of thousands of data points, and its simple API allows developers to get started quickly.

Downloads: 1 This Week

Last Update: 2025-03-26
See Project
4

Codon

A high-performance, zero-overhead, extensible Python compiler

Codon is a high-performance Python compiler that compiles Python code to native machine code without any runtime overhead. Typical speedups over Python are on the order of 100x or more, on a single thread. Codon supports native multithreading which can lead to speedups many times higher still. The Codon framework is fully modular and extensible, allowing for the seamless integration of new modules, compiler optimizations, domain-specific languages and so on. We actively develop Codon...

Downloads: 12 This Week

Last Update: 2026-03-04
See Project
5

Zed

High-performance, multiplayer code editor from the creators of Atom

Zed is a next-generation code editor designed for high-performance collaboration with humans and AI. Written from scratch in Rust to efficiently leverage multiple CPU cores and your GPU. Integrate upcoming LLMs into your workflow to generate, transform, and analyze code. Chat with teammates, write notes together, and share your screen and project. Multibuffers compose excerpts from across the codebase in one editable surface.

Downloads: 41 This Week

Last Update: 9 hours ago
See Project
6

RTP-LLM

Alibaba's high-performance LLM inference engine for diverse apps

RTP-LLM is an open-source large language model inference acceleration engine developed by Alibaba to provide high-performance serving infrastructure for modern LLM deployments. The system focuses on improving throughput, latency, and resource utilization when running large models in production environments. It achieves this by implementing optimized GPU kernels, batching strategies, and memory management techniques tailored for transformer inference workloads.

Downloads: 2 This Week

Last Update: 2026-03-09
See Project
7

Modular Platform

The Modular Platform (includes MAX & Mojo)

Modular is the repository of a high-performance AI infrastructure company focused on building next-generation compute and software tools for machine learning workloads. The project centers on enabling developers to run AI models faster and more efficiently by rethinking the traditional ML software stack. It is closely associated with the Mojo programming language and related tooling that aims to combine Python usability with systems-level performance. Modular’s ecosystem is designed to simplify...

Downloads: 0 This Week

Last Update: 1 day ago
See Project
8

Anime4KCPP

A high performance anime upscaler

Anime4KCPP provides an optimized implementation of bloc97's Anime4K algorithm version 0.9, as well as its own CNN algorithm, ACNet. It can be used in a variety of ways, including preprocessing and real-time playback, and it aims to be a high-performance tool for processing both images and video. This project is for learning and for the exploration task of the algorithm course at SWJTU. Anime4K is a simple, high-quality anime upscaling algorithm. Version 0.9 does not use any machine learning approaches and can be...

Downloads: 26 This Week

Last Update: 2025-08-01
See Project
9

Shumai

Fast Differentiable Tensor Library in JavaScript & TypeScript with Bun

Shumai is an experimental differentiable tensor library for TypeScript and JavaScript, developed by Facebook Research. It provides a high-performance framework for numerical computing and machine learning within modern JavaScript runtimes. Built on Bun and Flashlight, with ArrayFire as its numerical backend, Shumai brings GPU-accelerated tensor operations, automatic differentiation, and scientific computing tools directly to JavaScript developers. It allows seamless integration of machine learning, deep learning, and custom differentiable programs into web-based or server-side environments without relying on Python frameworks. ...

Downloads: 2 This Week

Last Update: 7 days ago
See Project
10

HeavyDB

HeavyDB (formerly MapD/OmniSciDB)

HeavyDB is an open-source GPU-accelerated analytical database designed to perform extremely fast queries on large datasets. The system is built as a SQL-based relational columnar database engine that leverages modern hardware parallelism, including GPUs and multicore CPUs. Its architecture allows users to query datasets containing billions of rows in milliseconds without requiring traditional indexing, pre-aggregation, or sampling techniques. HeavyDB was originally developed as part of the...

Downloads: 0 This Week

Last Update: 2026-03-11
See Project
11

FLUX.2-klein-4B

Flux 2 image generation model pure C inference

...Because the implementation is in plain C and focuses on data locality and vectorized operations, flux2.c can be integrated into performance-critical code paths where control over memory layout and execution behavior matters, such as GPU kernels, embedded systems, or custom ML runtime engines.

Downloads: 11 This Week

Last Update: 2026-02-13
See Project
12

ffmpeg-over-ip

Connect to remote ffmpeg servers

ffmpeg-over-ip is a client-server system that enables remote execution of FFmpeg commands on a machine with GPU access while controlling it from another environment such as a container or virtual machine. It allows applications without direct GPU access to offload video transcoding tasks to a remote server, improving performance without requiring complex passthrough setups. The system works by coordinating commands through a lightweight protocol while using a shared filesystem to exchange media data. ...

Downloads: 4 This Week

Last Update: 5 days ago
See Project
13

libplacebo

Official mirror of libplacebo

libplacebo is a flexible, high-performance graphics library built on top of Vulkan, designed to provide reusable GPU-accelerated components for media applications. It originated as a core part of the rendering pipeline for the mpv media player and has since grown into a standalone library used for tone mapping, dithering, color space conversion, and more. libplacebo is ideal for developers looking to integrate sophisticated video rendering and post-processing into their own applications with full control over shaders and rendering stages.

Downloads: 1 This Week

Last Update: 2026-03-13
See Project
14

UCCL

UCCL is an efficient communication library for GPUs

UCCL is a high-performance GPU communication library designed to support distributed machine learning workloads and large-scale AI systems. The library focuses on enabling efficient data transfer and collective communication between GPUs during training and inference processes. It supports a variety of communication patterns including collective operations such as all-reduce as well as peer-to-peer transfers that are commonly used in modern machine learning architectures. ...

Downloads: 0 This Week

Last Update: 2026-03-14
See Project
15

KVCache-Factory

Unified KV Cache Compression Methods for Auto-Regressive Models

...In large language models, the key-value cache stores intermediate attention states that enable efficient token generation during inference, but these caches can consume large amounts of GPU memory when handling long contexts. KVCache-Factory provides a platform for implementing and evaluating multiple compression strategies that reduce memory usage while preserving model performance. The framework integrates several state-of-the-art methods such as PyramidKV, SnapKV, H2O, and StreamingLLM, allowing researchers to compare and experiment with different approaches within the same environment. ...

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
16

Text Generation Inference

Large Language Model Text Generation Inference

Text Generation Inference is a high-performance inference server for text generation models, optimized for Hugging Face's Transformers. It is designed to serve large language models efficiently with optimizations for performance and scalability.

Downloads: 1 This Week

Last Update: 2025-12-18
See Project
17

XFrames

GPU-accelerated GUI development for Node.js and the browser

xframes is a high-performance library that empowers developers to build native desktop applications using familiar web technologies, specifically Node.js and React, without the overhead of the DOM. xframes serves as a streamlined alternative to Electron, designed for developers looking to maximize performance and efficiency.

Downloads: 1 This Week

Last Update: 2024-12-07
See Project
18

Scalene

High-performance CPU, GPU, and memory profiler for Python

Scalene is a high-performance CPU, GPU and memory profiler for Python that does a number of things that other Python profilers do not and cannot do. It runs orders of magnitude faster than other profilers while delivering far more detailed information. Once Scalene has profiled your program, it will launch a web browser with an interactive user interface (all processing is done locally).

Downloads: 0 This Week

Last Update: 2026-03-22
See Project
19

LibreHardwareMonitor

Monitor temperature sensors, fan speed, voltage, load & clock speeds

Libre Hardware Monitor is a free, open-source system monitoring tool that provides detailed insights into your computer’s hardware health and performance. It tracks real-time metrics such as temperatures, fan speeds, voltages, clock speeds, and load across a wide range of components. The project includes both a Windows Forms application for visual monitoring and a reusable library for developers who want to integrate hardware monitoring into their own software. LibreHardwareMonitor supports modern Intel and AMD CPUs, major GPU vendors, storage devices, and network adapters. ...

Downloads: 258 This Week

Last Update: 2026-02-14
See Project
20

TensorRT Node for ComfyUI

Enables the best performance on NVIDIA RTX Graphics Cards

...The repo typically includes instructions for converting models to TensorRT engines and for wiring those engines into ComfyUI nodes. This is particularly attractive for power users who run many generations or who host ComfyUI on dedicated hardware and want to squeeze out every bit of GPU performance. In short, it’s about taking ComfyUI from “it runs” to “it runs fast” on NVIDIA GPUs.

Downloads: 2 This Week

Last Update: 2025-10-30
See Project
21

DXVK

Vulkan-based implementation of D3D9, D3D10 and D3D11 for Linux / Wine

...Direct3D is a graphics application programming interface built for Windows and is used for rendering three-dimensional graphics in applications. It is typically useful in applications where performance is vital, such as three-dimensional games. This project aims to provide support for Direct3D11, feature level 11_1, and Direct3D10, feature level 10_1. Currently, however, there are still a few unsupported features, such as shared resources, predication, class linkage and target-independent rasterization. To get the best results out of this project, it is recommended that you use an esync-enabled Wine build to reduce CPU overhead in some games, and that you disable desktop effects on your compositor, as these can cause stuttering when games are GPU-bound.

Downloads: 393 This Week

Last Update: 2025-10-11
See Project
22

EvoTrees.jl

Boosted trees in Julia

A Julia implementation of boosted trees with CPU and GPU support. Efficient histogram-based algorithms with support for multiple loss functions, including various regressions, multi-classification and Gaussian max likelihood.

Downloads: 0 This Week

Last Update: 2026-02-24
See Project
23

OpenVINO AI Plugins for Audacity

A set of AI-enabled effects, generators, and analyzers for Audacity

A set of AI-enabled effects, generators, and analyzers for Audacity. These AI features run 100% locally on your PC; no internet connection is necessary. OpenVINO™ is used to run AI models on supported accelerators found on the user's system, such as CPU, GPU, and NPU.

Downloads: 111 This Week

Last Update: 2024-12-20
See Project
24

NVIDIA AI Cluster Runtime (AICR)

Tooling for optimized and reproducible GPU-accelerated AI runtime

...Based on its positioning within NVIDIA’s repositories, it is designed to support scalable AI runtime environments, potentially addressing challenges related to orchestration, resource management, or reproducible AI execution. The project likely aligns with NVIDIA’s broader strategy of building modular infrastructure layers that integrate with GPU-accelerated workloads and cloud-native systems. It appears to emphasize automation, consistency, and performance optimization across AI pipelines, potentially targeting enterprise and research use cases. Given NVIDIA’s ecosystem, it may also integrate with containerized environments, Kubernetes, or other orchestration frameworks.

Downloads: 2 This Week

Last Update: 2026-05-01
See Project
25

Faster Whisper

Faster Whisper transcription with CTranslate2

Faster Whisper is an optimized implementation of the Whisper speech recognition model designed to deliver significantly faster inference while maintaining comparable accuracy. It leverages efficient inference engines and optimized computation strategies to reduce latency and resource consumption. The system is particularly useful for real-time or large-scale transcription tasks where performance is critical. It supports multiple model sizes, allowing users to balance speed and accuracy based...

Downloads: 28 This Week

Last Update: 2026-04-06
See Project