Search Results for "gpu max performance" - Page 3

Showing 388 open source projects for "gpu max performance"

  • 1
    LMCache

    Supercharge Your LLM with the Fastest KV Cache Layer

    ...These capabilities aim to lower latency, cut GPU cycles, and stabilize performance for production workloads with overlapping prompts or retrieval-augmented contexts. The end result is a cache fabric for LLMs that complements engines rather than replacing them.
    Downloads: 0 This Week
  • 2
    Zed

    High-performance, multiplayer code editor from the creators of Atom

    Zed is a next-generation code editor designed for high-performance collaboration with humans and AI. It is written from scratch in Rust to efficiently leverage multiple CPU cores and your GPU. Integrate upcoming LLMs into your workflow to generate, transform, and analyze code. Chat with teammates, write notes together, and share your screen and project. Multibuffers compose excerpts from across the codebase in one editable surface.
    Downloads: 27 This Week
  • 3
    Pruna AI

    Pruna is a model optimization framework built for developers

    Pruna is an open-source, self-hostable AI inference engine designed to help teams deploy and manage large language models (LLMs) efficiently across private or hybrid infrastructures. Built with performance and developer ergonomics in mind, Pruna simplifies inference workflows by enabling multi-model orchestration, autoscaling, GPU resource allocation, and compatibility with popular open-source models. It is ideal for companies or teams looking to reduce reliance on external APIs while maintaining speed, cost-efficiency, and full control over their data and AI stack. ...
    Downloads: 1 This Week
  • 4
    libplacebo

    Official mirror of libplacebo

    libplacebo is a flexible, high-performance graphics library built on top of Vulkan, designed to provide reusable GPU-accelerated components for media applications. It originated as a core part of the rendering pipeline for the mpv media player and has since grown into a standalone library used for tone mapping, dithering, color space conversion, and more. libplacebo is ideal for developers looking to integrate sophisticated video rendering and post-processing into their own applications with full control over shaders and rendering stages.
    Downloads: 1 This Week
  • 5
    TensorRT Node for ComfyUI

    Enables the best performance on NVIDIA RTX Graphics Cards

    ...The repo typically includes instructions for converting models to TensorRT engines and for wiring those engines into ComfyUI nodes. This is particularly attractive for power users who run many generations or who host ComfyUI on dedicated hardware and want to squeeze out every bit of GPU performance. In short, it’s about taking ComfyUI from “it runs” to “it runs fast” on NVIDIA GPUs.
    Downloads: 3 This Week
  • 6
    UCCL

    UCCL is an efficient communication library for GPUs

    UCCL is a high-performance GPU communication library designed to support distributed machine learning workloads and large-scale AI systems. The library focuses on enabling efficient data transfer and collective communication between GPUs during training and inference processes. It supports a variety of communication patterns including collective operations such as all-reduce as well as peer-to-peer transfers that are commonly used in modern machine learning architectures. ...
    Downloads: 0 This Week
  • 7
    KVCache-Factory

    Unified KV Cache Compression Methods for Auto-Regressive Models

    ...In large language models, the key-value cache stores intermediate attention states that enable efficient token generation during inference, but these caches can consume large amounts of GPU memory when handling long contexts. KVCache-Factory provides a platform for implementing and evaluating multiple compression strategies that reduce memory usage while preserving model performance. The framework integrates several state-of-the-art methods such as PyramidKV, SnapKV, H2O, and StreamingLLM, allowing researchers to compare and experiment with different approaches within the same environment. ...
    Downloads: 0 This Week
  • 8
    FLUX.2-klein-4B

    Pure C inference for the Flux 2 image generation model

    ...Because the implementation is in plain C and focuses on data locality and vectorized operations, flux2.c can be integrated into performance-critical code paths where control over memory layout and execution behavior matters, such as GPU kernels, embedded systems, or custom ML runtime engines.
    Downloads: 10 This Week
  • 9
    Scalene

    High-performance CPU, GPU, and memory profiler for Python

    Scalene is a high-performance CPU, GPU and memory profiler for Python that does a number of things that other Python profilers do not and cannot do. It runs orders of magnitude faster than other profilers while delivering far more detailed information. Once Scalene has profiled your program, it will launch a web browser with an interactive user interface (all processing is done locally).
    Downloads: 0 This Week
  • 10
    Text Generation Inference

    Large Language Model Text Generation Inference

    Text Generation Inference is a high-performance inference server for text generation models, optimized for Hugging Face's Transformers. It is designed to serve large language models efficiently with optimizations for performance and scalability.
    Downloads: 1 This Week
  • 11
    Anime4KCPP

    A high performance anime upscaler

    Anime4KCPP provides an optimized implementation of bloc97's Anime4K algorithm (version 0.9) as well as its own CNN-based algorithm, ACNet. It supports a variety of uses, including preprocessing and real-time playback, and aims to be a high-performance tool for processing both images and video. The project was created for learning and as an exploration task for the algorithm course at SWJTU. Anime4K is a simple, high-quality anime upscaling algorithm. Version 0.9 does not use any machine learning approaches and can be...
    Downloads: 18 This Week
  • 12
    EvoTrees.jl

    Boosted trees in Julia

    A Julia implementation of boosted trees with CPU and GPU support. Efficient histogram-based algorithms with support for multiple loss functions, including various regressions, multi-class classification, and Gaussian maximum likelihood.
    Downloads: 0 This Week
  • 13
    DXVK

    Vulkan-based implementation of D3D9, D3D10 and D3D11 for Linux / Wine

    ...Direct3D is a graphics application programming interface built for Windows and is used for rendering three-dimensional graphics in applications. It is typically used where performance is vital, such as in three-dimensional games. This project aims to provide support for Direct3D 11 (feature level 11_1) and Direct3D 10 (feature level 10_1). However, a few features are still unsupported, such as shared resources, predication, class linkage, and target-independent rasterization. For best results, use an esync-enabled Wine build to reduce CPU overhead in some games, and disable desktop effects in your compositor, since they can cause stuttering when games are GPU-bound.
    Downloads: 399 This Week
  • 14
    LibreHardwareMonitor

    Monitor temperature sensors, fan speed, voltage, load & clock speeds

    Libre Hardware Monitor is a free, open-source system monitoring tool that provides detailed insights into your computer’s hardware health and performance. It tracks real-time metrics such as temperatures, fan speeds, voltages, clock speeds, and load across a wide range of components. The project includes both a Windows Forms application for visual monitoring and a reusable library for developers who want to integrate hardware monitoring into their own software. LibreHardwareMonitor supports modern Intel and AMD CPUs, major GPU vendors, storage devices, and network adapters. ...
    Downloads: 255 This Week
  • 15
    Ultralight

    Lightweight, high-performance HTML renderer for game developers

    ...Available for desktop apps, game consoles, TVs, embedded device displays, servers, and more. Official API for C and C++, with bindings for other languages. Render web content on the GPU via Direct3D, Metal, OpenGL, or your own engine, or on the CPU using SIMD and parallel processing for easy integration with any environment (including server-side). Ultralight is engineered for peak performance with minimal CPU and memory usage. Customize low-level platform functionality, integrate JavaScript directly with native code, dive deep into performance tuning, and more. ...
    Downloads: 3 This Week
  • 16
    OpenVINO AI Plugins for Audacity

    A set of AI-enabled effects, generators, and analyzers for Audacity

    A set of AI-enabled effects, generators, and analyzers for Audacity. These AI features run 100% locally on your PC; no internet connection is necessary. OpenVINO™ is used to run AI models on supported accelerators found on the user's system, such as the CPU, GPU, and NPU.
    Downloads: 114 This Week
  • 17
    ffmpeg-over-ip

    Connect to remote ffmpeg servers

    ffmpeg-over-ip is a client-server system that enables remote execution of FFmpeg commands on a machine with GPU access while controlling it from another environment such as a container or virtual machine. It allows applications without direct GPU access to offload video transcoding tasks to a remote server, improving performance without requiring complex passthrough setups. The system works by coordinating commands through a lightweight protocol while using a shared filesystem to exchange media data. ...
    Downloads: 2 This Week
  • 18
    Faster Whisper

    Faster Whisper transcription with CTranslate2

    Faster Whisper is an optimized implementation of the Whisper speech recognition model designed to deliver significantly faster inference while maintaining comparable accuracy. It leverages efficient inference engines and optimized computation strategies to reduce latency and resource consumption. The system is particularly useful for real-time or large-scale transcription tasks where performance is critical. It supports multiple model sizes, allowing users to balance speed and accuracy based...
    Downloads: 33 This Week
  • 19
    Alpamayo 1

    Bridging Reasoning and Action Prediction

    ...It incorporates vision-language-action modeling, enabling it to process sensor data and contextual information simultaneously. Alpamayo supports tasks such as trajectory prediction, auto-labeling, and reasoning-based decision making. The system is optimized for high-performance GPU environments and is intended primarily for experimentation and benchmarking. Overall, it represents an advanced step toward integrating reasoning into autonomous driving pipelines.
    Downloads: 0 This Week
  • 20
    Diligent Core

    A modern cross-platform low-level graphics API

    DiligentCore is a low-level, cross-platform rendering library designed to provide a modern graphics abstraction layer over Direct3D11, Direct3D12, OpenGL, Vulkan, and Metal. It’s aimed at developers building high-performance rendering engines and scientific visualization tools. DiligentCore gives precise control over GPU resources and rendering pipelines, while also abstracting away platform-specific boilerplate. The library is modular, extensible, and well-suited for projects that require direct access to modern graphics APIs while maintaining portability and scalability.
    Downloads: 0 This Week
  • 21
    RTP-LLM

    Alibaba's high-performance LLM inference engine for diverse apps

    RTP-LLM is an open-source large language model inference acceleration engine developed by Alibaba to provide high-performance serving infrastructure for modern LLM deployments. The system focuses on improving throughput, latency, and resource utilization when running large models in production environments. It achieves this by implementing optimized GPU kernels, batching strategies, and memory management techniques tailored for transformer inference workloads.
    Downloads: 0 This Week
  • 22
    clip-retrieval

    Easily compute CLIP embeddings and build a CLIP retrieval system

    ...It allows developers to compute embeddings for both images and text efficiently and then index them for fast similarity search across massive datasets. The system is optimized for performance and scalability, capable of processing tens or even hundreds of millions of embeddings using GPU acceleration. It includes components for inference, indexing, filtering, and serving results through APIs, making it a complete pipeline for building production-ready retrieval systems. The framework also supports querying by image, text, or embedding, enabling flexible use cases such as reverse image search or multimodal content discovery. ...
    Downloads: 1 This Week
  • 23
    XFrames

    GPU-accelerated GUI development for Node.js and the browser

    xframes is a high-performance library that empowers developers to build native desktop applications using familiar web technologies, specifically Node.js and React, without the overhead of the DOM. xframes serves as a streamlined alternative to Electron, designed for developers looking to maximize performance and efficiency.
    Downloads: 0 This Week
  • 24
    Shumai

    Fast Differentiable Tensor Library in JavaScript & TypeScript with Bun

    Shumai is an experimental differentiable tensor library for TypeScript and JavaScript, developed by Facebook Research. It provides a high-performance framework for numerical computing and machine learning within modern JavaScript runtimes. Built on Bun and Flashlight, with ArrayFire as its numerical backend, Shumai brings GPU-accelerated tensor operations, automatic differentiation, and scientific computing tools directly to JavaScript developers. It allows seamless integration of machine learning, deep learning, and custom differentiable programs into web-based or server-side environments without relying on Python frameworks. ...
    Downloads: 0 This Week
  • 25
    G-Helper

    Lightweight Armoury Crate alternative for Asus laptops and ROG Ally

    Small and lightweight Armoury Crate alternative for Asus laptops, offering almost the same functionality without extra load or unnecessary services. Works with all popular models, such as ROG Zephyrus G14, G15, G16, M16, Flow X13, Flow X16, Flow Z13, DUO, TUF Series, Strix or Scar Series, ProArt, Vivobook, Zenbook, ROG Ally or Ally X, and many more.
    Downloads: 150 This Week
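The KVCache-Factory entry notes that the key-value cache can consume large amounts of GPU memory with long contexts. The arithmetic behind that claim can be sketched as follows: every layer stores one key vector and one value vector per KV head for every token, so memory grows linearly with sequence length. The Python below is a minimal illustration, not code from KVCache-Factory or any listed project; the model configuration is hypothetical, and the "compressed" figure assumes a simplified token-eviction scheme that keeps a fixed budget of 2,048 tokens in every layer (real methods may allocate budgets differently).

```python
# Back-of-the-envelope KV cache sizing (illustrative only; not KVCache-Factory code).

def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, batch_size: int = 1, bytes_per_elem: int = 2) -> int:
    """Bytes needed to store keys and values for every token in every layer.

    The leading factor of 2 accounts for storing both a key and a value vector.
    bytes_per_elem=2 corresponds to fp16/bf16 storage.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


GIB = 2 ** 30

# Hypothetical configuration, chosen only for round numbers.
config = dict(num_layers=32, num_kv_heads=32, head_dim=128)

full = kv_cache_bytes(seq_len=32_768, **config)
# Assumed fixed budget of 2,048 retained tokens per layer, a simplification of
# token-eviction methods such as StreamingLLM or SnapKV.
compressed = kv_cache_bytes(seq_len=2_048, **config)

print(f"full cache:        {full / GIB:.1f} GiB")        # 16.0 GiB
print(f"2,048-token cache: {compressed / GIB:.1f} GiB")  # 1.0 GiB
```

Because the size is a plain product, halving `num_kv_heads` (as grouped-query attention does relative to the number of query heads) halves the cache in the same way that shortening the retained sequence does.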