Search Results for "gpu max performance" - Page 3

Showing 458 open source projects for "gpu max performance"

  • 1
    lru-cache

    A fast cache that automatically deletes the least recently used items

    ...It offers flexible configuration options such as max size limits, time based expiration, and custom disposal logic. Developers can use it to cache expensive computations, API responses, or frequently accessed data. The implementation focuses on correctness, speed, and compatibility with modern Node.js environments. Overall, node-lru-cache provides a reliable building block for performance optimization in JavaScript backends.
    Downloads: 2 This Week
  • 2
    OpenFang

    Open-source Agent Operating System

    OpenFang is an open-source agent operating system designed to orchestrate autonomous AI agents and workflows in a structured, production-oriented environment. Written primarily in Rust, the project focuses on building a high-performance runtime where multiple specialized agents can collaborate to complete complex computational or development tasks. It aims to move beyond simple chat-based agents by providing infrastructure for persistent agent memory, task coordination, and scalable execution. The system is positioned as a foundation for building advanced AI tooling, particularly in environments that require tight integration with GPU workflows and modern AI pipelines. ...
    Downloads: 8 This Week
  • 3
    webgl-plot

    A high-performance real-time 2D plotting library based on native WebGL

    ...Its minimal memory footprint and GPU acceleration ensure excellent performance even with tens of thousands of data points, and its simple API allows developers to get started quickly.
    Downloads: 1 This Week
  • 4
    Codon

    A high-performance, zero-overhead, extensible Python compiler

    Codon is a high-performance Python compiler that compiles Python code to native machine code without any runtime overhead. Typical speedups over Python are on the order of 100x or more, on a single thread. Codon supports native multithreading which can lead to speedups many times higher still. The Codon framework is fully modular and extensible, allowing for the seamless integration of new modules, compiler optimizations, domain-specific languages and so on. We actively develop Codon...
    Downloads: 12 This Week
  • 5
    Zed

    High-performance, multiplayer code editor from the creators of Atom

    Zed is a next-generation code editor designed for high-performance collaboration with humans and AI. Written from scratch in Rust to efficiently leverage multiple CPU cores and your GPU. Integrate upcoming LLMs into your workflow to generate, transform, and analyze code. Chat with teammates, write notes together, and share your screen and project. Multibuffers compose excerpts from across the codebase in one editable surface.
    Downloads: 41 This Week
  • 6
    RTP-LLM

    Alibaba's high-performance LLM inference engine for diverse apps

    RTP-LLM is an open-source large language model inference acceleration engine developed by Alibaba to provide high-performance serving infrastructure for modern LLM deployments. The system focuses on improving throughput, latency, and resource utilization when running large models in production environments. It achieves this by implementing optimized GPU kernels, batching strategies, and memory management techniques tailored for transformer inference workloads.
    Downloads: 2 This Week
  • 7
    Modular Platform

    The Modular Platform (includes MAX & Mojo)

    The Modular Platform repository comes from Modular, an AI infrastructure company focused on building next-generation compute and software tools for machine learning workloads. The project centers on enabling developers to run AI models faster and more efficiently by rethinking the traditional ML software stack. It is closely associated with the Mojo programming language and related tooling that aims to combine Python usability with systems-level performance. Modular’s ecosystem is designed to simplify...
    Downloads: 0 This Week
  • 8
    Anime4KCPP

    A high performance anime upscaler

    Anime4KCPP provides an optimized implementation of bloc97's Anime4K algorithm version 0.9, along with its own CNN algorithm, ACNet. It can be used in a variety of ways, including preprocessing and real-time playback, and aims to be a high-performance tool for processing both images and video. This project is for learning and for the exploration task of the algorithm course at SWJTU. Anime4K is a simple, high-quality anime upscaling algorithm. Version 0.9 does not use any machine learning approaches and can be...
    Downloads: 26 This Week
  • 9
    Shumai

    Fast Differentiable Tensor Library in JavaScript & TypeScript with Bun

    Shumai is an experimental differentiable tensor library for TypeScript and JavaScript, developed by Facebook Research. It provides a high-performance framework for numerical computing and machine learning within modern JavaScript runtimes. Built on Bun and Flashlight, with ArrayFire as its numerical backend, Shumai brings GPU-accelerated tensor operations, automatic differentiation, and scientific computing tools directly to JavaScript developers. It allows seamless integration of machine learning, deep learning, and custom differentiable programs into web-based or server-side environments without relying on Python frameworks. ...
    Downloads: 2 This Week
  • 10
    HeavyDB

    HeavyDB (formerly MapD/OmniSciDB)

    HeavyDB is an open-source GPU-accelerated analytical database designed to perform extremely fast queries on large datasets. The system is built as a SQL-based relational columnar database engine that leverages modern hardware parallelism, including GPUs and multicore CPUs. Its architecture allows users to query datasets containing billions of rows in milliseconds without requiring traditional indexing, pre-aggregation, or sampling techniques. HeavyDB was originally developed as part of the...
    Downloads: 0 This Week
  • 11
    FLUX.2-klein-4B

    Pure C inference for the Flux 2 image generation model

    ...Because the implementation is in plain C and focuses on data locality and vectorized operations, flux2.c can be integrated into performance-critical code paths where control over memory layout and execution behavior matters, such as GPU kernels, embedded systems, or custom ML runtime engines.
    Downloads: 11 This Week
  • 12
    ffmpeg-over-ip

    Connect to remote ffmpeg servers

    ffmpeg-over-ip is a client-server system that enables remote execution of FFmpeg commands on a machine with GPU access while controlling it from another environment such as a container or virtual machine. It allows applications without direct GPU access to offload video transcoding tasks to a remote server, improving performance without requiring complex passthrough setups. The system works by coordinating commands through a lightweight protocol while using a shared filesystem to exchange media data. ...
    Downloads: 4 This Week
  • 13
    libplacebo

    Official mirror of libplacebo

    libplacebo is a flexible, high-performance graphics library built on top of Vulkan, designed to provide reusable GPU-accelerated components for media applications. It originated as a core part of the rendering pipeline for the mpv media player and has since grown into a standalone library used for tone mapping, dithering, color space conversion, and more. libplacebo is ideal for developers looking to integrate sophisticated video rendering and post-processing into their own applications with full control over shaders and rendering stages.
    Downloads: 1 This Week
  • 14
    UCCL

    UCCL is an efficient communication library for GPUs

    UCCL is a high-performance GPU communication library designed to support distributed machine learning workloads and large-scale AI systems. The library focuses on enabling efficient data transfer and collective communication between GPUs during training and inference processes. It supports a variety of communication patterns including collective operations such as all-reduce as well as peer-to-peer transfers that are commonly used in modern machine learning architectures. ...
    Downloads: 0 This Week
  • 15
    KVCache-Factory

    Unified KV Cache Compression Methods for Auto-Regressive Models

    ...In large language models, the key-value cache stores intermediate attention states that enable efficient token generation during inference, but these caches can consume large amounts of GPU memory when handling long contexts. KVCache-Factory provides a platform for implementing and evaluating multiple compression strategies that reduce memory usage while preserving model performance. The framework integrates several state-of-the-art methods such as PyramidKV, SnapKV, H2O, and StreamingLLM, allowing researchers to compare and experiment with different approaches within the same environment. ...
    Downloads: 0 This Week
  • 16
    Text Generation Inference

    Large Language Model Text Generation Inference

    Text Generation Inference is a high-performance inference server for text generation models, optimized for Hugging Face's Transformers. It is designed to serve large language models efficiently with optimizations for performance and scalability.
    Downloads: 1 This Week
  • 17
    XFrames

    GPU-accelerated GUI development for Node.js and the browser

    xframes is a high-performance library that empowers developers to build native desktop applications using familiar web technologies, specifically Node.js and React, without the overhead of the DOM. xframes serves as a streamlined alternative to Electron, designed for developers looking to maximize performance and efficiency.
    Downloads: 1 This Week
  • 18
    Scalene

    High-performance CPU, GPU, and memory profiler for Python

    Scalene is a high-performance CPU, GPU and memory profiler for Python that does a number of things that other Python profilers do not and cannot do. It runs orders of magnitude faster than other profilers while delivering far more detailed information. Once Scalene has profiled your program, it will launch a web browser with an interactive user interface (all processing is done locally).
    Downloads: 0 This Week
  • 19
    LibreHardwareMonitor

    Monitor temperature sensors, fan speed, voltage, load & clock speeds

    Libre Hardware Monitor is a free, open-source system monitoring tool that provides detailed insights into your computer’s hardware health and performance. It tracks real-time metrics such as temperatures, fan speeds, voltages, clock speeds, and load across a wide range of components. The project includes both a Windows Forms application for visual monitoring and a reusable library for developers who want to integrate hardware monitoring into their own software. LibreHardwareMonitor supports modern Intel and AMD CPUs, major GPU vendors, storage devices, and network adapters. ...
    Downloads: 258 This Week
  • 20
    TensorRT Node for ComfyUI

    Enables the best performance on NVIDIA RTX Graphics Cards

    ...The repo typically includes instructions for converting models to TensorRT engines and for wiring those engines into ComfyUI nodes. This is particularly attractive for power users who run many generations or who host ComfyUI on dedicated hardware and want to squeeze out every bit of GPU performance. In short, it’s about taking ComfyUI from “it runs” to “it runs fast” on NVIDIA GPUs.
    Downloads: 2 This Week
  • 21

    DXVK

    Vulkan-based implementation of D3D9, D3D10 and D3D11 for Linux / Wine

    ...Direct3D is a graphics application programming interface built for Windows and is used for rendering three-dimensional graphics in applications. It is typically useful in applications where performance is vital, such as three-dimensional games. This project aims to provide support for Direct3D11, feature level 11_1, and Direct3D10, feature level 10_1. Currently, however, there are still a few unsupported features, such as shared resources, predication, class linkage and target-independent rasterization. To get the best results out of this project, it is recommended that you use an esync-enabled Wine build to reduce CPU overhead in some games, and that you disable desktop effects on your compositor, as these can cause stuttering when games are GPU-bound.
    Downloads: 393 This Week
  • 22
    EvoTrees.jl

    Boosted trees in Julia

    A Julia implementation of boosted trees with CPU and GPU support. Efficient histogram-based algorithms with support for multiple loss functions, including various regressions, multi-classification and Gaussian max likelihood.
    Downloads: 0 This Week
  • 23
    OpenVINO AI Plugins for Audacity

    A set of AI-enabled effects, generators, and analyzers for Audacity

    These AI features run 100% locally on your PC; no internet connection is necessary. OpenVINO™ is used to run AI models on supported accelerators found on the user's system, such as CPU, GPU, and NPU.
    Downloads: 111 This Week
  • 24
    NVIDIA AI Cluster Runtime (AICR)

    Tooling for optimized and reproducible GPU-accelerated AI runtime

    ...Based on its positioning within NVIDIA’s repositories, it is designed to support scalable AI runtime environments, potentially addressing challenges related to orchestration, resource management, or reproducible AI execution. The project likely aligns with NVIDIA’s broader strategy of building modular infrastructure layers that integrate with GPU-accelerated workloads and cloud-native systems. It appears to emphasize automation, consistency, and performance optimization across AI pipelines, potentially targeting enterprise and research use cases. Given NVIDIA’s ecosystem, it may also integrate with containerized environments, Kubernetes, or other orchestration frameworks.
    Downloads: 2 This Week
  • 25
    Faster Whisper

    Faster Whisper transcription with CTranslate2

    Faster Whisper is an optimized implementation of the Whisper speech recognition model designed to deliver significantly faster inference while maintaining comparable accuracy. It leverages efficient inference engines and optimized computation strategies to reduce latency and resource consumption. The system is particularly useful for real-time or large-scale transcription tasks where performance is critical. It supports multiple model sizes, allowing users to balance speed and accuracy based...
    Downloads: 28 This Week