Search Results for "gpu max performance" - Page 2

Showing 347 open source projects for "gpu max performance"

  • 1
    XenosRecomp

    A tool for converting Xbox 360 shaders to HLSL

    ...The project addresses one of the most complex aspects of console reverse engineering, which is accurately reproducing proprietary GPU behavior in a portable and efficient way. By reconstructing the graphics pipeline, XenosRecomp enables developers to render scenes correctly without relying on emulation layers that can introduce performance overhead or inaccuracies.
    Downloads: 0 This Week
  • 2
    FlashAttention

    Fast and memory-efficient exact attention

    FlashAttention is a high-performance deep learning optimization library that reimplements the attention mechanism used in transformer models to be significantly faster and more memory-efficient than standard implementations. It achieves this by using IO-aware algorithms that minimize memory reads and writes, reducing the quadratic memory overhead typically associated with attention operations.
    Downloads: 78 This Week
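The IO-aware idea at the heart of FlashAttention — scanning keys and values in blocks while carrying only a running max, normalizer, and weighted-value accumulator, so the full attention matrix is never materialized — can be sketched in plain Python. This is a numerical illustration of the online-softmax technique only, not the library's fused CUDA kernels:

```python
import math

def naive_attention(q, ks, vs):
    # Reference: full softmax over all keys at once (the memory-hungry form).
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in ks]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    dim = len(vs[0])
    return [sum(e * v[d] for e, v in zip(exps, vs)) / z for d in range(dim)]

def blockwise_attention(q, ks, vs, block=2):
    # Online softmax: visit keys/values block by block, keeping only a
    # running max (m), normalizer (z), and accumulator (acc); earlier
    # partial sums are rescaled whenever a new max appears.
    m, z = float("-inf"), 0.0
    acc = [0.0] * len(vs[0])
    for start in range(0, len(ks), block):
        for k, v in zip(ks[start:start + block], vs[start:start + block]):
            s = sum(qi * ki for qi, ki in zip(q, k))
            m_new = max(m, s)
            scale = math.exp(m - m_new) if m != float("-inf") else 0.0
            w = math.exp(s - m_new)
            z = z * scale + w
            acc = [a * scale + w * vd for a, vd in zip(acc, v)]
            m = m_new
    return [a / z for a in acc]
```

Both functions return the same result; the blockwise version simply never holds more than one block of scores at a time, which is what makes the real kernel's memory traffic linear rather than quadratic.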
  • 3
    PowerInfer

    High-speed Large Language Model Serving for Local Deployment

    PowerInfer is a high-performance inference engine designed to run large language models efficiently on personal computers equipped with consumer-grade GPUs. The project focuses on improving the performance of local AI inference by optimizing how neural network computations are distributed between CPU and GPU resources. Its architecture exploits the observation that only a subset of neurons in large models are frequently activated, allowing the system to preload frequently used neurons into GPU memory while processing less common activations on the CPU. ...
    Downloads: 0 This Week
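The hot/cold neuron placement described above amounts to a scheduling policy: rank neurons by activation frequency and pin the hottest ones in GPU memory. A minimal sketch of that policy (function name, input shape, and capacity parameter are hypothetical, not PowerInfer's actual API):

```python
def partition_neurons(activation_counts, gpu_capacity):
    # Rank neurons by how often they fired during profiling; the hottest
    # `gpu_capacity` neurons are preloaded to the GPU, the rest stay on CPU.
    ranked = sorted(activation_counts, key=activation_counts.get, reverse=True)
    hot = set(ranked[:gpu_capacity])
    cold = set(ranked[gpu_capacity:])
    return hot, cold
```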
  • 4
    Starling Framework

    2D GPU-accelerated framework for ActionScript developers

    Starling is an open-source 2D framework for ActionScript developers that leverages GPU acceleration via Adobe's Stage3D API to create smooth, high-performance games and applications across desktop and mobile platforms. It mimics the traditional Flash display list while dramatically improving performance, making it a popular choice for Flash developers transitioning into more efficient, hardware-accelerated environments.
    Downloads: 0 This Week
  • 5
    Flash-MoE

    Running a big model on a small laptop

    ...It likely includes support for GPU acceleration and parallel processing, enabling it to handle large-scale workloads effectively. The architecture emphasizes speed and efficiency, making it suitable for both research and production environments where performance is critical. It may also provide tools for benchmarking and tuning model behavior. Overall, flash-moe represents a technical advancement in making MoE models more practical and deployable.
    Downloads: 0 This Week
  • 6
    uzu

    A high-performance inference engine for AI models

    ...The engine implements a hybrid architecture in which model layers can be executed either as custom GPU kernels or through Apple’s MPSGraph API, allowing it to balance performance and compatibility depending on the workload. By utilizing Apple’s unified memory architecture, uzu reduces memory copying overhead and improves inference throughput for local AI workloads. The system includes a simple high-level API that enables developers to run models, create inference sessions, and generate outputs with minimal configuration.
    Downloads: 1 This Week
  • 7
    Flux.jl

    Relax! Flux is the ML library that doesn't make you tensor

    Flux is an elegant approach to machine learning. It's a 100% pure Julia stack and provides lightweight abstractions on top of Julia's native GPU and AD support. Flux makes the easy things easy while remaining fully hackable. Flux provides a single, intuitive way to define models, just like mathematical notation. Julia transparently compiles your code, optimizing and fusing kernels for the GPU, for the best performance. Existing Julia libraries are differentiable and can be incorporated directly into Flux models. ...
    Downloads: 1 This Week
  • 8
    luma.gl

    High-performance Toolkit for WebGL-based data visualization

    luma.gl is a GPU toolkit for the Web, focused primarily on data visualization use cases. luma.gl aims to provide support for GPU programmers who need to work directly with shaders and want a low-abstraction API that remains conceptually close to the WebGPU and WebGL APIs. Unlike other common WebGL APIs, the developer can choose to use the parts of luma.gl that support their use case and leave the others behind. While generic enough to be used for general 3D rendering, luma.gl's mandate is...
    Downloads: 1 This Week
  • 9
    OptiScaler

    OptiScaler bridges upscaling/frame gen across GPUs

    ...The tool effectively acts as a compatibility layer between the game engine and multiple upscaling frameworks, enabling cross-GPU access to features that might otherwise be restricted to specific hardware ecosystems. In addition to replacing upscalers, OptiScaler can enable frame generation features in titles that do not officially support them, improving frame rates and perceived smoothness during gameplay.
    Downloads: 185 This Week
  • 10
    GPUStack

    Performance-optimized AI inference on your GPUs

    GPUStack is an open-source GPU cluster management platform designed to simplify the deployment and operation of artificial intelligence models across heterogeneous hardware environments. The system aggregates GPU resources from multiple machines into a unified cluster so developers and administrators can run large language models and other AI workloads efficiently across distributed infrastructure. Instead of requiring complex orchestration systems such as Kubernetes, GPUStack provides a...
    Downloads: 5 This Week
  • 11
    CUDA.jl

    CUDA programming in Julia

    High-performance GPU programming in a high-level language. JuliaGPU is a GitHub organization created to unify the many packages for programming GPUs in Julia. With its high-level syntax and flexible compiler, Julia is well-positioned to productively program hardware accelerators like GPUs without sacrificing performance. The latest development version of CUDA.jl requires Julia 1.8 or higher.
    Downloads: 4 This Week
  • 12
    autoresearch-macos

    AI agents running research on single-GPU nanochat training

    autoresearch-macos is a macOS-focused adaptation of autonomous research loop systems inspired by the autoresearch paradigm, enabling AI agents to iteratively improve machine learning models through self-directed experimentation. The system follows a structured loop in which an agent modifies a training script, executes a fixed-duration experiment, evaluates performance metrics, and decides whether to keep or revert changes. It is designed to operate efficiently within macOS environments, making it accessible for developers working outside traditional high-performance GPU clusters. The project typically includes components such as data preparation scripts, a training loop, and an instruction file that guides the agent’s behavior. ...
    Downloads: 0 This Week
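The modify–run–evaluate–keep-or-revert loop described above is essentially hill climbing over training configurations. A minimal sketch of that control flow (function names and config shape are hypothetical, not the project's API):

```python
import random

def research_loop(baseline, propose, evaluate, steps, seed=0):
    # Each iteration: mutate the current best config, run a fixed-budget
    # experiment via `evaluate`, and keep the change only if the metric
    # improves; otherwise revert to the previous best.
    rng = random.Random(seed)
    best = dict(baseline)
    best_score = evaluate(best)
    for _ in range(steps):
        candidate = propose(best, rng)
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

By construction the returned score never falls below the baseline's, which is the "revert on regression" guarantee the agent loop relies on.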
  • 13
    ChefKiss Inferno

    Emulating Apple Silicon devices

    Inferno by ChefKissInc is a low-level systems project focused on enabling hardware acceleration and advanced graphics compatibility on Apple Silicon devices, particularly within unsupported or experimental environments. It is designed to bridge gaps between macOS hardware capabilities and software ecosystems that traditionally rely on different GPU architectures, such as those found in Linux or Windows environments. The project typically operates at the intersection of kernel extensions, GPU drivers, and virtualization layers, aiming to unlock performance features that are otherwise restricted or unavailable. Inferno is especially relevant for developers working on emulation, virtualization, or cross-platform graphics stacks, as it attempts to expose native GPU functionality in unconventional contexts. ...
    Downloads: 1 This Week
  • 14
    Newton

    An open-source, GPU-accelerated physics simulation engine

    Newton is a high-performance, GPU-accelerated physics simulation engine designed primarily for robotics research, machine learning, and advanced simulation workflows. Built on top of NVIDIA Warp, it leverages GPU parallelism to deliver scalable and efficient simulation environments that support rapid iteration and experimentation. The engine extends previous simulation frameworks by introducing differentiable physics capabilities, allowing it to integrate seamlessly with machine learning models and optimization pipelines. ...
    Downloads: 0 This Week
  • 15
    NVIDIA cuOpt

    GPU accelerated decision optimization

    ...The platform provides multiple interfaces, including C, Python, and server APIs, allowing developers to integrate optimization capabilities into applications and services. cuOpt is designed for high-performance environments and can be deployed across cloud, hybrid, or on-premise infrastructures. By combining GPU acceleration with scalable APIs, cuOpt enables organizations to solve large optimization challenges in logistics, operations research, and decision-making systems.
    Downloads: 0 This Week
  • 16
    Butterchurn

    Butterchurn is a WebGL implementation of the Milkdrop Visualizer

    ...The project emphasizes both artistic expression and technical performance, offering a balance between visual complexity and efficiency.
    Downloads: 3 This Week
  • 17
    Megatron-LM

    Ongoing research training transformer models at scale

    Megatron-LM is a GPU-optimized deep learning framework from NVIDIA designed to train extremely large transformer-based language models efficiently at scale. The repository provides both a reference training implementation and Megatron Core, a composable library of high-performance building blocks for custom large-model pipelines. It supports advanced parallelism strategies including tensor, pipeline, data, expert, and context parallelism, enabling training across massive multi-GPU and multi-node clusters. ...
    Downloads: 1 This Week
  • 18
    OpenFang

    Open-source Agent Operating System

    OpenFang is an open-source agent operating system designed to orchestrate autonomous AI agents and workflows in a structured, production-oriented environment. Written primarily in Rust, the project focuses on building a high-performance runtime where multiple specialized agents can collaborate to complete complex computational or development tasks. It aims to move beyond simple chat-based agents by providing infrastructure for persistent agent memory, task coordination, and scalable execution. The system is positioned as a foundation for building advanced AI tooling, particularly in environments that require tight integration with GPU workflows and modern AI pipelines. ...
    Downloads: 13 This Week
  • 19
    LTX-2

    Python inference and LoRA trainer package for the LTX-2 audio–video

    LTX-2 is a powerful, open-source toolkit developed by Lightricks that provides a modular, high-performance base for building real-time graphics and visual effects applications. It is architected to give developers low-level control over rendering pipelines, GPU resource management, shader orchestration, and cross-platform abstractions so they can craft visually compelling experiences without starting from scratch. Beyond basic rendering scaffolding, LTX-2 includes optimized math libraries, resource loaders, utilities for texture and buffer handling, and integration points for native event loops and input systems. ...
    Downloads: 44 This Week
  • 20
    lru-cache

    A fast cache that automatically deletes the least recently used items

    ...It offers flexible configuration options such as max-size limits, time-based expiration, and custom disposal logic. Developers can use it to cache expensive computations, API responses, or frequently accessed data. The implementation focuses on correctness, speed, and compatibility with modern Node.js environments. Overall, node-lru-cache provides a reliable building block for performance optimization in JavaScript backends.
    Downloads: 2 This Week
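lru-cache itself is a JavaScript package, but the policy it implements — a bounded map with least-recently-used eviction plus optional time-based expiration — is language-agnostic. A compact Python sketch of the same idea (illustrative only, not the package's actual API):

```python
import time
from collections import OrderedDict

class LRUCache:
    def __init__(self, max_size, ttl=None):
        self.max_size, self.ttl = max_size, ttl
        self._data = OrderedDict()  # key -> (value, insertion timestamp)

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        value, ts = item
        if self.ttl is not None and time.monotonic() - ts > self.ttl:
            del self._data[key]      # expired: treat as a miss
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = (value, time.monotonic())
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict least recently used
```

Reading a key moves it to the back of the queue, so a recently read entry survives an eviction that would otherwise have removed it.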
  • 21
    Text Generation Inference

    Large Language Model Text Generation Inference

    Text Generation Inference is a high-performance inference server for text generation models, optimized for Hugging Face's Transformers. It is designed to serve large language models efficiently with optimizations for performance and scalability.
    Downloads: 4 This Week
  • 22
    XFrames

    GPU-accelerated GUI development for Node.js and the browser

    xframes is a high-performance library that empowers developers to build native desktop applications using familiar web technologies, specifically Node.js and React, without the overhead of the DOM. xframes serves as a streamlined alternative to Electron, designed for developers looking to maximize performance and efficiency.
    Downloads: 4 This Week
  • 23
    Codon

    A high-performance, zero-overhead, extensible Python compiler

    Codon is a high-performance Python compiler that compiles Python code to native machine code without any runtime overhead. Typical speedups over Python are on the order of 100x or more, on a single thread. Codon supports native multithreading which can lead to speedups many times higher still. The Codon framework is fully modular and extensible, allowing for the seamless integration of new modules, compiler optimizations, domain-specific languages and so on. We actively develop Codon...
    Downloads: 15 This Week
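Codon's pitch is that ordinary, type-annotated Python like the following runs unchanged under CPython and can also be compiled to native code with the Codon toolchain — loop-heavy numeric code like this is where the claimed speedups come from (a minimal illustration, not a benchmark):

```python
def fib(n: int) -> int:
    # Plain typed Python: iterative Fibonacci. CPython interprets this;
    # the Codon compiler would compile the same source to machine code.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib(20))  # -> 6765
```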
  • 24
    Meridian

    Meridian is an MMM framework

    ...Meridian uses the No-U-Turn Sampler (NUTS) for Markov Chain Monte Carlo (MCMC) sampling to produce statistically rigorous results, and it includes GPU acceleration to significantly reduce computation time.
    Downloads: 7 This Week
  • 25
    webgl-plot

    A high-performance real-time 2D plotting library based on native WebGL

    ...Its minimal memory footprint and GPU acceleration ensure excellent performance even with tens of thousands of data points, and its simple API allows developers to get started quickly.
    Downloads: 1 This Week