Real-time NVIDIA GPU dashboard
How to optimize an algorithm in CUDA
157 models, 30 providers, one command to find what runs on your hardware
Running a big model on a small laptop
A high-performance inference engine for AI models
Python inference and LoRA trainer package for the LTX-2 audio–video model
HeavyDB (formerly MapD/OmniSciDB)
UCCL is an efficient communication library for GPUs
Unified KV Cache Compression Methods for Auto-Regressive Models
Alibaba's high-performance LLM inference engine for diverse apps
Easily compute CLIP embeddings and build a CLIP retrieval system
FlashMLA: Efficient Multi-head Latent Attention Kernels
RL implementations
ChatGLM-6B: An Open Bilingual Dialogue Language Model
Distributed AI Model Training and LLM Fine-Tuning on Kubernetes
Run Local LLMs on Any Device. Open-source
A nearly-live implementation of OpenAI's Whisper
Large-Scale Agentic RL for High-Performance CUDA Kernel Generation
Python-free Rust inference server
Training neural networks on Apple Neural Engine via APIs
Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real
High-Performance Face Recognition Library on PaddlePaddle & PyTorch
Reference PyTorch implementation and models for DINOv3
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework
An implementation of a deep learning recommendation model (DLRM)