Search Results for "gpu max performance"

122 projects for "gpu max performance" with 1 filter applied:

1

GPU Hot

Real-time NVIDIA GPU dashboard

GPU Hot is an open-source, lightweight monitoring dashboard designed to provide real-time visibility into NVIDIA GPU performance across single machines or entire clusters. The project offers a self-hosted web interface that streams hardware metrics directly from GPU servers, enabling developers, ML engineers, and system administrators to observe GPU utilization and system behavior in real time through a browser.

Downloads: 1 This Week

Last Update: 2026-04-11
See Project
2

how-to-optim-algorithm-in-cuda

How to optimize algorithms in CUDA

...These examples show how different optimization techniques influence performance on modern GPU hardware and allow readers to experiment with real implementations. The repository also contains extensive learning notes that summarize CUDA programming concepts, GPU architecture details, and performance engineering strategies.

Downloads: 0 This Week

Last Update: 2026-04-22
See Project
3

Alacritty

A cross-platform, GPU-accelerated terminal emulator

Alacritty is the fastest open source terminal emulator there is. How is it the fastest? With such a strong focus on simplicity and performance, Alacritty’s included features are very carefully considered, ensuring that it remains blazingly fast. It uses the GPU for rendering, which makes many optimizations possible. In benchmarks against other terminals, Alacritty has been shown to be either faster or far faster. Alacritty requires no additional setup, but still allows configuration of many aspects of the terminal. ...

Downloads: 2 This Week

Last Update: 2026-04-06
See Project
4

llmfit

157 models, 30 providers, one command to find what runs on hardware

llmfit is a terminal-based utility that helps developers determine which large language models can realistically run on their local hardware by analyzing system resources and model requirements. The tool automatically detects CPU, RAM, GPU, and VRAM specifications, then ranks available models based on performance factors such as speed, quality, and memory fit. It provides both an interactive terminal user interface and a traditional CLI mode, enabling flexible workflows for different user preferences. llmfit also supports advanced configurations including multi-GPU setups, mixture-of-experts architectures, and dynamic quantization recommendations. ...

Downloads: 19 This Week

Last Update: 22 hours ago
See Project
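
The memory-fit check described in the llmfit entry above can be approximated with simple arithmetic. The sketch below is not llmfit's actual algorithm; the function names, the 20% overhead allowance, and the candidate bit widths are all assumptions chosen for illustration. It estimates weight memory as parameter count times bits per weight, then lists which quantization levels would fit in a given amount of memory.

```python
def estimate_model_memory_gb(params_billions, bits_per_weight, overhead_fraction=0.2):
    """Rough estimate in decimal GB: weight bytes plus a flat overhead allowance.

    overhead_fraction is a hypothetical allowance for activations, KV cache,
    and runtime buffers; real tools may model these separately.
    """
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * (1 + overhead_fraction) / 1e9


def fits(params_billions, bits_per_weight, available_gb, overhead_fraction=0.2):
    """True if the estimated footprint is within the available memory."""
    needed = estimate_model_memory_gb(params_billions, bits_per_weight, overhead_fraction)
    return needed <= available_gb


def rank_quantizations(params_billions, available_gb, candidates=(16, 8, 4)):
    """Return the candidate bit widths that fit, highest precision first."""
    return [
        bits
        for bits in sorted(candidates, reverse=True)
        if fits(params_billions, bits, available_gb)
    ]


if __name__ == "__main__":
    # A hypothetical 7B-parameter model on a machine with 12 GB of VRAM.
    for bits in (16, 8, 4):
        print(bits, "bits:", round(estimate_model_memory_gb(7, bits), 1), "GB")
    print("fits at:", rank_quantizations(7, 12))
```

Under these assumptions, a 7B model needs roughly 16.8 GB at 16 bits, so only the 8-bit and 4-bit variants would be reported as fitting in 12 GB.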
5

NVIDIA Profile Inspector

Modify game profiles inside the internal driver database

NVIDIA Profile Inspector is a specialized utility that allows users to view and modify hidden graphics driver settings within NVIDIA’s internal profile database, providing deeper control than the official NVIDIA Control Panel. It exposes advanced and undocumented configuration options that can influence rendering behavior, performance optimization, and compatibility for specific games. Users can create, edit, and assign profiles for individual applications, enabling fine-grained tuning of GPU behavior beyond standard settings. The tool is particularly popular among enthusiasts who want to optimize performance, troubleshoot graphical issues, or enable experimental features such as custom DLSS configurations. ...

Downloads: 59 This Week

Last Update: 2026-03-20
See Project
6

XenosRecomp

A tool for converting Xbox 360 shaders to HLSL

...The project addresses one of the most complex aspects of console reverse engineering, which is accurately reproducing proprietary GPU behavior in a portable and efficient way. By reconstructing the graphics pipeline, XenosRecomp enables developers to render scenes correctly without relying on emulation layers that can introduce performance overhead or inaccuracies.

Downloads: 1 This Week

Last Update: 2026-03-18
See Project
7

Flash-MoE

Running a big model on a small laptop

...It likely includes support for GPU acceleration and parallel processing, enabling it to handle large-scale workloads effectively. The architecture emphasizes speed and efficiency, making it suitable for both research and production environments where performance is critical. It may also provide tools for benchmarking and tuning model behavior. Overall, flash-moe represents a technical advancement in making MoE models more practical and deployable.

Downloads: 1 This Week

Last Update: 2026-04-02
See Project
8

Numba CUDA Target

The CUDA target for Numba

Numba CUDA Target is the NVIDIA-maintained CUDA backend for the Numba JIT compiler, enabling developers to write GPU-accelerated code directly in Python. It allows users to define CUDA kernels using Python syntax, which are then compiled into efficient GPU code at runtime using LLVM-based toolchains. This approach significantly lowers the barrier to entry for GPU programming by eliminating the need to write CUDA C++ while still delivering high performance.

Downloads: 2 This Week

Last Update: 2026-04-30
See Project
9

uzu

A high-performance inference engine for AI models

...The engine implements a hybrid architecture in which model layers can be executed either as custom GPU kernels or through Apple’s MPSGraph API, allowing it to balance performance and compatibility depending on the workload. By utilizing Apple’s unified memory architecture, uzu reduces memory copying overhead and improves inference throughput for local AI workloads. The system includes a simple high-level API that enables developers to run models, create inference sessions, and generate outputs with minimal configuration.

Downloads: 1 This Week

Last Update: 11 hours ago
See Project
10

Butterchurn

Butterchurn is a WebGL implementation of the Milkdrop Visualizer

...The project emphasizes both artistic expression and technical performance, offering a balance between visual complexity and efficiency.

Downloads: 5 This Week

Last Update: 2026-04-20
See Project
11

LTX-2

Python inference and LoRA trainer package for the LTX-2 audio–video

LTX-2 is an open-source package developed by Lightricks that provides Python inference and LoRA training for the LTX-2 audio-video generation model. ...

Downloads: 41 This Week

Last Update: 2026-04-23
See Project
12

lru-cache

A fast cache that automatically deletes the least recently used items

...It offers flexible configuration options such as maximum size limits, time-based expiration, and custom disposal logic. Developers can use it to cache expensive computations, API responses, or frequently accessed data. The implementation focuses on correctness, speed, and compatibility with modern Node.js environments. Overall, node-lru-cache provides a reliable building block for performance optimization in JavaScript backends.

Downloads: 2 This Week

Last Update: 3 days ago
See Project
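
The three options named in the lru-cache entry above (a maximum size, time-based expiration, and a disposal hook) can be illustrated with a small sketch. This is a Python illustration of the general technique, not the lru-cache package's JavaScript API; the class name, parameter names, and the injectable clock are assumptions made for the example.

```python
import time
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache sketch with a size limit, optional TTL, and dispose hook."""

    def __init__(self, max_size, ttl_seconds=None, on_dispose=None, clock=time.monotonic):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._on_dispose = on_dispose  # hypothetical hook: called as on_dispose(key, value)
        self._clock = clock            # injectable so expiry can be tested without sleeping
        self._items = OrderedDict()    # key -> (value, expires_at or None), oldest first

    def set(self, key, value):
        # Re-inserting moves the key to the most recently used position.
        self._items.pop(key, None)
        expires_at = None if self._ttl is None else self._clock() + self._ttl
        self._items[key] = (value, expires_at)
        # Evict least recently used entries until the size limit holds.
        while len(self._items) > self._max_size:
            old_key, (old_value, _) = self._items.popitem(last=False)
            self._dispose(old_key, old_value)

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            # Expired entries are removed lazily, on access.
            del self._items[key]
            self._dispose(key, value)
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def __len__(self):
        return len(self._items)

    def _dispose(self, key, value):
        if self._on_dispose is not None:
            self._on_dispose(key, value)


if __name__ == "__main__":
    cache = LRUCache(max_size=2, on_dispose=lambda k, v: print("evicted", k))
    cache.set("a", 1)
    cache.set("b", 2)
    cache.get("a")     # "a" becomes most recently used
    cache.set("c", 3)  # evicts "b", the least recently used key
```

Expiry here is checked only when a key is read, which keeps the sketch short; a production cache might also purge stale entries proactively.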
13

webgl-plot

A high-performance real-time 2D plotting library based on native WebGL

...Its minimal memory footprint and GPU acceleration ensure excellent performance even with tens of thousands of data points, and its simple API allows developers to get started quickly.

Downloads: 1 This Week

Last Update: 2025-03-26
See Project
14

Megatron-LM

Ongoing research training transformer models at scale

Megatron-LM is a GPU-optimized deep learning framework from NVIDIA designed to train extremely large transformer-based language models efficiently at scale. The repository provides both a reference training implementation and Megatron Core, a composable library of high-performance building blocks for custom large-model pipelines. It supports advanced parallelism strategies including tensor, pipeline, data, expert, and context parallelism, enabling training across massive multi-GPU and multi-node clusters. ...

Downloads: 0 This Week

Last Update: 2026-04-22
See Project
15

HeavyDB

HeavyDB (formerly MapD/OmniSciDB)

HeavyDB is an open-source GPU-accelerated analytical database designed to perform extremely fast queries on large datasets. The system is built as a SQL-based relational columnar database engine that leverages modern hardware parallelism, including GPUs and multicore CPUs. Its architecture allows users to query datasets containing billions of rows in milliseconds without requiring traditional indexing, pre-aggregation, or sampling techniques. HeavyDB was originally developed as part of the...

Downloads: 0 This Week

Last Update: 2026-03-11
See Project
16

libplacebo

Official mirror of libplacebo

libplacebo is a flexible, high-performance graphics library built on top of Vulkan, designed to provide reusable GPU-accelerated components for media applications. It originated as a core part of the rendering pipeline for the mpv media player and has since grown into a standalone library used for tone mapping, dithering, color space conversion, and more. libplacebo is ideal for developers looking to integrate sophisticated video rendering and post-processing into their own applications with full control over shaders and rendering stages.

Downloads: 1 This Week

Last Update: 2026-03-13
See Project
17

UCCL

UCCL is an efficient communication library for GPUs

UCCL is a high-performance GPU communication library designed to support distributed machine learning workloads and large-scale AI systems. The library focuses on enabling efficient data transfer and collective communication between GPUs during training and inference processes. It supports a variety of communication patterns including collective operations such as all-reduce as well as peer-to-peer transfers that are commonly used in modern machine learning architectures. ...

Downloads: 0 This Week

Last Update: 2026-03-14
See Project
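
The all-reduce operation mentioned in the UCCL entry above leaves every participant holding the element-wise sum of all participants' data. One widely used way to implement it is the ring algorithm, simulated below in plain Python. This is a sketch of the general technique only, not UCCL's implementation or API; the function name and the list-of-chunks data layout are assumptions for the example.

```python
def ring_all_reduce(buffers):
    """Simulate a ring all-reduce (sum) across n ranks.

    buffers[r] is rank r's data, pre-split into n chunks (lists of numbers).
    Returns a new structure in which every rank holds the summed chunks.
    """
    n = len(buffers)
    if any(len(rank_chunks) != n for rank_chunks in buffers):
        raise ValueError("each rank must hold exactly n chunks")
    data = [[list(chunk) for chunk in rank_chunks] for rank_chunks in buffers]

    # Phase 1, reduce-scatter: after n-1 steps, rank r holds the fully
    # summed chunk with index (r + 1) % n.
    for step in range(n - 1):
        # Gather all messages first so the sends within a step are simultaneous.
        messages = []
        for rank in range(n):
            idx = (rank - step) % n
            messages.append(((rank + 1) % n, idx, list(data[rank][idx])))
        for dest, idx, payload in messages:
            data[dest][idx] = [a + b for a, b in zip(data[dest][idx], payload)]

    # Phase 2, all-gather: pass the finished chunks around the ring so
    # every rank ends up with every summed chunk.
    for step in range(n - 1):
        messages = []
        for rank in range(n):
            idx = (rank + 1 - step) % n
            messages.append(((rank + 1) % n, idx, list(data[rank][idx])))
        for dest, idx, payload in messages:
            data[dest][idx] = payload

    return data


if __name__ == "__main__":
    ranks = [
        [[1, 2], [3, 4], [5, 6]],
        [[10, 20], [30, 40], [50, 60]],
        [[100, 200], [300, 400], [500, 600]],
    ]
    for rank, chunks in enumerate(ring_all_reduce(ranks)):
        print("rank", rank, chunks)
```

Each rank sends and receives one chunk per step, so the per-rank traffic stays roughly constant as the number of ranks grows, which is why the ring layout is popular for large transfers.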
18

KVCache-Factory

Unified KV Cache Compression Methods for Auto-Regressive Models

...In large language models, the key-value cache stores intermediate attention states that enable efficient token generation during inference, but these caches can consume large amounts of GPU memory when handling long contexts. KVCache-Factory provides a platform for implementing and evaluating multiple compression strategies that reduce memory usage while preserving model performance. The framework integrates several state-of-the-art methods such as PyramidKV, SnapKV, H2O, and StreamingLLM, allowing researchers to compare and experiment with different approaches within the same environment. ...

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
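
To make the memory pressure described in the KVCache-Factory entry above concrete, the sketch below estimates KV cache size from model shape and shows a simple retention policy that keeps a few initial tokens plus a recent window, the idea commonly associated with StreamingLLM. It is not KVCache-Factory's code; the function names and the model dimensions in the example are hypothetical.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch_size=1, bytes_per_element=2):
    """Size of the key and value tensors across all layers.

    The factor of 2 counts keys and values; bytes_per_element=2 assumes fp16.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_element


def sink_plus_window_indices(seq_len, num_sink, window):
    """Token positions kept when retaining the first num_sink tokens
    and the most recent `window` tokens (a simplified retention policy)."""
    if seq_len <= num_sink + window:
        return list(range(seq_len))
    return list(range(num_sink)) + list(range(seq_len - window, seq_len))


if __name__ == "__main__":
    # Hypothetical model shape: 32 layers, 32 KV heads, head dimension 128.
    full = kv_cache_bytes(32, 32, 128, seq_len=4096)
    kept = sink_plus_window_indices(4096, num_sink=4, window=1020)
    compressed = kv_cache_bytes(32, 32, 128, seq_len=len(kept))
    print("full cache:", full / 2**30, "GiB")
    print("compressed cache:", compressed / 2**30, "GiB")
```

With these assumed dimensions, a 4096-token context takes 2 GiB in fp16, and keeping 1024 of those positions cuts that to 0.5 GiB. How much model quality survives such pruning is exactly what a comparison framework would need to measure.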
19

RTP-LLM

Alibaba's high-performance LLM inference engine for diverse apps

RTP-LLM is an open-source large language model inference acceleration engine developed by Alibaba to provide high-performance serving infrastructure for modern LLM deployments. The system focuses on improving throughput, latency, and resource utilization when running large models in production environments. It achieves this by implementing optimized GPU kernels, batching strategies, and memory management techniques tailored for transformer inference workloads.

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
20

clip-retrieval

Easily compute clip embeddings and build a clip retrieval system

...It allows developers to compute embeddings for both images and text efficiently and then index them for fast similarity search across massive datasets. The system is optimized for performance and scalability, capable of processing tens or even hundreds of millions of embeddings using GPU acceleration. It includes components for inference, indexing, filtering, and serving results through APIs, making it a complete pipeline for building production-ready retrieval systems. The framework also supports querying by image, text, or embedding, enabling flexible use cases such as reverse image search or multimodal content discovery. ...

Downloads: 1 This Week

Last Update: 2026-03-18
See Project
21

XFrames

GPU-accelerated GUI development for Node.js and the browser

xframes is a high-performance library that empowers developers to build native desktop applications using familiar web technologies, specifically Node.js and React, without the overhead of the DOM. xframes serves as a streamlined alternative to Electron, designed for developers looking to maximize performance and efficiency.

Downloads: 0 This Week

Last Update: 2024-12-07
See Project
22

FlashMLA

FlashMLA: Efficient Multi-head Latent Attention Kernels

FlashMLA is a high-performance decoding kernel library designed especially for Multi-Head Latent Attention (MLA) workloads, targeting NVIDIA Hopper GPU architectures. It provides optimized kernels for MLA decoding, including support for variable-length sequences, helping reduce latency and increase throughput in model inference systems using that attention style.

Downloads: 0 This Week

Last Update: 2026-04-29
See Project
23

Zoo Design Studio

The Zoo Design Studio app

...Users can interact with the system through a familiar point-and-click interface, but every action is translated into code in the underlying modeling language, ensuring consistency between visual and programmatic representations. The application is powered by a GPU-first geometry engine that streams rendered output as video frames, enabling high-performance modeling even when heavy computation is offloaded to remote infrastructure. It uses WebSockets for real-time communication between the client and the modeling engine, allowing immediate feedback and interactive design updates.

Downloads: 8 This Week

Last Update: 18 hours ago
See Project
24

RL Games

RL implementations

rl_games is a high-performance reinforcement learning framework optimized for GPU-based training, particularly in environments like robotics and continuous control tasks. It supports advanced algorithms and is built with PyTorch.

Downloads: 0 This Week

Last Update: 2026-02-20
See Project
25

Spartan Engine

A game engine with an emphasis on real-time cutting-edge solutions

...The engine implements a wide range of advanced graphics features, such as atmospheric scattering, physically based shading, screen-space shadows and ambient occlusion, screen-space reflections, sophisticated shadow mapping, volumetric fog, and HDR output. It supports next-gen performance and image quality technologies including variable rate shading, dynamic resolution scaling, temporal anti-aliasing, and upscaling via XeSS 2 and FSR 3. Beyond rendering, SpartanEngine offers PhysX-powered physics, CPU and GPU profiling, and a thread pool for parallel workloads.

Downloads: 7 This Week

Last Update: 1 day ago
See Project
