Search Results for "gpu max performance" - Page 4

Showing 388 open source projects for "gpu max performance"

1

TensorRT

C++ library for high performance inference on NVIDIA GPUs

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. TensorRT-based applications perform up to 40X faster than CPU-only platforms during inference. With TensorRT, you can optimize neural network models trained in all major frameworks, calibrate for lower precision with high accuracy, and deploy to hyperscale data centers,...

Downloads: 17 This Week

Last Update: 2026-03-25
See Project
2

G-Helper

Lightweight Armoury Crate alternative for Asus laptops and ROG Ally

Small and lightweight Armoury Crate alternative for Asus laptops, offering almost the same functionality without extra load and unnecessary services. Works with all popular models, such as ROG Zephyrus G14, G15, G16, M16, Flow X13, Flow X16, Flow Z13, DUO, TUF Series, Strix or Scar Series, ProArt, Vivobook, Zenbook, ROG Ally or Ally X and many more.

Downloads: 167 This Week

Last Update: 2026-04-22
See Project
3

Skiko

Kotlin Multiplatform bindings to Skia

...By leveraging Skia’s proven performance and cross-platform consistency, Skiko helps developers write a single graphics pipeline that behaves predictably across environments, simplifying maintenance and reducing platform fragmentation.

Downloads: 24 This Week

Last Update: 3 days ago
See Project
4

UIforETW

User interface for recording and managing ETW traces

UIforETW is a Windows performance tracing companion that wraps the Event Tracing for Windows (ETW) toolchain in an approachable GUI. It standardizes trace collection profiles, launches WPR/xperf with the right providers, and organizes the resulting .etl files for repeatable investigations. The tool streamlines the entire loop—record, annotate, open in WPA/XperfView—so engineers can focus on finding scheduling stalls, I/O bottlenecks, GC pauses, or GPU hitches instead of memorizing command-line incantations. ...

Downloads: 0 This Week

Last Update: 2025-10-10
See Project
5

FlashMLA

FlashMLA: Efficient Multi-head Latent Attention Kernels

FlashMLA is a high-performance decoding kernel library designed especially for Multi-Head Latent Attention (MLA) workloads, targeting NVIDIA Hopper GPU architectures. It provides optimized kernels for MLA decoding, including support for variable-length sequences, helping reduce latency and increase throughput in model inference systems using that attention style.

Downloads: 0 This Week

Last Update: 2026-04-29
See Project
6

cuML

RAPIDS Machine Learning Library

cuML is a suite of libraries that implement machine learning algorithms and mathematical primitive functions that share compatible APIs with other RAPIDS projects. cuML enables data scientists, researchers, and software engineers to run traditional tabular ML tasks on GPUs without going into the details of CUDA programming. In most cases, cuML's Python API matches the API from scikit-learn. For large datasets, these GPU-based implementations can complete 10-50x faster than their CPU equivalents. For details on performance, see the cuML Benchmarks Notebook.

Downloads: 0 This Week

Last Update: 2026-04-09
See Project
7

Nuclio

High-Performance Serverless event and data processing platform

Nuclio is an open source and managed serverless platform used to minimize development and maintenance overhead and automate the deployment of data-science-based applications. Real-time performance running up to 400,000 function invocations per second. Portable across low-power devices, laptops, edge, on-prem and multi-cloud deployments. The first serverless platform supporting GPUs for optimized utilization and sharing. Automated deployment to production in a few clicks from Jupyter notebook. Deploy one of...

Downloads: 3 This Week

Last Update: 2026-04-16
See Project
8

three-d

Makes it simple to draw stuff across platforms (including web)

three-d is a lightweight and modern 3D rendering library written in Rust that targets both native and WebAssembly environments, providing a simple yet powerful abstraction over GPU-based graphics APIs. It is designed to make 3D graphics programming accessible while still offering fine-grained control over rendering pipelines, materials, lighting, and camera systems. The library leverages modern graphics standards such as OpenGL and WebGL to deliver high-performance rendering across platforms, including browsers and desktop applications. ...

Downloads: 0 This Week

Last Update: 2026-04-17
See Project
9

Colossal-AI

Making large AI models cheaper, faster and more accessible

The Transformer architecture has improved the performance of deep learning models in domains such as Computer Vision and Natural Language Processing. Together with better performance come larger model sizes. This poses challenges for the memory wall of current accelerator hardware such as GPUs. It is never ideal to train large models such as Vision Transformer, BERT, and GPT on a single GPU or a single machine.

Downloads: 1 This Week

Last Update: 2025-05-28
See Project
10

RL Games

RL implementations

rl_games is a high-performance reinforcement learning framework optimized for GPU-based training, particularly in environments like robotics and continuous control tasks. It supports advanced algorithms and is built with PyTorch.

Downloads: 0 This Week

Last Update: 2026-02-20
See Project
11

Lemonade

Lemonade helps users run local LLMs with the highest performance

Lemonade is a local LLM runtime that aims to deliver the highest possible performance on your own hardware by auto-configuring state-of-the-art inference engines for both NPUs and GPUs. The project positions itself as a “local LLM server” you can run on laptops and workstations, abstracting away backend differences while giving you a single place to serve and manage models. Its README emphasizes real-world adoption across startups, research groups, and large companies, signaling a focus on...

Downloads: 12 This Week

Last Update: 2026-04-28
See Project
12

DALI

A GPU-accelerated library containing highly optimized building blocks

...Deep learning applications require complex, multi-stage data processing pipelines that include loading, decoding, cropping, resizing, and many other augmentations. These data processing pipelines, which are currently executed on the CPU, have become a bottleneck, limiting the performance and scalability of training and inference. DALI addresses the problem of the CPU bottleneck by offloading data preprocessing to the GPU. Additionally, DALI relies on its own execution engine, built to maximize the throughput of the input pipeline.

Downloads: 1 This Week

Last Update: 2026-04-16
See Project
13

Infinity

Low-latency REST API for serving text-embeddings

Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting all sentence-transformer models and frameworks. Infinity is developed under the MIT License. Infinity powers inference behind Gradient.ai and other Embedding API providers.

Downloads: 0 This Week

Last Update: 2025-08-22
See Project
14

Zoo Design Studio

The Zoo Design Studio app

...Users can interact with the system through a familiar point-and-click interface, but every action is translated into code in the underlying modeling language, ensuring consistency between visual and programmatic representations. The application is powered by a GPU-first geometry engine that streams rendered output as video frames, enabling high-performance modeling even when heavy computation is offloaded to remote infrastructure. It uses WebSockets for real-time communication between the client and the modeling engine, allowing immediate feedback and interactive design updates.

Downloads: 7 This Week

Last Update: 23 hours ago
See Project
15

DeSmuME

DeSmuME is a Nintendo DS emulator

In this version we have added support for high-resolution 3D rendering. Try the new “GPU Scaling Factor” feature to increase the 3D resolution beyond the native resolution of 256×192 pixels. Also, the Cocoa frontend sees continued radical enhancements, while the Windows frontend sees some new incremental enhancements. DeSmuME is a very CPU demanding app. While many users will see DeSmuME as a toy (and use it as such), it is actually a very sophisticated piece of software with lots of...

Downloads: 14 This Week

Last Update: 2024-08-23
See Project
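
To make the "GPU Scaling Factor" arithmetic concrete, here is a minimal sketch in Python. It assumes the factor is an integer multiplier applied to both dimensions of the native 256×192 screen; that model, and the function names, are illustrative assumptions and are not taken from DeSmuME's source.

```python
# Native Nintendo DS screen resolution, as stated in the description above.
NATIVE_WIDTH, NATIVE_HEIGHT = 256, 192


def scaled_resolution(factor: int) -> tuple[int, int]:
    """Return the 3D render size for an integer scaling factor.

    Assumption: the factor multiplies width and height equally.
    """
    if factor < 1:
        raise ValueError("scaling factor must be at least 1")
    return NATIVE_WIDTH * factor, NATIVE_HEIGHT * factor


def pixel_count(factor: int) -> int:
    """Number of pixels rendered per frame at the given factor."""
    width, height = scaled_resolution(factor)
    return width * height


if __name__ == "__main__":
    for factor in (1, 2, 4):
        width, height = scaled_resolution(factor)
        print(f"{factor}x -> {width}x{height} ({pixel_count(factor)} pixels)")
```

Under this assumption the pixel count grows with the square of the factor, which is why higher factors raise the rendering cost quickly.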
16

Spartan Engine

A game engine with an emphasis on real-time cutting-edge solutions

...The engine implements a wide range of advanced graphics features, such as atmospheric scattering, physically based shading, screen-space shadows and ambient occlusion, screen-space reflections, sophisticated shadow mapping, volumetric fog, and HDR output. It supports next-gen performance and image quality technologies including variable rate shading, dynamic resolution scaling, temporal anti-aliasing, and upscaling via XeSS 2 and FSR 3. Beyond rendering, SpartanEngine offers PhysX-powered physics, CPU and GPU profiling, and a thread pool for parallel workloads.

Downloads: 7 This Week

Last Update: 2 days ago
See Project
17

ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model

...The project provides inference code, demos (command line, web, API), quantization support for lower memory deployment, and tools for finetuning (e.g., via P-Tuning v2). It is optimized for dialogue and question answering with a balance between performance and deployability in consumer hardware settings. Support for quantized inference (INT4, INT8) to reduce GPU memory requirements. Automatic mode switching between precision/memory tradeoffs (full/quantized).

Downloads: 6 This Week

Last Update: 2025-09-26
See Project
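
As a rough illustration of why INT8 or INT4 quantization lowers GPU memory requirements, the sketch below shows generic symmetric per-tensor quantization in plain Python. It is not ChatGLM-6B's actual quantization code; the function names are hypothetical, and the 6e9 parameter count is a round figure assumed from the model name.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> (integer codes, scale)."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude to 127; fall back to 1.0 for all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


def weight_memory_gib(num_params, bits_per_weight):
    """Memory needed to store the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    codes, scale = quantize_int8([0.31, -1.20, 0.05, 0.87])
    print("codes:", codes)
    print("restored:", dequantize(codes, scale))
    # Assumed round parameter count; weights only, activations excluded.
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights: {weight_memory_gib(6e9, bits):.1f} GiB")
```

The round trip loses at most half a quantization step per weight, while storage shrinks in proportion to the bit width.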
18

ProjectPSX

Experimental C# Playstation Emulator

ProjectPSX is an experimental PlayStation 1 emulator written entirely in C# that focuses on learning and exploring console hardware through a clean and dependency-free implementation. The project emulates key components of the PS1 architecture, including the MIPS R3000A CPU, GPU, DMA controller, and CD-ROM subsystem, all interconnected through a custom bus system. Unlike production-grade emulators, ProjectPSX is intentionally designed as a transparent and educational codebase, prioritizing simplicity and readability over full compatibility or performance optimization. It includes a software-based rasterizer for rendering 3D graphics and supports features such as memory cards, basic BIOS functionality, and controller input mapping. ...

Downloads: 2 This Week

Last Update: 2026-04-07
See Project
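
For readers unfamiliar with the term, a software rasterizer decides on the CPU which pixels a triangle covers. The sketch below uses the common edge-function test in Python; it is a generic illustration under that assumption, not ProjectPSX's C# implementation, and all names in it are hypothetical.

```python
import math


def edge(ax, ay, bx, by, px, py):
    """Cross product of (B - A) and (P - A); its sign says which side P is on."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def rasterize_triangle(v0, v1, v2, width, height):
    """Return the set of (x, y) pixels whose centers lie inside the triangle."""
    area = edge(*v0, *v1, *v2)
    if area == 0:
        return set()  # degenerate triangle covers no pixels

    # Bounding box of the triangle, clamped to the framebuffer.
    min_x = max(0, math.floor(min(v0[0], v1[0], v2[0])))
    max_x = min(width - 1, math.ceil(max(v0[0], v1[0], v2[0])))
    min_y = max(0, math.floor(min(v0[1], v1[1], v2[1])))
    max_y = min(height - 1, math.ceil(max(v0[1], v1[1], v2[1])))

    covered = set()
    for y in range(min_y, max_y + 1):
        for x in range(min_x, max_x + 1):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            w0 = edge(*v1, *v2, px, py)
            w1 = edge(*v2, *v0, px, py)
            w2 = edge(*v0, *v1, px, py)
            # Inside when all three edge values share the sign of the area,
            # so both vertex windings are accepted.
            if area > 0:
                inside = w0 >= 0 and w1 >= 0 and w2 >= 0
            else:
                inside = w0 <= 0 and w1 <= 0 and w2 <= 0
            if inside:
                covered.add((x, y))
    return covered


if __name__ == "__main__":
    pixels = rasterize_triangle((0, 0), (8, 0), (0, 8), 8, 8)
    for y in range(8):
        print("".join("#" if (x, y) in pixels else "." for x in range(8)))
```

A real emulator would also interpolate colors and texture coordinates from the same edge values; this sketch covers only the coverage test.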
19

Kubeflow Trainer

Distributed AI Model Training and LLM Fine-Tuning on Kubernetes

...One of its key innovations is the integration of MPI-based distributed computing within Kubernetes, allowing efficient communication between nodes for high-performance training. It also includes advanced scheduling capabilities through integrations with tools like Kueue and Volcano, enabling topology-aware resource allocation and multi-cluster job orchestration.

Downloads: 0 This Week

Last Update: 2026-03-20
See Project
20

Habitat-Sim

A flexible, high-performance 3D simulator for Embodied AI research

Habitat-Sim is a high-performance 3D simulator for embodied AI research, designed to run photorealistic indoor environments at thousands of frames per second. It offers GPU-accelerated rendering and a flexible sensor suite—RGB, depth, semantic segmentation, and more—so agents can perceive and act in realistic scenes. The engine is written in C++ with Python bindings and integrates physics, navigation meshes, and shortest-path planners to support tasks like point-goal navigation, rearrangement, and interactive manipulation. ...

Downloads: 0 This Week

Last Update: 2025-10-07
See Project
21

FurMark

GPU stress test and OpenGL/Vulkan graphics benchmark for Windows/Linux

FurMark is an intensive benchmarking tool designed to evaluate the performance of graphics cards using fur rendering algorithms. This tool is particularly effective in generating high workloads that can significantly increase the temperature of the GPU, making it a useful utility for testing the stability and stress tolerance of graphics cards. By simulating demanding rendering tasks, FurMark serves as a comprehensive test for assessing the robustness and thermal performance of GPUs under extreme conditions. ...

Downloads: 311 This Week

Last Update: 2024-10-28
See Project
22

Taichi

Productive, portable, and performant GPU programming in Python

Taichi is an open-source, embedded DSL within Python designed for high-performance numerical and physical simulations. It uses JIT compilation (via LLVM and its runtime TiRT) to offload compute-heavy code to CPUs, GPUs, mobile devices, and embedded systems. With built-in support for sparse data structures (SNode), automatic differentiation, AOT deployment, and compatibility with CUDA, Vulkan, Metal, and OpenGL ES, it empowers disciplines like simulation, graphics, AI, and robotics.

Downloads: 0 This Week

Last Update: 2025-07-30
See Project
23

OBS Linux Vulkan/OpenGL game capture

OBS Linux Vulkan/OpenGL game capture

obs-vkcapture is a Vulkan layer and OBS Studio plugin that enables capturing of Vulkan-rendered content in real time, solving a long-standing limitation in game recording and streaming. It works by injecting a Vulkan layer into applications, intercepting rendering calls, and redirecting frame data to OBS Studio without requiring special in-game support. This is particularly useful for modern Vulkan-based games or tools that lack native screen capture hooks. It’s lightweight, efficient, and...

Downloads: 3 This Week

Last Update: 2026-02-22
See Project
24

VisPy

Main repository for Vispy

Vispy is an open-source, high-performance interactive visualization library in Python, designed for creating scientific visualizations and interactive plots. It leverages the power of modern Graphics Processing Units (GPUs) through OpenGL to render large datasets efficiently. Vispy supports a wide range of visualization types, including 2D plots, 3D visualizations, volume rendering, and more, making it suitable for scientific research, data analysis, and educational purposes.

Downloads: 0 This Week

Last Update: 2026-01-07
See Project
25

gemma.cpp

Lightweight, standalone C++ inference engine for Google's Gemma models

Gemma.cpp is a C++ implementation for running inference with Gemma models efficiently on CPUs and GPUs. Developed by Google, it allows running large language models (LLMs) like Gemma with minimal hardware, focusing on optimized performance and low latency. Gemma.cpp is intended for developers seeking to deploy LLMs in production environments without needing massive computational resources.

Downloads: 0 This Week

Last Update: 2025-03-25
See Project


© 2026 Slashdot Media. All Rights Reserved.
