
Search Results for "gpu max performance" - Page 2

Showing 388 open source projects for "gpu max performance"

1

FlashAttention

Fast and memory-efficient exact attention

FlashAttention is a high-performance deep learning optimization library that reimplements the attention mechanism used in transformer models to be significantly faster and more memory-efficient than standard implementations. It achieves this with IO-aware algorithms that minimize reads and writes to GPU memory, so attention's memory use grows linearly rather than quadratically with sequence length while the result remains exact.

Downloads: 67 This Week

Last Update: 2026-03-18
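The FlashAttention description says it computes exact attention without the quadratic memory overhead. One ingredient behind this is an "online" softmax: keys and values are processed block by block while a running maximum, a running denominator and a running weighted sum are rescaled as each block arrives, so the full score matrix is never stored. Below is a minimal pure-Python sketch of that idea for a single query vector. The function names are hypothetical and this is not FlashAttention's API; the real library does this inside fused GPU kernels that tile queries as well as keys.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def naive_attention(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Reference: materialize every score, softmax them, then weight the values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


def tiled_attention(q: Vector, keys: List[Vector], values: List[Vector],
                    block_size: int = 2) -> Vector:
    """Visit keys/values one block at a time, keeping only a running max,
    a running softmax denominator and a running unnormalized output."""
    scale = 1.0 / math.sqrt(len(q))
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [dot(q, k) * scale for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale everything accumulated so far to the new maximum
        # (exp(-inf) == 0.0 on the first block, so nothing breaks).
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


q = [0.5, -1.0, 2.0]
keys = [[1.0, 0.0, 0.5], [0.0, 1.0, -1.0], [2.0, 2.0, 0.0], [-1.0, 0.5, 1.5], [0.3, 0.3, 0.3]]
values = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5], [-2.0, 1.0], [0.5, 0.5]]
print(tiled_attention(q, keys, values), naive_attention(q, keys, values))
```

Both functions return the same vector up to floating-point rounding; the tiled version simply never holds more than one block of scores at a time.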
2

Beta9

Run serverless GPU workloads with fast cold starts on bare-metal

Beta9 is a platform for running serverless GPU workloads with fast cold starts on bare-metal servers worldwide. It lets developers deploy and scale GPU-accelerated applications without managing the underlying infrastructure, which suits AI and high-performance computing tasks. Beta9 supports various frameworks and provides tools for monitoring and managing deployments.

Downloads: 0 This Week

Last Update: 2026-03-11
3

CUDA Python

Performance meets Productivity

CUDA Python is a unified Python interface for accessing and working with the NVIDIA CUDA platform, enabling developers to build GPU-accelerated applications entirely in Python. It acts as a metapackage composed of multiple submodules that provide both high-level and low-level access to CUDA functionality, including runtime APIs, driver APIs, and JIT compilation tools. The project is designed to simplify GPU programming by offering Pythonic abstractions while still exposing the full power of...

Downloads: 2 This Week

Last Update: 2026-04-27
4

NVIDIA Warp

A Python framework for accelerated simulation, data generation

NVIDIA Warp is a high-performance Python framework developed by NVIDIA for building and accelerating simulation, graphics, and physics-based workloads using GPU computing. It enables developers to write kernel-level code in Python that is automatically compiled into efficient CUDA kernels, combining ease of use with near-native performance. The framework is designed for applications such as robotics, reinforcement learning, physical simulation, and differentiable computing, where performance and flexibility are critical. ...

Downloads: 1 This Week

Last Update: 3 days ago
5

ChefKiss Inferno

Emulating Apple Silicon devices

Inferno by ChefKissInc is a low-level systems project focused on enabling hardware acceleration and advanced graphics compatibility on Apple Silicon devices, particularly within unsupported or experimental environments. It is designed to bridge gaps between macOS hardware capabilities and software ecosystems that traditionally rely on different GPU architectures, such as those found in Linux or Windows environments. The project typically operates at the intersection of kernel extensions, GPU drivers, and virtualization layers, aiming to unlock performance features that are otherwise restricted or unavailable. Inferno is especially relevant for developers working on emulation, virtualization, or cross-platform graphics stacks, as it attempts to expose native GPU functionality in unconventional contexts. ...

Downloads: 4 This Week

Last Update: 2026-04-29
6

uzu

A high-performance inference engine for AI models

...The engine implements a hybrid architecture in which model layers can be executed either as custom GPU kernels or through Apple’s MPSGraph API, allowing it to balance performance and compatibility depending on the workload. By utilizing Apple’s unified memory architecture, uzu reduces memory copying overhead and improves inference throughput for local AI workloads. The system includes a simple high-level API that enables developers to run models, create inference sessions, and generate outputs with minimal configuration.

Downloads: 1 This Week

Last Update: 11 hours ago
7

Butterchurn

Butterchurn is a WebGL implementation of the Milkdrop Visualizer

...The project emphasizes both artistic expression and technical performance, offering a balance between visual complexity and efficiency.

Downloads: 5 This Week

Last Update: 2026-04-20
8

OptiScaler

OptiScaler bridges upscaling/frame gen across GPUs

...The tool effectively acts as a compatibility layer between the game engine and multiple upscaling frameworks, enabling cross-GPU access to features that might otherwise be restricted to specific hardware ecosystems. In addition to replacing upscalers, OptiScaler can enable frame generation features in titles that do not officially support them, improving frame rates and perceived smoothness during gameplay.

Downloads: 197 This Week

Last Update: 2026-04-27
9

Newton

An open-source, GPU-accelerated physics simulation engine

Newton is a high-performance, GPU-accelerated physics simulation engine designed primarily for robotics research, machine learning, and advanced simulation workflows. Built on top of NVIDIA Warp, it leverages GPU parallelism to deliver scalable and efficient simulation environments that support rapid iteration and experimentation. The engine extends previous simulation frameworks by introducing differentiable physics capabilities, allowing it to integrate seamlessly with machine learning models and optimization pipelines. ...

Downloads: 0 This Week

Last Update: 2026-04-13
10

Flux.jl

Relax! Flux is the ML library that doesn't make you tensor

Flux is an elegant approach to machine learning. It's a 100% pure Julia stack and provides lightweight abstractions on top of Julia's native GPU and AD support. Flux makes the easy things easy while remaining fully hackable. Flux provides a single, intuitive way to define models, just like mathematical notation. Julia transparently compiles your code, optimizing and fusing kernels for the GPU, for the best performance. Existing Julia libraries are differentiable and can be incorporated directly into Flux models. ...

Downloads: 0 This Week

Last Update: 2026-04-17
11

NVIDIA cuOpt

GPU accelerated decision optimization

...The platform provides multiple interfaces, including C, Python, and server APIs, allowing developers to integrate optimization capabilities into applications and services. cuOpt is designed for high-performance environments and can be deployed across cloud, hybrid, or on-premise infrastructures. By combining GPU acceleration with scalable APIs, cuOpt enables organizations to solve large optimization challenges in logistics, operations research, and decision-making systems.

Downloads: 0 This Week

Last Update: 2026-04-09
12

Triton

Development repository for the Triton language and compiler

...The project leverages LLVM and MLIR to compile code into efficient GPU instructions, supporting both NVIDIA and AMD hardware. It is widely used in research and production environments where custom tensor operations are required, offering both high performance and developer-friendly syntax.

Downloads: 0 This Week

Last Update: 2026-03-20
13

luma.gl

High-performance Toolkit for WebGL-based data visualization

luma.gl is a GPU toolkit for the Web, focused primarily on data visualization use cases. luma.gl aims to support GPU programmers who need to work directly with shaders and want a low-abstraction API that remains conceptually close to the WebGPU and WebGL APIs. Unlike with other common WebGL APIs, developers can choose to use only the parts of luma.gl that support their use case and leave the others behind. While generic enough to be used for general 3D rendering, luma.gl's mandate is...

Downloads: 0 This Week

Last Update: 2026-04-21
14

GPUStack

Performance-optimized AI inference on your GPUs

GPUStack is an open-source GPU cluster management platform designed to simplify the deployment and operation of artificial intelligence models across heterogeneous hardware environments. The system aggregates GPU resources from multiple machines into a unified cluster so developers and administrators can run large language models and other AI workloads efficiently across distributed infrastructure. Instead of requiring complex orchestration systems such as Kubernetes, GPUStack provides a...

Downloads: 1 This Week

Last Update: 2026-04-21
15

LTX-2

Python inference and LoRA trainer package for the LTX-2 audio–video

LTX-2 is a powerful, open-source toolkit developed by Lightricks that provides a modular, high-performance base for building real-time graphics and visual effects applications. It is architected to give developers low-level control over rendering pipelines, GPU resource management, shader orchestration, and cross-platform abstractions so they can craft visually compelling experiences without starting from scratch. Beyond basic rendering scaffolding, LTX-2 includes optimized math libraries, resource loaders, utilities for texture and buffer handling, and integration points for native event loops and input systems. ...

Downloads: 41 This Week

Last Update: 2026-04-23
16

Meridian

Meridian is a marketing mix modeling (MMM) framework

...Meridian uses the No-U-Turn Sampler (NUTS) for Markov Chain Monte Carlo (MCMC) sampling to produce statistically rigorous results, and it includes GPU acceleration to significantly reduce computation time.

Downloads: 9 This Week

Last Update: 11 hours ago
17

OpenFang

Open-source Agent Operating System

OpenFang is an open-source agent operating system designed to orchestrate autonomous AI agents and workflows in a structured, production-oriented environment. Written primarily in Rust, the project focuses on building a high-performance runtime where multiple specialized agents can collaborate to complete complex computational or development tasks. It aims to move beyond simple chat-based agents by providing infrastructure for persistent agent memory, task coordination, and scalable execution. The system is positioned as a foundation for building advanced AI tooling, particularly in environments that require tight integration with GPU workflows and modern AI pipelines. ...

Downloads: 8 This Week

Last Update: 6 days ago
18

Megatron-LM

Ongoing research training transformer models at scale

Megatron-LM is a GPU-optimized deep learning framework from NVIDIA designed to train extremely large transformer-based language models efficiently at scale. The repository provides both a reference training implementation and Megatron Core, a composable library of high-performance building blocks for custom large-model pipelines. It supports advanced parallelism strategies including tensor, pipeline, data, expert, and context parallelism, enabling training across massive multi-GPU and multi-node clusters. ...

Downloads: 0 This Week

Last Update: 2026-04-22
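Of the parallelism strategies listed for Megatron-LM, tensor parallelism is the easiest to show in miniature. The sketch below simulates it in a single process for one linear layer y = Wx: the "devices" are just list slices, the collective operations are plain concatenation (all-gather) and summation (all-reduce), and every name is hypothetical rather than part of Megatron Core's API.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, x: Vector) -> Vector:
    """y = W x, with W stored as [out_features][in_features]."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def column_parallel(w: Matrix, x: Vector, shards: int) -> Vector:
    """Each simulated device owns a slice of the output features (rows of W
    here, i.e. columns of A in the Y = XA notation) and sees the full x.
    Concatenating the per-device slices plays the role of an all-gather."""
    assert len(w) % shards == 0
    step = len(w) // shards
    parts = [matvec(w[i * step:(i + 1) * step], x) for i in range(shards)]
    return [y for part in parts for y in part]


def row_parallel(w: Matrix, x: Vector, shards: int) -> Vector:
    """Each simulated device owns a slice of the input features and the
    matching slice of x, producing a full-size partial sum. Adding the
    partial sums plays the role of an all-reduce."""
    assert len(x) % shards == 0
    step = len(x) // shards
    partials = []
    for i in range(shards):
        cols = slice(i * step, (i + 1) * step)
        partials.append(matvec([row[cols] for row in w], x[cols]))
    return [sum(vals) for vals in zip(*partials)]


w = [[1, 2, 0, -1], [0, 1, 3, 2], [2, 0, 1, 1], [-1, 1, 0, 4]]
x = [1, -2, 3, 1]
print(matvec(w, x), column_parallel(w, x, 2), row_parallel(w, x, 2))
```

All three calls print the same vector. Chaining a column-parallel layer into a row-parallel one, as in the Megatron-LM paper's MLP block, keeps the intermediate activations sharded so that only one all-reduce is needed in the forward pass.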
19

webgl-plot

A high-performance, real-time 2D plotting library based on native WebGL

...Its minimal memory footprint and GPU acceleration ensure excellent performance even with tens of thousands of data points, and its simple API allows developers to get started quickly.

Downloads: 1 This Week

Last Update: 2025-03-26
20

PEFT

State-of-the-art Parameter-Efficient Fine-Tuning

Parameter-Efficient Fine-Tuning (PEFT) methods enable efficient adaptation of pre-trained language models (PLMs) to various downstream applications without fine-tuning all of the model's parameters. Fine-tuning large-scale PLMs is often prohibitively costly. PEFT methods instead fine-tune only a small number of (extra) model parameters, greatly decreasing computational and storage costs. Recent state-of-the-art PEFT techniques achieve performance comparable to that of full...

Downloads: 3 This Week

Last Update: 2026-04-16
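To make the PEFT cost argument concrete, the sketch below uses LoRA (low-rank adaptation), one of the methods the PEFT library implements, as the example. LoRA freezes a weight matrix W of shape d_out × d_in and trains two small matrices B (d_out × r) and A (r × d_in), adding (alpha / r) · BAx to the layer output. The helper names are hypothetical; in PEFT itself the rank and scaling are set through `LoraConfig`'s `r` and `lora_alpha` options.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def full_finetune_params(d_out: int, d_in: int) -> int:
    # Every entry of W is trainable.
    return d_out * d_in


def lora_params(d_out: int, d_in: int, r: int) -> int:
    # Only B (d_out x r) and A (r x d_in) are trainable; W stays frozen.
    return r * (d_out + d_in)


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector,
                 alpha: float, r: int) -> Vector:
    """h = W x + (alpha / r) * B (A x)."""
    base = matvec(w, x)
    delta = matvec(b, matvec(a, x))
    return [h + (alpha / r) * d for h, d in zip(base, delta)]


d = 4096
full = full_finetune_params(d, d)
lora = lora_params(d, d, r=8)
print(full, lora, f"{lora / full:.2%}")
```

For a single 4096 × 4096 projection with r = 8, this prints `16777216 65536 0.39%`: LoRA trains 1/256 of that layer's parameters. Because LoRA initializes B to zeros, the adapted layer starts out computing exactly Wx.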
21

CUDA.jl

CUDA programming in Julia

High-performance GPU programming in a high-level language. JuliaGPU is a GitHub organization created to unify the many packages for programming GPUs in Julia. With its high-level syntax and flexible compiler, Julia is well-positioned to productively program hardware accelerators like GPUs without sacrificing performance. The latest development version of CUDA.jl requires Julia 1.8 or higher.

Downloads: 1 This Week

Last Update: 7 days ago
22

Codon

A high-performance, zero-overhead, extensible Python compiler

Codon is a high-performance Python compiler that compiles Python code to native machine code without any runtime overhead. Typical speedups over Python are on the order of 100x or more on a single thread. Codon also supports native multithreading, which can lead to speedups many times higher still. The Codon framework is fully modular and extensible, allowing for the seamless integration of new modules, compiler optimizations, domain-specific languages and so on. We actively develop Codon...

Downloads: 12 This Week

Last Update: 2026-03-04
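As an illustration of the kind of code Codon targets, here is an ordinary, type-stable Python script with integer loops; it runs unchanged under CPython. Assuming the `codon` command-line tool is installed, Codon's documentation shows invocations of the form `codon run -release fib_sum.py` for compiling and running such a file. The file name is hypothetical, and no timings are claimed here beyond the speedups stated in the project description.

```python
# fib_sum.py: plain Python, kept to simple integer arithmetic and loops


def fib(n: int) -> int:
    """Return the n-th Fibonacci number (fib(0) == 0, fib(1) == 1)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def sum_of_fibs(limit: int) -> int:
    """Sum fib(0) .. fib(limit - 1) by recomputing each term (deliberately loop-heavy)."""
    total = 0
    for i in range(limit):
        total += fib(i)
    return total


print(sum_of_fibs(30))
```

Under CPython this prints 1346268. Note that Codon uses fixed-width 64-bit integers where CPython's are arbitrary-precision, so results agree only while values stay in range.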
23

lru-cache

A fast cache that automatically deletes the least recently used items

...It offers flexible configuration options such as max size limits, time based expiration, and custom disposal logic. Developers can use it to cache expensive computations, API responses, or frequently accessed data. The implementation focuses on correctness, speed, and compatibility with modern Node.js environments. Overall, node-lru-cache provides a reliable building block for performance optimization in JavaScript backends.

Downloads: 2 This Week

Last Update: 3 days ago
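The three features named in the lru-cache description (a size limit, time-based expiration and custom disposal logic) fit in a short sketch. The example below is a hypothetical Python class built on `collections.OrderedDict`, not node-lru-cache's JavaScript API: `move_to_end` marks an entry as most recently used, so the first key in the dictionary is always the least recently used one to evict. An injectable clock makes expiry deterministic to test.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional, Tuple


class LRUCache:
    """Size-bounded cache that evicts the least recently used entry first,
    with an optional time-to-live and a hook called on every removal."""

    def __init__(self, max_size: int, ttl: Optional[float] = None,
                 dispose: Optional[Callable[[Hashable, Any], None]] = None,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if max_size <= 0:
            raise ValueError("max_size must be positive")
        self.max_size = max_size
        self.ttl = ttl
        self.dispose = dispose
        self.clock = clock
        # key -> (expiry time or None, value); dict order doubles as recency order
        self._data: "OrderedDict[Hashable, Tuple[Optional[float], Any]]" = OrderedDict()

    def _remove(self, key: Hashable) -> None:
        _, value = self._data.pop(key)
        if self.dispose is not None:
            self.dispose(key, value)

    def get(self, key: Hashable, default: Any = None) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if expires_at is not None and self.clock() >= expires_at:
            self._remove(key)  # expired entries behave like misses
            return default
        self._data.move_to_end(key)  # now the most recently used
        return value

    def set(self, key: Hashable, value: Any) -> None:
        if key in self._data:
            self._remove(key)  # replacing a value also runs the dispose hook
        expires_at = None if self.ttl is None else self.clock() + self.ttl
        self._data[key] = (expires_at, value)
        while len(self._data) > self.max_size:
            self._remove(next(iter(self._data)))  # first key = least recently used

    def __len__(self) -> int:
        return len(self._data)


# Usage with a fake clock so expiry is deterministic.
now = [0.0]
evicted = []
cache = LRUCache(2, ttl=10.0, dispose=lambda k, v: evicted.append(k), clock=lambda: now[0])
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")      # "a" becomes most recently used
cache.set("c", 3)   # over the limit, so "b" is evicted
now[0] = 11.0
print(cache.get("a"), evicted)  # "a" has expired by now
```

The script prints `None ['b', 'a']`: "b" was evicted for size, then "a" was dropped on access because its time-to-live had passed.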
24

HeavyDB

HeavyDB (formerly MapD/OmniSciDB)

HeavyDB is an open-source GPU-accelerated analytical database designed to perform extremely fast queries on large datasets. The system is built as a SQL-based relational columnar database engine that leverages modern hardware parallelism, including GPUs and multicore CPUs. Its architecture allows users to query datasets containing billions of rows in milliseconds without requiring traditional indexing, pre-aggregation, or sampling techniques. HeavyDB was originally developed as part of the...

Downloads: 0 This Week

Last Update: 2026-03-11
25

LMCache

Supercharge Your LLM with the Fastest KV Cache Layer

...These capabilities aim to lower latency, cut GPU cycles, and stabilize performance for production workloads with overlapping prompts or retrieval-augmented contexts. The end result is a cache fabric for LLMs that complements engines rather than replacing them.

Downloads: 0 This Week

Last Update: 2026-04-23
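Reusing KV cache across the "overlapping prompts" mentioned in the LMCache description hinges on recognizing shared token prefixes. Below is a hypothetical, engine-free sketch: each fixed-size chunk of tokens is keyed by a hash of the whole prefix ending at that chunk, so a cached chunk is reused only when everything before it also matches. The chunk size, class and method names are assumptions, the stored "KV" objects are placeholder strings, and LMCache's real storage tiers and engine integration are not modeled.

```python
import hashlib
from typing import Dict, List, Sequence, Tuple

CHUNK = 4  # hypothetical chunk size in tokens


def chunk_keys(tokens: Sequence[int], chunk: int = CHUNK) -> List[str]:
    """One key per full chunk, each hashing the entire prefix up to that chunk.
    A trailing partial chunk gets no key and is never cached."""
    keys = []
    h = hashlib.sha256()
    for end in range(chunk, len(tokens) + 1, chunk):
        h.update(repr(tuple(tokens[end - chunk:end])).encode())
        keys.append(h.copy().hexdigest())
    return keys


class PrefixKVStore:
    def __init__(self) -> None:
        self._store: Dict[str, object] = {}

    def lookup(self, tokens: Sequence[int]) -> Tuple[int, List[object]]:
        """Return how many leading tokens are covered by cached chunks,
        plus the cached KV objects for those chunks, in order."""
        hits: List[object] = []
        for key in chunk_keys(tokens):
            if key not in self._store:
                break  # first miss ends the reusable prefix
            hits.append(self._store[key])
        return len(hits) * CHUNK, hits

    def insert(self, tokens: Sequence[int], kv_chunks: Sequence[object]) -> None:
        for key, kv in zip(chunk_keys(tokens), kv_chunks):
            self._store[key] = kv


store = PrefixKVStore()
system_prompt = [1, 2, 3, 4, 5, 6, 7, 8]
store.insert(system_prompt + [9, 10, 11, 12], ["kv0", "kv1", "kv2"])
print(store.lookup(system_prompt + [20, 21, 22, 23]))
```

The second request shares only the 8-token system prompt with the first, so the lookup reports 8 reusable tokens and returns `['kv0', 'kv1']`; the engine would compute KV only for the remaining tokens.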

© 2026 Slashdot Media. All Rights Reserved.
