Join/Login
Business Software
Open Source Software
For Vendors
Blog
About
More

For Vendors Help Create Join Login

Business Software

Open Source Software

SourceForge Podcast

Resources

Articles
Case Studies
Blog

Menu

Help
Create
Join
Login

Home
Open Source Software
Artificial Intelligence Software
Search Results

Search Results for "gpu max performance" - Page 3

x

Sort By:

Relevance

Clear All Filters

OS

Linux 90
Windows 88
Mac 84
More...
ChromeOS 35
BSD 34
Mobile Operating Systems 3

Category

Artificial Intelligence 100
Software Development 9
Multimedia 3
System 2
Games 1
Scientific/Engineering 1

License

OSI-Approved Open Source 89
Creative Commons Attribution License 1

Translations

English 5
Bengali 1

Programming Language

Python 100
C++ 5
Unix Shell 4
JavaScript 2
Go 1
More...
Java 1
Julia 1
Rust 1

Status

Production/Stable 3
Beta 1

Showing 100 open source projects for "gpu max performance"

View related business solutions

Artificial Intelligence Python Clear Filters & Widen Search

Our Free Plans just got better! | Auth0
With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.

Try free now
8 Monitoring Tools in One APM. Install in 5 Minutes.
Errors, performance, logs, uptime, hosts, anomalies, dashboards, and check-ins. One interface.

AppSignal works out of the box for Ruby, Elixir, Node.js, Python, and more. 30-day free trial, no credit card required.

Start Free
1

DeepSpeed MII

MII makes low-latency and high-throughput inference possible

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. The Deep Learning (DL) open-source community has seen tremendous growth in the last few months. Incredibly powerful text generation models such as the Bloom 176B, or image generation model such as Stable Diffusion are now available to anyone with access to a handful or even a single GPU through platforms such as Hugging Face. While open-sourcing has democratized access to AI capabilities, their application is...

Downloads: 0 This Week

Last Update: 2025-03-25
See Project
2

autoresearch for AMD

AI agents running research on single-GPU nanochat training

autoresearch for AMD is a framework for autonomous scientific experimentation in machine learning, enabling AI agents to iteratively improve models through a continuous loop of hypothesis generation, experimentation, and evaluation. The system is built around a minimal structure that includes a data preparation module, a training script that can be modified, and a program specification that guides the agent’s decision-making process. During each iteration, the agent edits the training code,...

Downloads: 0 This Week

Last Update: 2026-03-30
See Project
3

TensorRT LLM

TensorRT LLM provides users with an easy-to-use Python API

TensorRT-LLM is an open-source high-performance inference library specifically designed to optimize and accelerate large language model deployment on NVIDIA GPUs. It provides a Python-based API built on top of PyTorch that allows developers to define, customize, and deploy LLMs efficiently across a variety of hardware configurations, from single GPUs to large multi-node clusters. The library focuses on maximizing throughput and minimizing latency through advanced techniques such as...

Downloads: 2 This Week

Last Update: 2026-04-16
See Project
4

SageAttention

NeurIPS2025 Spotlight] Quantized Attention

SageAttention is an open-source optimization library designed to accelerate the attention mechanism used in transformer-based neural networks. Since attention operations are often the most computationally expensive component of modern AI models, SageAttention introduces quantization techniques that significantly reduce computational overhead while preserving model accuracy. The system achieves this by using low-precision numerical formats such as INT4, FP8, or INT8 to represent key matrices...

Downloads: 1 This Week

Last Update: 2026-03-08
See Project
Go from Code to Production URL in Seconds
Cloud Run deploys apps in any language instantly. Scales to zero. Pay only when code runs.

Skip the Kubernetes configs. Cloud Run handles HTTPS, scaling, and infrastructure automatically. Two million requests free per month.

Try it free
5

bitnet.cpp

Official inference framework for 1-bit LLMs

bitnet.cpp is the official open-source inference framework and ecosystem designed to enable ultra-efficient execution of 1-bit large language models (LLMs), which quantize most model parameters to ternary values (-1, 0, +1) while maintaining competitive performance with full-precision counterparts. At its core is bitnet.cpp, a highly optimized C++ backend that supports fast, low-memory inference on both CPUs and GPUs, enabling models such as BitNet b1.58 to run without requiring enormous...

Downloads: 2 This Week

Last Update: 2026-03-10
See Project
6

CogAgent

An open sourced end-to-end VLM-based GUI Agent

CogAgent is a 9B-parameter bilingual vision-language GUI agent model based on GLM-4V-9B, trained with staged data curation, optimization, and strategy upgrades to improve perception, action prediction, and generalization across tasks. It focuses on operating real user interfaces from screenshots plus text, and follows a strict input–output format that returns structured actions, grounded operations, and optional sensitivity annotations. The model is designed for agent-style execution rather...

Downloads: 4 This Week

Last Update: 4 days ago
See Project
7

Diffrax

Numerical differential equation solvers in JAX

Diffrax is a numerical differential equation solving library built for the JAX ecosystem, with a strong focus on composability, differentiability, and high-performance scientific computing. The project provides tools for solving ordinary differential equations, stochastic differential equations, controlled differential equations, and related systems in a way that fits naturally into modern machine learning and differentiable programming workflows. Because it is written to work closely with...

Downloads: 0 This Week

Last Update: 2026-03-12
See Project
8

Core ML Tools

Core ML tools contain supporting tools for Core ML model conversion

...Core ML provides a unified representation for all models. Your app uses Core ML APIs and user data to make predictions, and to fine-tune models, all on the user’s device. Core ML optimizes on-device performance by leveraging the CPU, GPU, and Neural Engine while minimizing its memory footprint and power consumption. Running a model strictly on the user’s device removes any need for a network connection, which helps keep the user’s data private and your app responsive.

Downloads: 0 This Week

Last Update: 2025-11-10
See Project
9

BentoML

Unified Model Serving Framework

...Parallelize compute-intense model inference workloads to scale separately from the serving logic. Adaptive batching dynamically groups inference requests for optimal performance. Orchestrate distributed inference graph with multiple models via Yatai on Kubernetes. Easily configure CUDA dependencies for running inference with GPU. Automatically generate docker images for production deployment.

Downloads: 0 This Week

Last Update: 6 hours ago
See Project
Forever Free Full-Stack Observability | Grafana Cloud
Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.

Create free account
10

Qwen-VL

Chat & pretrained large vision language model

Qwen-VL is Alibaba Cloud’s vision-language large model family, designed to integrate visual and linguistic modalities. It accepts image inputs (with optional bounding boxes) and text, and produces text (and sometimes bounding boxes) as output. The model variants (VL-Plus, VL-Max, etc.) have been upgraded for better visual reasoning, text recognition from images, fine-grained understanding, and support for high image resolutions / extreme aspect ratios. Qwen-VL supports multilingual inputs...

Downloads: 0 This Week

Last Update: 2025-09-23
See Project
11

Diplomacy Cicero

Code for Cicero, an AI agent that plays the game of Diplomacy

...It supports two variants: Cicero (which handles full “press” negotiation) and Diplodocus (a variant focused on no-press diplomacy) as described in the README. The codebase is implemented primarily in Python with performance-critical components in C++ (via pybind11 bindings) and is configured to run in a high‐GPU cluster environment. Configuration is managed via protobuf files to define tasks such as self-play, benchmark agent comparisons, and RL training. The project is now archived and read-only, reflecting that it is no longer actively developed but remains publicly available for research use.

Downloads: 3 This Week

Last Update: 2 days ago
See Project
12

TorchRec

Pytorch domain library for recommendation systems

...The TorchRec sharder can shard embedding tables with different sharding strategies including data-parallel, table-wise, row-wise, table-wise-row-wise, and column-wise sharding. The TorchRec planner can automatically generate optimized sharding plans for models. Pipelined training overlaps dataloading device transfer (copy to GPU), inter-device communications (input_dist), and computation (forward, backward) for increased performance. Optimized kernels for RecSys powered by FBGEMM. Quantization support for reduced precision training and inference. Common modules for RecSys.

Downloads: 1 This Week

Last Update: 2026-03-15
See Project
13

The SpeechBrain Toolkit

A PyTorch-based Speech Toolkit

...Spectral masking, spectral mapping, and time-domain enhancement are different methods already available within SpeechBrain. Separation methods such as Conv-TasNet, DualPath RNN, and SepFormer are implemented as well. SpeechBrain provides efficient and GPU-friendly speech augmentation pipelines and acoustic features extraction.

Downloads: 1 This Week

Last Update: 2026-03-30
See Project
14

Colab-MCP

An MCP server for interacting with Google Colab

...Instead of relying on manual notebook usage, the system allows MCP-compatible agents to execute code, manage files, install dependencies, and orchestrate entire development workflows within Colab’s cloud infrastructure. This approach bridges the gap between local AI agents and remote high-performance compute environments, allowing users to offload heavy workloads such as machine learning training, data analysis, and dependency-heavy tasks to Colab’s GPU and TPU resources. By exposing Colab as an MCP server, the tool enables seamless integration with a wide range of AI assistants and agent frameworks, creating a standardized interface for tool use and execution.

Downloads: 0 This Week

Last Update: 2026-03-27
See Project
15

TAME LLM

Traditional Mandarin LLMs for Taiwan

TAME LLM is an open-source initiative focused on building and releasing large language models optimized for Traditional Mandarin and the linguistic context of Taiwan. The project includes models such as Llama-3-Taiwan-70B, which are fine-tuned versions of large transformer architectures trained on extensive corpora containing both Traditional Mandarin and English text. These models are designed to support applications such as conversational AI, knowledge retrieval, and domain-specific...

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
16

Transformer Engine

A library for accelerating Transformer models on NVIDIA GPUs

Transformer Engine (TE) is a library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower memory utilization in both training and inference. TE provides a collection of highly optimized building blocks for popular Transformer architectures and an automatic mixed precision-like API that can be used seamlessly with your framework-specific code. TE also includes a framework-agnostic C++...

Downloads: 0 This Week

Last Update: 2026-04-24
See Project
17

TensorFlow Model Garden

Models and examples built with TensorFlow

The TensorFlow Model Garden is a repository with a number of different implementations of state-of-the-art (SOTA) models and modeling solutions for TensorFlow users. We aim to demonstrate the best practices for modeling so that TensorFlow users can take full advantage of TensorFlow for their research and product development. To improve the transparency and reproducibility of our models, training logs on TensorBoard.dev are also provided for models to the extent possible though not all models...

Downloads: 0 This Week

Last Update: 2026-02-11
See Project
18

tvm

Open deep learning compiler stack for cpu, gpu, etc.

Apache TVM is an open source machine learning compiler framework for CPUs, GPUs, and machine learning accelerators. It aims to enable machine learning engineers to optimize and run computations efficiently on any hardware backend. The vision of the Apache TVM Project is to host a diverse community of experts and practitioners in machine learning, compilers, and systems architecture to build an accessible, extensible, and automated open-source framework that optimizes current and emerging...

Downloads: 0 This Week

Last Update: 2026-02-01
See Project
19

Tracking Any Point (TAP)

DeepMind model for tracking arbitrary points across videos & robotics

TAPNet is the official Google DeepMind repository for Tracking Any Point (TAP), bundling datasets, models, benchmarks, and demos for precise point tracking in videos. The project includes the TAP-Vid and TAPVid-3D benchmarks, which evaluate long-range tracking of arbitrary points in 2D and 3D across diverse real and synthetic videos. Its flagship models—TAPIR, BootsTAPIR, and the latest TAPNext—use matching plus temporal refinement or next-token style propagation to achieve state-of-the-art...

Downloads: 0 This Week

Last Update: 5 days ago
See Project
20

fastai

Deep learning library

fastai is a deep learning library which provides practitioners with high-level components that can quickly and easily provide state-of-the-art results in standard deep learning domains, and provides researchers with low-level components that can be mixed and matched to build new approaches. It aims to do both things without substantial compromises in ease of use, flexibility, or performance. This is possible thanks to a carefully layered architecture, which expresses common underlying...

Downloads: 0 This Week

Last Update: 2026-02-14
See Project
21

VCClient

Software that uses AI to perform real-time voice conversion

...It provides both a graphical user interface and API access, making it suitable for casual users as well as developers who want to integrate voice transformation into their own applications. The project also supports GPU acceleration, enabling faster inference and smoother real-time performance on compatible hardware. Additionally, it includes tools for training and managing voice models, giving users the ability to create personalized voice profiles.

Downloads: 25 This Week

Last Update: 2026-03-23
See Project
22

DeepSpeed

Deep learning optimization library: makes distributed training easy

DeepSpeed is an easy-to-use deep learning optimization software suite that enables unprecedented scale and speed for Deep Learning Training and Inference. With DeepSpeed you can: 1. Train/Inference dense or sparse models with billions or trillions of parameters 2. Achieve excellent system throughput and efficiently scale to thousands of GPUs 3. Train/Inference on resource constrained GPU systems 4. Achieve unprecedented low latency and high throughput for inference 5. Achieve extreme...

Downloads: 0 This Week

Last Update: 18 hours ago
See Project
23

HunyuanVideo-I2V

A Customizable Image-to-Video Model based on HunyuanVideo

HunyuanVideo-I2V is a customizable image-to-video generation framework developed by Tencent, extending the capabilities of HunyuanVideo. It allows for high-quality video creation from still images, using PyTorch and providing pre-trained model weights, inference code, and customizable training options. The system includes a LoRA training code for adding special effects and enhancing video realism, aiming to offer versatile and scalable solutions for generating videos from static image inputs.

1 Review

Downloads: 6 This Week

Last Update: 2025-03-10
See Project
24

Warlock-Studio

AI Suite for upscaling, interpolating & restoring images/videos

v6.0. Warlock-Studio is a Windows application that uses Real-ESRGAN, BSRGAN, IRCNN, GFPGAN, RealESRNet, RealESRAnime and RIFE Artificial Intelligence models to upscale, restore faces, interpolate frames and reduce noise in images and videos. the application supports GPU acceleration (including multi-GPU setups) and offers batch processing for large workloads. It includes drag-and-drop handling for single or multiple files, optional pre-resize functions, and an automatic tiling system...

Downloads: 28 This Week

Last Update: 2026-02-16
See Project
25

Grok-1

Open-source, high-performance Mixture-of-Experts large language model

Grok-1 is a 314-billion-parameter Mixture-of-Experts (MoE) large language model developed by xAI. Designed to optimize computational efficiency, it activates only 25% of its weights for each input token. In March 2024, xAI released Grok-1's model weights and architecture under the Apache 2.0 license, making them openly accessible to developers. The accompanying GitHub repository provides JAX example code for loading and running the model. Due to its substantial size, utilizing Grok-1...

1 Review

Downloads: 43 This Week

Last Update: 2025-02-27
See Project

Previous
1
2
You're on page 3
4
Next

Related Searches

grok

windows 7 32 bit software

conversational ai

cuda machine learning

deep learning

smart home control panel

image to video

video ai

bitnet.cpp

speech recognition in telugu language

Related Categories

Artificial Intelligence

Software Development

Multimedia

System

Games

SourceForge

Create a Project
Open Source Software
Business Software
Top Downloaded Projects

Company

About
Team
SourceForge Headquarters
1320 Columbia Street Suite 310
San Diego, CA 92101
+1 (858) 422-6466

Resources

Support
Site Documentation
Site Status
SourceForge Reviews

© 2026 Slashdot Media. All Rights Reserved.

Terms Privacy Opt Out Advertise