gpu free download - SourceForge

Showing 86 open source projects for "gpu"

View related business solutions

Software Development Python Clear Filters & Widen Search

Train ML Models With SQL You Already Know
BigQuery automates data prep, analysis, and predictions with built-in AI assistance.

Build and deploy ML models using familiar SQL. Automate data prep with built-in Gemini. Query 1 TB and store 10 GB free monthly.

Try Free
MongoDB Atlas runs apps anywhere
Deploy in 115+ regions with the modern database for every enterprise.

MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.

Start Free
1

Numba CUDA Target

The CUDA target for Numba

Numba CUDA Target is NVIDIA’s maintained CUDA backend for the Numba JIT compiler, enabling developers to write GPU-accelerated code directly in Python. It allows users to define CUDA kernels using Python syntax, which are then compiled into efficient GPU code at runtime using LLVM-based toolchains. This approach significantly lowers the barrier to entry for GPU programming by eliminating the need to write CUDA C++ while still delivering high performance.

Downloads: 0 This Week

Last Update: 2026-07-03
See Project
2

ProtoMotions

ProtoMotions is a GPU-accelerated simulation and learning framework

ProtoMotions 3 is a GPU-accelerated framework for training physically simulated digital humans and humanoid robots. It provides modular environments and learning components for animation, robotics, and reinforcement learning research. Large motion datasets such as AMASS and BONES can be divided across multiple GPUs for scalable policy training. A built-in PyRoki workflow retargets human motion data to different robot bodies with a single command.

Downloads: 3 This Week

Last Update: 2026-07-26
See Project
3

Triton

Development repository for the Triton language and compiler

...The project leverages LLVM and MLIR to compile code into efficient GPU instructions, supporting both NVIDIA and AMD hardware. It is widely used in research and production environments where custom tensor operations are required, offering both high performance and developer-friendly syntax.

Downloads: 6 This Week

Last Update: 2026-06-18
See Project
4

CuPy

A NumPy-compatible array library accelerated by CUDA

CuPy is an open source implementation of NumPy-compatible multi-dimensional array accelerated with NVIDIA CUDA. It consists of cupy.ndarray, a core multi-dimensional array class and many functions on it. CuPy offers GPU accelerated computing with Python, using CUDA-related libraries to fully utilize the GPU architecture. According to benchmarks, it can even speed up some operations by more than 100X. CuPy is highly compatible with NumPy, serving as a drop-in replacement in most cases. CuPy is very easy to install through pip or through precompiled binary packages called wheels for recommended environments. ...

Downloads: 1 This Week

Last Update: 2026-06-01
See Project
Ship Agents Faster
Transform your applications and workflows into powerful agentic systems at global scale.

Gemini Enterprise Agent Platform lets you rapidly build, scale, govern and optimize production-ready agents grounded in your organization's data. The platform enables developers to build custom or pre-built agents for virtually any use case. New customers get $300 in free credits.

Get Started Free
5

DeepSeed

Deep learning optimization library making distributed training easy

...DeepSpeed delivers extreme-scale model training for everyone, from data scientists training on massive supercomputers to those training on low-end clusters or even on a single GPU. Using current generation of GPU clusters with hundreds of devices, 3D parallelism of DeepSpeed can efficiently train deep learning models with trillions of parameters. With just a single GPU, ZeRO-Offload of DeepSpeed can train models with over 10B parameters, 10x bigger than the state of arts, democratizing multi-billion-parameter model training such that many deep learning scientists can explore bigger and better models. ...

Downloads: 2 This Week

Last Update: 13 hours ago
See Project
6

Video-subtitle-extractor

A GUI tool for extracting hard-coded subtitle (hardsub) from videos

...Use local OCR recognition, no need to set up and call any API, and do not need to access online OCR services such as Baidu and Ali to complete text recognition locally. Support GPU acceleration, after GPU acceleration, you can get higher accuracy and faster extraction speed. (CLI version) No need for users to manually set the subtitle area, the project automatically detects the subtitle area through the text detection model. Filter the text in the non-subtitle area and remove the watermark (station logo) text.

1 Review

Downloads: 54 This Week

Last Update: 2026-04-05
See Project
7

CUDA Python

Performance meets Productivity

CUDA Python is a unified Python interface for accessing and working with the NVIDIA CUDA platform, enabling developers to build GPU-accelerated applications entirely in Python. It acts as a metapackage composed of multiple submodules that provide both high-level and low-level access to CUDA functionality, including runtime APIs, driver APIs, and JIT compilation tools. The project is designed to simplify GPU programming by offering Pythonic abstractions while still exposing the full power of CUDA for advanced users. ...

Downloads: 0 This Week

Last Update: 2026-07-21
See Project
8

ComfyUI

The most powerful and modular diffusion model GUI, api and backend

The most powerful and modular diffusion model is GUI and backend. This UI will let you design and execute advanced stable diffusion pipelines using a graph/nodes/flowchart-based interface. We are a team dedicated to iterating and improving ComfyUI, supporting the ComfyUI ecosystem with tools like node manager, node registry, cli, automated testing, and public documentation. Open source AI models will win in the long run against closed models and we are only at the beginning. Our core mission...

Downloads: 851 This Week

Last Update: 3 days ago
See Project
9

MuJoCo Playground

An open source library for GPU-accelerated robot learning

MuJoCo Playground, developed by Google DeepMind, is a GPU-accelerated suite of simulation environments for robot learning and sim-to-real research, built on top of MuJoCo MJX. It unifies a range of control, locomotion, and manipulation tasks into a consistent and scalable framework optimized for JAX and Warp backends. The project includes classic control benchmarks from dm_control, advanced quadruped and bipedal locomotion systems, and dexterous as well as non-prehensile manipulation setups. ...

Downloads: 0 This Week

Last Update: 2026-04-28
See Project
$300 Free Credits for Your Google Cloud Projects
Start building on Google Cloud with $300 in free credits. No commitment, no credit card required until you're ready to scale.

Launch your next project with $300 in free Google Cloud credits—no strings attached. Test, build, and deploy without risk. Use your credits across the entire Google Cloud platform to find what works best for your needs. After your credits are used, continue with always-free tier services. Only pay when you're ready to scale. Sign up in minutes and start exploring.

Start Free Trial
10

DeepEP

DeepEP: an efficient expert-parallel communication library

DeepEP is a communication library designed specifically to support Mixture-of-Experts (MoE) and expert parallelism (EP) deployments. Its core role is to implement high-throughput, low-latency all-to-all GPU communication kernels, which handle the dispatching of tokens to different experts (or shards) and then combining expert outputs back into the main data flow. Because MoE architectures require routing inputs to different experts, communication overhead can become a bottleneck — DeepEP addresses that by providing optimized GPU kernels and efficient dispatch/combining logic. ...

Downloads: 0 This Week

Last Update: 2025-10-03
See Project
11

Shumai

Fast Differentiable Tensor Library in JavaScript & TypeScript with Bun

...It can automatically leverage GPU acceleration on Linux (via CUDA) and CPU computation on macOS.

Downloads: 1 This Week

Last Update: 2 days ago
See Project
12

DGL

Python package built to ease deep learning on graph

...We also want to make the combination of graph based modules and tensor based modules (PyTorch or MXNet) as smooth as possible. DGL provides a powerful graph object that can reside on either CPU or GPU. It bundles structural data as well as features for a better control. We provide a variety of functions for computing with graph objects including efficient and customizable message passing primitives for Graph Neural Networks.

Downloads: 0 This Week

Last Update: 2024-08-29
See Project
13

NVIDIA Warp

A Python framework for accelerated simulation, data generation

NVIDIA Warp is a high-performance Python framework developed by NVIDIA for building and accelerating simulation, graphics, and physics-based workloads using GPU computing. It enables developers to write kernel-level code in Python that is automatically compiled into efficient CUDA kernels, combining ease of use with near-native performance. The framework is designed for applications such as robotics, reinforcement learning, physical simulation, and differentiable computing, where performance and flexibility are critical. ...

Downloads: 0 This Week

Last Update: 3 days ago
See Project
14

TensorRT Node for ComfyUI

Enables the best performance on NVIDIA RTX Graphics Cards

...The repo typically includes instructions for converting models to TensorRT engines and for wiring those engines into ComfyUI nodes. This is particularly attractive for power users who run many generations or who host ComfyUI on dedicated hardware and want to squeeze out every bit of GPU performance. In short, it’s about taking ComfyUI from “it runs” to “it runs fast” on NVIDIA GPUs.

Downloads: 0 This Week

Last Update: 2025-10-30
See Project
15

Meridian

Meridian is an MMM framework

...The framework provides a robust foundation for constructing in-house MMM pipelines capable of handling both national and geo-level data, with built-in support for calibration using experimental data or prior knowledge. Meridian uses the No-U-Turn Sampler (NUTS) for Markov Chain Monte Carlo (MCMC) sampling to produce statistically rigorous results, and it includes GPU acceleration to significantly reduce computation time.

Downloads: 0 This Week

Last Update: 5 hours ago
See Project
16

DualPipe

A bidirectional pipeline parallelism algorithm

DualPipe is a bidirectional pipeline parallelism algorithm open-sourced by DeepSeek, introduced in their DeepSeek-V3 technical framework. The main goal of DualPipe is to maximize overlap between computation and communication phases during distributed training, thus reducing idle GPU time (i.e. “pipeline bubbles”) and improving cluster efficiency. Traditional pipeline parallelism methods (e.g. 1F1B or staggered pipelining) leave gaps because forward and backward phases can’t fully overlap with communication. DualPipe addresses that by scheduling micro-batches from both ends of the pipeline in a bidirectional fashion—i.e. some micro-batches flow forward while others flow backward—so that computation on one partition can coincide with communication for another.

Downloads: 0 This Week

Last Update: 2025-12-25
See Project
17

RLax

Library of JAX-based building blocks for reinforcement learning agents

...It supports both on-policy and off-policy learning, as well as value-based, policy-based, and model-based approaches. RLax is fully JIT-compilable with JAX, enabling high-performance execution across CPU, GPU, and TPU backends. The library implements tools for Bellman equations, return distributions, general value functions, and policy optimization in both continuous and discrete action spaces. It integrates seamlessly with DeepMind’s Haiku (for neural network definition) and Optax (for optimization), making it a key component in modular RL pipelines.

Downloads: 0 This Week

Last Update: 2026-06-13
See Project
18

Theseus

A library for differentiable nonlinear optimization

...Because solves are differentiable, you can backpropagate through optimization to learn cost weights, feature extractors, or initialization networks end-to-end. The implementation supports batched optimization on GPU, robust losses, damping strategies, and custom factors, making it practical for real-time systems. Helper packages provide geometry primitives and utilities for composing priors, relative constraints, and measurement models. Theseus bridges the gap between classical optimization and deep learning, enabling hybrid systems that learn components.

Downloads: 0 This Week

Last Update: 2025-10-07
See Project
19

KServe

Standardized Serverless ML Inference Platform on Kubernetes

...It aims to solve production model serving use cases by providing performant, high abstraction interfaces for common ML frameworks like Tensorflow, XGBoost, ScikitLearn, PyTorch, and ONNX. It encapsulates the complexity of autoscaling, networking, health checking, and server configuration to bring cutting edge serving features like GPU Autoscaling, Scale to Zero, and Canary Rollouts to your ML deployments. It enables a simple, pluggable, and complete story for Production ML Serving including prediction, pre-processing, post-processing and explainability. KServe is being used across various organizations.

Downloads: 0 This Week

Last Update: 2026-07-15
See Project
20

Face Alignment

2D and 3D Face alignment library build using pytorch

...However, the users can alternatively use dlib, BlazeFace, or pre-existing ground truth bounding boxes. While not required, for optimal performance(especially for the detector) it is highly recommended to run the code using a CUDA-enabled GPU. While here the work is presented as a black box, if you want to know more about the intrisecs of the method please check the original paper either on arxiv or my webpage.

Downloads: 0 This Week

Last Update: 2026-04-06
See Project
21

Kornia

Open Source Differentiable Computer Vision Library

Kornia is a differentiable computer vision library for PyTorch. It consists of a set of routines and differentiable modules to solve generic computer vision problems. At its core, the package uses PyTorch as its main backend both for efficiency and to take advantage of the reverse-mode auto-differentiation to define and compute the gradient of complex functions. Inspired by existing packages, this library is composed by a subset of packages containing operators that can be inserted within...

Downloads: 1 This Week

Last Update: 2026-05-19
See Project
22

AWS Deep Learning Containers

A set of Docker images for training and serving models in TensorFlow

AWS Deep Learning Containers (DLCs) are a set of Docker images for training and serving models in TensorFlow, TensorFlow 2, PyTorch, and MXNet. Deep Learning Containers provide optimized environments with TensorFlow and MXNet, Nvidia CUDA (for GPU instances), and Intel MKL (for CPU instances) libraries and are available in the Amazon Elastic Container Registry (Amazon ECR). The AWS DLCs are used in Amazon SageMaker as the default vehicles for your SageMaker jobs such as training, inference, transforms etc. They've been tested for machine learning workloads on Amazon EC2, Amazon ECS and Amazon EKS services as well. ...

Downloads: 1 This Week

Last Update: 15 hours ago
See Project
23

Numba

NumPy aware dynamic Python compiler using LLVM

Numba is an open source JIT compiler that translates a subset of Python and NumPy code into fast machine code. Numba translates Python functions to optimized machine code at runtime using the industry-standard LLVM compiler library. Numba-compiled numerical algorithms in Python can approach the speeds of C or FORTRAN. You don't need to replace the Python interpreter, run a separate compilation step, or even have a C/C++ compiler installed. Just apply one of the Numba decorators to your...

Downloads: 17 This Week

Last Update: 2026-07-01
See Project
24

FairChem

FAIR Chemistry's library of machine learning methods for chemistry

...Tasks span heterogeneous domains—catalysis (OC20-style), inorganic materials (OMat), molecules (OMol), MOFs (ODAC), and molecular crystals (OMC)—allowing one model family to serve many simulations. The README provides quick paths for pulling models (e.g., via Hugging Face access), then running energy/force predictions on GPU or CPU.

Downloads: 0 This Week

Last Update: 2026-06-08
See Project
25

DocTR

Library for OCR-related tasks powered by Deep Learning

DocTR provides an easy and powerful way to extract valuable information from your documents. Seemlessly process documents for Natural Language Understanding tasks: we provide OCR predictors to parse textual information (localize and identify each word) from your documents. Robust 2-stage (detection + recognition) OCR predictors with pretrained parameters. User-friendly, 3 lines of code to load a document and extract text with a predictor. State-of-the-art performances on public document...

Downloads: 5 This Week

Last Update: 2026-07-29
See Project