gpu faster free download

Showing 24 open source projects for "gpu faster"

View related business solutions

Software Development Clear Filters & Widen Search

Gemini 3 and 200+ AI Models on One Platform
Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

Build generative AI apps with Vertex AI. Switch between models without switching platforms.

Start Free
MongoDB Atlas runs apps anywhere
Deploy in 115+ regions with the modern database for every enterprise.

MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.

Start Free
1

DeepSeed

Deep learning optimization library making distributed training easy

...Sparse attention of DeepSpeed powers an order-of-magnitude longer input sequence and obtains up to 6x faster execution comparing with dense transformers.

Downloads: 0 This Week

Last Update: 2026-03-13
See Project
2

TensorRT Node for ComfyUI

Enables the best performance on NVIDIA RTX Graphics Cards

...This is particularly attractive for power users who run many generations or who host ComfyUI on dedicated hardware and want to squeeze out every bit of GPU performance. In short, it’s about taking ComfyUI from “it runs” to “it runs fast” on NVIDIA GPUs.

Downloads: 0 This Week

Last Update: 2025-10-30
See Project
3

Video-subtitle-extractor

A GUI tool for extracting hard-coded subtitle (hardsub) from videos

...Use local OCR recognition, no need to set up and call any API, and do not need to access online OCR services such as Baidu and Ali to complete text recognition locally. Support GPU acceleration, after GPU acceleration, you can get higher accuracy and faster extraction speed. (CLI version) No need for users to manually set the subtitle area, the project automatically detects the subtitle area through the text detection model. Filter the text in the non-subtitle area and remove the watermark (station logo) text.

1 Review

Downloads: 54 This Week

Last Update: 2025-05-13
See Project
4

DeepEP

DeepEP: an efficient expert-parallel communication library

DeepEP is a communication library designed specifically to support Mixture-of-Experts (MoE) and expert parallelism (EP) deployments. Its core role is to implement high-throughput, low-latency all-to-all GPU communication kernels, which handle the dispatching of tokens to different experts (or shards) and then combining expert outputs back into the main data flow. Because MoE architectures require routing inputs to different experts, communication overhead can become a bottleneck — DeepEP...

Downloads: 0 This Week

Last Update: 2025-10-03
See Project
Try Google Cloud Risk-Free With $300 in Credit
No hidden charges. No surprise bills. Cancel anytime.

Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.

Start Free
5

Meridian

Meridian is an MMM framework

Meridian is a comprehensive, open source marketing mix modeling (MMM) framework developed by Google to help advertisers analyze and optimize the impact of their marketing investments. Built on Bayesian causal inference principles, Meridian enables organizations to evaluate how different marketing channels influence key performance indicators (KPIs) such as revenue or conversions while accounting for external factors like seasonality or economic trends. The framework provides a robust...

Downloads: 0 This Week

Last Update: 12 hours ago
See Project
6

XFrames

GPU-accelerated GUI development for Node.js and the browser

xframes is a high-performance library that empowers developers to build native desktop applications using familiar web technologies, specifically Node.js and React, without the overhead of the DOM. xframes serves as a streamlined alternative to Electron, designed for developers looking to maximize performance and efficiency.

Downloads: 0 This Week

Last Update: 2024-12-07
See Project
7

MNN

MNN is a blazing fast, lightweight deep learning framework

MNN is a highly efficient and lightweight deep learning framework. It supports inference and training of deep learning models, and has industry leading performance for inference and training on-device. At present, MNN has been integrated in more than 20 apps of Alibaba Inc, such as Taobao, Tmall, Youku, Dingtalk, Xianyu and etc., covering more than 70 usage scenarios such as live broadcast, short video capture, search recommendation, product searching by image, interactive marketing, equity...

Downloads: 6 This Week

Last Update: 2026-03-05
See Project
8

ncnn

High-performance neural network inference framework for mobile

ncnn is a high-performance neural network inference computing framework designed specifically for mobile platforms. It brings artificial intelligence right at your fingertips with no third-party dependencies, and speeds faster than all other known open source frameworks for mobile phone cpu. ncnn allows developers to easily deploy deep learning algorithm models to the mobile platform and create intelligent APPs. It is cross-platform and supports most commonly used CNN networks, including...

Downloads: 29 This Week

Last Update: 2026-01-13
See Project
9

AlphaZero.jl

A generic, simple and fast implementation of Deepmind's AlphaZero

Beyond its much publicized success in attaining superhuman level at games such as Chess and Go, DeepMind's AlphaZero algorithm illustrates a more general methodology of combining learning and search to explore large combinatorial spaces effectively. We believe that this methodology can have exciting applications in many different research areas. Because AlphaZero is resource-hungry, successful open-source implementations (such as Leela Zero) are written in low-level languages (such as C++)...

Downloads: 20 This Week

Last Update: 2025-12-12
See Project
Our Free Plans just got better! | Auth0
With up to 25k MAUs and unlimited Okta connections, our Free Plan lets you focus on what you do best—building great apps.

You asked, we delivered! Auth0 is excited to expand our Free and Paid plans to include more options so you can focus on building, deploying, and scaling applications without having to worry about your security. Auth0 now, thank yourself later.

Try free now
10

TensorRT

C++ library for high performance inference on NVIDIA GPUs

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. TensorRT-based applications perform up to 40X faster than CPU-only platforms during inference. With TensorRT, you can optimize neural network models trained in all major frameworks, calibrate for lower precision with high accuracy, and deploy to hyperscale data centers,...

Downloads: 10 This Week

Last Update: 1 day ago
See Project
11

Isaac ROS Visual SLAM

Visual SLAM/odometry package based on NVIDIA-accelerated cuVSLAM

Discover a faster, easier way to build advanced AI robotics applications with the NVIDIA Isaac™ ROS collection of accelerated computing packages and AI models, bringing NVIDIA acceleration to ROS developers everywhere. Isaac ROS Visual SLAM provides a high-performance, best-in-class ROS 2 package for VSLAM (visual simultaneous localization and mapping).

Downloads: 1 This Week

Last Update: 3 days ago
See Project
12

ProbabilisticCircuits.jl

Probabilistic Circuits from the Juice library

This module provides a Julia implementation of Probabilistic Circuits (PCs), tools to learn structure and parameters of PCs from data, and tools to do tractable exact inference with them. Probabilistic Circuits provides a unifying framework for several family of tractable probabilistic models. PCs are represented as computational graphs that define a joint probability distribution as recursive mixtures (sum units) and factorizations (product units) of simpler distributions (input units)....

Downloads: 0 This Week

Last Update: 2024-06-10
See Project
13

Tiny CUDA Neural Networks

Lightning fast C++/CUDA neural network framework

This is a small, self-contained framework for training and querying neural networks. Most notably, it contains a lightning-fast "fully fused" multi-layer perceptron (technical paper), a versatile multiresolution hash encoding (technical paper), as well as support for various other input encodings, losses, and optimizers. We provide a sample application where an image function (x,y) -> (R,G,B) is learned. The fully fused MLP component of this framework requires a very large amount of shared...

Downloads: 0 This Week

Last Update: 2025-07-08
See Project
14

Lightweight' GAN

Implementation of 'lightweight' GAN, proposed in ICLR 2021

...You can turn on automatic mixed precision with one flag --amp. You should expect it to be 33% faster and save up to 40% memory. Aim is an open-source experiment tracker that logs your training runs, and enables a beautiful UI to compare them.

Downloads: 0 This Week

Last Update: 2025-01-12
See Project
15

Fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python

Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling and other text generation tasks. We provide reference implementations of various sequence modeling papers. Recent work by Microsoft and Google has shown that data parallel training can be made significantly more efficient by sharding the model parameters and optimizer state across data parallel workers. These ideas are encapsulated in the...

Downloads: 0 This Week

Last Update: 2022-06-27
See Project
16

MACE

Deep learning inference framework optimized for mobile platforms

...Runtime is optimized with NEON, OpenCL and Hexagon, and Winograd algorithm is introduced to speed up convolution operations. The initialization is also optimized to be faster. Chip-dependent power options like big.LITTLE scheduling, Adreno GPU hints are included as advanced APIs. UI responsiveness guarantee is sometimes obligatory when running a model. Mechanism like automatically breaking OpenCL kernel into small units is introduced to allow better preemption for the UI rendering task. Graph level memory allocation optimization and buffer reuse are supported. ...

Downloads: 0 This Week

Last Update: 2022-01-13
See Project
17

Detectron2

Next-generation platform for object detection and segmentation

...Includes more features such as panoptic segmentation, Densepose, Cascade R-CNN, rotated bounding boxes, PointRend, DeepLab, etc. Can be used as a library to support different projects on top of it. We'll open source more research projects in this way. It trains much faster. Models can be exported to TorchScript format or Caffe2 format for deployment. With a new, more modular design, Detectron2 is flexible and extensible, and able to provide fast training on single or multiple GPU servers. Detectron2 includes high-quality implementations of state-of-the-art object detection.

Downloads: 2 This Week

Last Update: 2021-10-26
See Project
18

Spleeter

Deezer source separation library including pretrained models

...It makes it easy to train music source separation models (assuming you have a dataset of isolated sources), and provides already trained state of the art models for performing various flavours of separation. 2 stems and 4 stems models have state of the art performances on the musdb dataset. Spleeter is also very fast as it can perform separation of audio files to 4 stems 100x faster than real-time when run on a GPU. We designed Spleeter so you can use it straight from command line as well as directly in your own development pipeline as a Python library. It can be installed with Conda, with pip or be used with Docker.

1 Review

Downloads: 59 This Week

Last Update: 2021-09-03
See Project
19

YOLO ROS

YOLO ROS: Real-Time Object Detection for ROS

...Darknet on the CPU is fast (approximately 1.5 seconds on an Intel Core i7-6700HQ CPU @ 2.60GHz × 8) but it's like 500 times faster on GPU! You'll have to have an Nvidia GPU and you'll have to install CUDA. The CMakeLists.txt file automatically detects if you have CUDA installed or not. CUDA is a parallel computing platform and application programming interface (API) model created by Nvidia.

Downloads: 1 This Week

Last Update: 2022-08-12
See Project
20

PyText

A natural language modeling framework based on PyTorch

...We use PyText at Facebook to iterate quickly on new modeling ideas and then seamlessly ship them at scale. Distributed-training support built on the new C10d backend in PyTorch 1.0. Mixed precision training support through APEX (trains faster with less GPU memory on NVIDIA Tensor Cores). Extensible components that allows easy creation of new models and tasks.

Downloads: 0 This Week

Last Update: 2021-08-31
See Project
21

Bender

Easily craft fast Neural Networks on iOS

Bender allows you to easily define and run neural networks on your iOS apps, it uses Apple’s MetalPerformanceShaders under the hood. Bender provides the ease of use of CoreML with the flexibility of a modern ML framework. Bender allows you to run trained models, you can use Tensorflow, Keras, Caffe, the choice is yours. Either freeze the graph or export the weights to files. You can import a frozen graph directly from supported platforms or re-define the network structure and load the...

Downloads: 0 This Week

Last Update: 2022-08-11
See Project
22

SoAx

Structure of Arrays of multiple types

Structures of arrays (SoA) are generally faster than arrays of structures (AoS) while AoS are more handy. This project (SoAx) combines the advantages of both. By means of C++(11) meta-template programming SoAx achieves maximal performance (efficient use of vector units and cache of modern CPUs) while providing a very convenient user interface (including object-oriented element handling) and flexibility. It has been designed to handle list-like sets of particles (similar to struct {int id;...

Downloads: 0 This Week

Last Update: 2017-10-19
See Project
23

GPUImage

iOS framework for GPU-based image and video processing

The GPUImage framework is a BSD-licensed iOS library that lets you apply GPU-accelerated filters and other effects to images, live camera video, and movies. In comparison to Core Image (part of iOS 5.0), GPUImage allows you to write your own custom filters, supports deployment to iOS 4.0, and has a slightly simpler interface. However, it currently lacks some of the more advanced features of Core Image, such as facial detection.

Downloads: 0 This Week

Last Update: 2023-02-09
See Project
24

MicroLua DS

MicroLua brings Lua on the Nintendo DS for easy programming

MicroLua brings the programming language Lua on the Nintendo DS for easy and fast development of beautiful homebrews! Based on brunni's µLibrary, µLua is a Lua interpreter featuring fast drawings and many important functionalities. You can exploit your Nintendo DS with the simplistic yet powerful Lua language! On your cartridge, MicroLua is a NDS executable that shows as its frontend a great graphical shell from which you can explore your cartridge and run Lua scripts written for...

1 Review

Downloads: 21 This Week

Last Update: 2016-11-29
See Project