gpu faster free download

Showing 64 open source projects for "gpu faster"

View related business solutions

Mac Clear Filters & Widen Search

AI-generated apps that pass security review
Stop waiting on engineering. Build production-ready internal tools with AI—on your company data, in your cloud.

Retool lets you generate dashboards, admin panels, and workflows directly on your data. Type something like “Build me a revenue dashboard on my Stripe data” and get a working app with security, permissions, and compliance built in from day one. Whether on our cloud or self-hosted, create the internal software your team needs without compromising enterprise standards or control.

Try Retool free
Gemini 3 and 200+ AI Models on One Platform
Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

Build generative AI apps with Vertex AI. Switch between models without switching platforms.

Start Free
1

Alacritty

A cross-platform, GPU-accelerated terminal emulator

...With such a strong focus on simplicity and performance, Alacritty’s included features are very carefully considered, ensuring that it remains blazingly fast. It’s got a GPU for rendering that makes a whole lot of optimizations possible. In various benchmarked terminals, Alacritty has shown to be either faster, or way faster than others. Alacritty requires no additional setup, but still allows configuration of many aspects of the terminal. It supports Windows, macOS, Linux and BSD.

Downloads: 12 This Week

Last Update: 2025-10-20
See Project
2

FlashAttention

Fast and memory-efficient exact attention

FlashAttention is a high-performance deep learning optimization library that reimplements the attention mechanism used in transformer models to be significantly faster and more memory-efficient than standard implementations. It achieves this by using IO-aware algorithms that minimize memory reads and writes, reducing the quadratic memory overhead typically associated with attention operations. The project provides implementations of FlashAttention, FlashAttention-2, and newer iterations optimized for modern GPU architectures such as NVIDIA Hopper and AMD accelerators. ...

Downloads: 8 This Week

Last Update: 2026-03-18
See Project
3

stt

Voice Recognition to Text Tool

...It supports GPU acceleration if available, enabling faster processing on compatible hardware but still offers reliable performance on CPUs alone.

Downloads: 1 This Week

Last Update: 2026-02-17
See Project
4

HunyuanVideo

HunyuanVideo: A Systematic Framework For Large Video Generation Model

HunyuanVideo is a cutting-edge framework designed for large-scale video generation, leveraging advanced AI techniques to synthesize videos from various inputs. It is implemented in PyTorch, providing pre-trained model weights and inference code for efficient deployment. The framework aims to push the boundaries of video generation quality, incorporating multiple innovative approaches to improve the realism and coherence of the generated content. Release of FP8 model weights to reduce GPU...

1 Review

Downloads: 6 This Week

Last Update: 2025-09-23
See Project
AI-powered service management for IT and enterprise teams
Enterprise-grade ITSM, for every business

Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity. Maximize operational efficiency with refreshingly simple, AI-powered Freshservice.

Try it Free
5

TensorRT Node for ComfyUI

Enables the best performance on NVIDIA RTX Graphics Cards

...This is particularly attractive for power users who run many generations or who host ComfyUI on dedicated hardware and want to squeeze out every bit of GPU performance. In short, it’s about taking ComfyUI from “it runs” to “it runs fast” on NVIDIA GPUs.

Downloads: 0 This Week

Last Update: 2025-10-30
See Project
6

cuML

RAPIDS Machine Learning Library

cuML is a suite of libraries that implement machine learning algorithms and mathematical primitives functions that share compatible APIs with other RAPIDS projects. cuML enables data scientists, researchers, and software engineers to run traditional tabular ML tasks on GPUs without going into the details of CUDA programming. In most cases, cuML's Python API matches the API from scikit-learn. For large datasets, these GPU-based implementations can complete 10-50x faster than their CPU equivalents. For details on performance, see the cuML Benchmarks Notebook.

Downloads: 0 This Week

Last Update: 2026-02-05
See Project
7

DeepEP

DeepEP: an efficient expert-parallel communication library

DeepEP is a communication library designed specifically to support Mixture-of-Experts (MoE) and expert parallelism (EP) deployments. Its core role is to implement high-throughput, low-latency all-to-all GPU communication kernels, which handle the dispatching of tokens to different experts (or shards) and then combining expert outputs back into the main data flow. Because MoE architectures require routing inputs to different experts, communication overhead can become a bottleneck — DeepEP...

Downloads: 2 This Week

Last Update: 2025-10-03
See Project
8

Meridian

Meridian is an MMM framework

Meridian is a comprehensive, open source marketing mix modeling (MMM) framework developed by Google to help advertisers analyze and optimize the impact of their marketing investments. Built on Bayesian causal inference principles, Meridian enables organizations to evaluate how different marketing channels influence key performance indicators (KPIs) such as revenue or conversions while accounting for external factors like seasonality or economic trends. The framework provides a robust...

Downloads: 8 This Week

Last Update: 9 hours ago
See Project
9

LuxTTS

A high-quality rapid TTS voice cloning model

...Its design emphasizes efficiency and practicality, fitting within modest GPU memory footprints.

Downloads: 1 This Week

Last Update: 2026-03-12
See Project
Forever Free Full-Stack Observability | Grafana Cloud
Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.

Create free account
10

Scalene

High-performance CPU, GPU, and memory profiler for Python

Scalene is a high-performance CPU, GPU and memory profiler for Python that does a number of things that other Python profilers do not and cannot do. It runs orders of magnitude faster than other profilers while delivering far more detailed information. Once Scalene has profiled your program, it will launch a web browser with an interactive user interface (all processing is done locally).

Downloads: 0 This Week

Last Update: 2026-03-22
See Project
11

LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

LLaMA-Factory is a fine-tuning and training framework for Meta's LLaMA language models. It enables researchers and developers to train and customize LLaMA models efficiently using advanced optimization techniques.

Downloads: 4 This Week

Last Update: 2025-12-31
See Project
12

pomegranate

Fast, flexible and easy to use probabilistic modelling in Python

pomegranate is a library for probabilistic modeling defined by its modular implementation and treatment of all models as the probability distributions they are. The modular implementation allows one to easily drop normal distributions into a mixture model to create a Gaussian mixture model just as easily as dropping a gamma and a Poisson distribution into a mixture model to create a heterogeneous mixture. But that's not all! Because each model is treated as a probability distribution,...

Downloads: 0 This Week

Last Update: 2024-08-06
See Project
13

Depth Pro

Sharp Monocular Metric Depth in Less Than a Second

Depth Pro is a foundation model for zero-shot metric monocular depth estimation, producing sharp, high-frequency depth maps with absolute scale from a single image. Unlike many prior approaches, it does not require camera intrinsics or extra metadata, yet still outputs metric depth suitable for downstream 3D tasks. Apple highlights both accuracy and speed: the model can synthesize a ~2.25-megapixel depth map in around 0.3 seconds on a standard GPU, enabling near real-time applications. The...

Downloads: 5 This Week

Last Update: 2025-10-08
See Project
14

WhisperLive

A nearly-live implementation of OpenAI's Whisper

WhisperLive is a “nearly live” implementation of OpenAI’s Whisper model focused on real-time transcription. It runs as a server–client system in which the server hosts a Whisper backend and clients stream audio to be transcribed with very low delay. The project supports multiple inference backends, including Faster-Whisper, NVIDIA TensorRT, and OpenVINO, allowing you to target GPUs and different CPU architectures efficiently. It can handle microphone input, pre-recorded audio files, and...

Downloads: 6 This Week

Last Update: 2026-03-17
See Project
15

CTranslate2

Fast inference engine for Transformer models

...The project implements a custom runtime that applies many performance optimization techniques such as weights quantization, layers fusion, batch reordering, etc., to accelerate and reduce the memory usage of Transformer models on CPU and GPU. The execution is significantly faster and requires less resources than general-purpose deep learning frameworks on supported models and tasks thanks to many advanced optimizations: layer fusion, padding removal, batch reordering, in-place operations, caching mechanism, etc. The model serialization and computation support weights with reduced precision: 16-bit floating points (FP16), 16-bit integers (INT16), and 8-bit integers (INT8). ...

Downloads: 1 This Week

Last Update: 2026-02-04
See Project
16

Colossal-AI

Making large AI models cheaper, faster and more accessible

The Transformer architecture has improved the performance of deep learning models in domains such as Computer Vision and Natural Language Processing. Together with better performance come larger model sizes. This imposes challenges to the memory wall of the current accelerator hardware such as GPU. It is never ideal to train large models such as Vision Transformer, BERT, and GPT on a single GPU or a single machine. There is an urgent demand to train models in a distributed environment....

Downloads: 1 This Week

Last Update: 2025-05-28
See Project
17

CodeGeeX2

CodeGeeX2: A More Powerful Multilingual Code Generation Model

...The model excels at code generation, translation, summarization, debugging, and comment generation, and it supports over 100 programming languages. With improved inference efficiency, quantization options, and multi-query/flash attention, CodeGeeX2 achieves faster generation speeds and lightweight deployment, requiring as little as 6GB GPU memory at INT4 precision. Its backend powers the CodeGeeX IDE plugins for VS Code, JetBrains, and other editors, offering developers interactive AI assistance with features like infilling and cross-file completion.

Downloads: 6 This Week

Last Update: 6 days ago
See Project
18

XFrames

GPU-accelerated GUI development for Node.js and the browser

xframes is a high-performance library that empowers developers to build native desktop applications using familiar web technologies, specifically Node.js and React, without the overhead of the DOM. xframes serves as a streamlined alternative to Electron, designed for developers looking to maximize performance and efficiency.

Downloads: 0 This Week

Last Update: 2024-12-07
See Project
19

ChatGLM2-6B

ChatGLM2-6B: An Open Bilingual Chat LLM

ChatGLM2-6B is the second-gen Chinese-English conversational LLM from ZhipuAI/Tsinghua. It upgrades the base model with GLM’s hybrid pretraining objective, 1.4 TB bilingual data, and preference alignment—delivering big gains on MMLU, CEval, GSM8K, and BBH. The context window extends up to 32K (FlashAttention), and Multi-Query Attention improves speed and memory use. The repo includes Python APIs, CLI & web demos, OpenAI-style/FASTAPI servers, and quantized checkpoints for lightweight local...

Downloads: 4 This Week

Last Update: 2 days ago
See Project
20

VCClient

Software that uses AI to perform real-time voice conversion

...It provides both a graphical user interface and API access, making it suitable for casual users as well as developers who want to integrate voice transformation into their own applications. The project also supports GPU acceleration, enabling faster inference and smoother real-time performance on compatible hardware. Additionally, it includes tools for training and managing voice models, giving users the ability to create personalized voice profiles.

Downloads: 14 This Week

Last Update: 2026-03-23
See Project
21

PostgresML

The GPU-powered AI application database

PostgresML is a complete platform in a PostgreSQL extension. Build simpler, faster, and more scalable models right inside your database. Explore the SDK and test open source models in our hosted database. Combine and automate the entire workflow from embedding generation to indexing and querying for the simplest (and fastest) knowledge-based chatbot implementation. Leverage multiple types of natural language processing and machine learning models such as vector search and personalization...

Downloads: 0 This Week

Last Update: 2025-01-16
See Project
22

X-AnyLabeling

Effortless data labeling with AI support from Segment Anything

X-AnyLabeling is an open-source data annotation platform designed to streamline the process of labeling datasets for computer vision and multimodal AI applications. The software integrates an AI-powered labeling engine that allows users to generate annotations automatically with the assistance of modern vision models such as Segment Anything and various object detection frameworks. It supports labeling tasks across images and videos and enables developers to prepare training datasets for...

Downloads: 5 This Week

Last Update: 2026-03-26
See Project
23

Unsloth Studio

Unified web UI for training and running open models locally

Unsloth Studio is a web-based interface for running and training AI models locally with a unified and user-friendly experience. It allows users to work with a wide range of models for text, audio, vision, embeddings, and more without relying heavily on cloud infrastructure. Built on top of the Unsloth framework, it focuses on high-performance training with reduced VRAM usage and faster speeds compared to traditional methods. The platform supports fine-tuning, pretraining, and reinforcement...

Downloads: 9 This Week

Last Update: 4 days ago
See Project
24

ncnn

High-performance neural network inference framework for mobile

ncnn is a high-performance neural network inference computing framework designed specifically for mobile platforms. It brings artificial intelligence right at your fingertips with no third-party dependencies, and speeds faster than all other known open source frameworks for mobile phone cpu. ncnn allows developers to easily deploy deep learning algorithm models to the mobile platform and create intelligent APPs. It is cross-platform and supports most commonly used CNN networks, including...

Downloads: 24 This Week

Last Update: 2026-01-13
See Project
25

AlphaZero.jl

A generic, simple and fast implementation of Deepmind's AlphaZero

Beyond its much publicized success in attaining superhuman level at games such as Chess and Go, DeepMind's AlphaZero algorithm illustrates a more general methodology of combining learning and search to explore large combinatorial spaces effectively. We believe that this methodology can have exciting applications in many different research areas. Because AlphaZero is resource-hungry, successful open-source implementations (such as Leela Zero) are written in low-level languages (such as C++)...

Downloads: 12 This Week

Last Update: 2025-12-12
See Project