gpu speed free download

Showing 115 open source projects for "gpu speed"

1

GPU Hot

Real-time NVIDIA GPU dashboard

...The dashboard collects and displays a wide range of performance metrics including temperature, memory usage, power consumption, clock speeds, fan speed, and active processes. It can scale from monitoring a single GPU workstation to large distributed environments with dozens or even hundreds of GPUs by running lightweight containers on each node and aggregating the data centrally.
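The central aggregation step can be sketched as follows. This is a minimal sketch that assumes each node's collector reports one record per GPU. The `GpuSample` fields and the `aggregate` function are hypothetical and do not reflect GPU Hot's actual data model or code.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class GpuSample:
    """One reading for one GPU, as a node-level collector might report it (hypothetical schema)."""
    node: str            # hostname of the reporting node
    index: int           # GPU index on that node
    temp_c: float        # temperature in degrees Celsius
    mem_used_mib: int    # memory in use, MiB
    mem_total_mib: int   # total memory, MiB
    power_w: float       # power draw, watts


def aggregate(samples: List[GpuSample]) -> dict:
    """Combine readings from many nodes into cluster-wide summary figures."""
    if not samples:
        return {"gpus": 0, "nodes": 0}
    return {
        "gpus": len(samples),
        "nodes": len({s.node for s in samples}),
        "max_temp_c": max(s.temp_c for s in samples),
        "mean_temp_c": mean(s.temp_c for s in samples),
        "total_power_w": sum(s.power_w for s in samples),
        # Memory in use across all GPUs as a fraction of total capacity.
        "mem_used_frac": sum(s.mem_used_mib for s in samples)
        / sum(s.mem_total_mib for s in samples),
        # The hottest GPU, so a dashboard can highlight it.
        "hottest": max(samples, key=lambda s: s.temp_c),
    }


samples = [
    GpuSample("node-a", 0, 60.0, 4000, 8000, 100.0),
    GpuSample("node-a", 1, 70.0, 2000, 8000, 150.0),
    GpuSample("node-b", 0, 80.0, 8000, 16000, 250.0),
]
summary = aggregate(samples)
print(summary["gpus"], summary["nodes"], summary["max_temp_c"])
```

Keeping the per-node collector dumb (it only emits raw readings) and doing all summarizing centrally is what lets the same dashboard code serve one workstation or hundreds of GPUs.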

Downloads: 3 This Week

Last Update: 2026-04-11
2

TrafficMonitor

Floating window used to display current network speed, CPU & memory

TrafficMonitor is a network monitoring application for Windows with a floating-window display. It shows the current internet speed along with CPU and RAM usage. Other capabilities include an embedded display in the taskbar, changeable display skins, and historical traffic statistics. TrafficMonitor comes in two versions, standard and Lite: the standard version includes all functions, while the Lite version omits hardware monitoring functions such as temperature monitoring, GPU usage, and hard disk usage. ...
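A network speed readout of this kind is usually derived by sampling a cumulative byte counter at a fixed interval and dividing the difference by the elapsed time. The sketch below illustrates that generic technique, assuming the counter values are already available. `rate_bytes_per_s` and `human_rate` are hypothetical helpers, not TrafficMonitor's code.

```python
def rate_bytes_per_s(prev_bytes: int, curr_bytes: int, interval_s: float) -> float:
    """Average throughput between two readings of a cumulative byte counter."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = curr_bytes - prev_bytes
    if delta < 0:
        # The counter went backwards (reset or wrap-around); report no traffic for this tick.
        return 0.0
    return delta / interval_s


def human_rate(bytes_per_s: float) -> str:
    """Format a rate with binary (1024-based) prefixes, e.g. '1.5 MB/s'."""
    for unit in ("B/s", "KB/s", "MB/s"):
        if bytes_per_s < 1024:
            return f"{bytes_per_s:.1f} {unit}"
        bytes_per_s /= 1024
    return f"{bytes_per_s:.1f} GB/s"


# Counter sampled once per second: 1,572,864 bytes arrived during that interval.
print(human_rate(rate_bytes_per_s(10_000_000, 11_572_864, 1.0)))
```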

1 Review

Downloads: 193 This Week

Last Update: 2026-03-29
3

GPUArrays

Reusable array functionality for Julia's various GPU backends

Reusable GPU array functionality for Julia's various GPU backends. This package is the counterpart of Julia's AbstractArray interface, but for GPU array types: it provides functionality and tooling to speed up development of new GPU array types. This package is not intended for end users! Instead, you should use one of the packages that build on GPUArrays.jl, such as CUDA.jl, oneAPI.jl, AMDGPU.jl, or Metal.jl.

Downloads: 8 This Week

Last Update: 3 days ago
4

GPU-Z

Lightweight GPU information and diagnostics tool.

...It accurately reports clock speeds, including default, overclocked, 3D, and boost clocks. Furthermore, it provides a detailed analysis of the memory subsystem, including size, type, speed, and bus width. Unique features include a GPU load test to verify the PCI-Express configuration, results validation, and the ability to back up your graphics card BIOS. It is portable (requires no installation) and fully supports all modern Windows versions, including Windows 11.

1 Review

Downloads: 233 This Week

Last Update: 2025-10-10
5

Fan Control

Highly customizable fan controlling software for Windows

Fan Control is a Windows utility that gives users fine-grained, customizable control over system fans (CPU, GPU, case, etc.) based on temperature and other sensor inputs. Rather than relying solely on BIOS fan curves, it adjusts fan behaviour dynamically at the operating-system level, letting you react to real-time load, mix multiple sensors (CPU, GPU, motherboard, drives, etc.), and define custom fan-speed curves for different situations.
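A custom fan-speed curve that mixes several sensors can be sketched as a piecewise-linear mapping from temperature to fan duty, with the fan driven by whichever sensor demands the most. This is a generic illustration that assumes a "max" mixing rule and curve points with strictly increasing temperatures. `fan_curve` and `mixed_speed` are hypothetical names, not part of Fan Control.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple


def fan_curve(points: List[Tuple[float, float]], temp_c: float) -> float:
    """Map a temperature to a fan duty (%) by linear interpolation between curve points.

    `points` must be sorted by strictly increasing temperature; readings outside
    the curve are clamped to the first or last point.
    """
    temps = [t for t, _ in points]
    if temp_c <= temps[0]:
        return points[0][1]
    if temp_c >= temps[-1]:
        return points[-1][1]
    i = bisect_right(temps, temp_c)  # index of the first point strictly hotter than temp_c
    (t0, p0), (t1, p1) = points[i - 1], points[i]
    return p0 + (p1 - p0) * (temp_c - t0) / (t1 - t0)


def mixed_speed(readings: Dict[str, float], points: List[Tuple[float, float]]) -> float:
    """'Max' mix: evaluate the curve for every sensor and let the most demanding one win."""
    return max(fan_curve(points, t) for t in readings.values())


curve = [(30.0, 20.0), (60.0, 50.0), (80.0, 100.0)]
print(mixed_speed({"cpu": 45.0, "gpu": 70.0}, curve))
```

Taking the maximum across sensors is a conservative choice: a hot GPU raises the case fans even while the CPU is idle. Averaging or per-sensor curves are equally plausible designs.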

Downloads: 213 This Week

Last Update: 3 days ago
6

PowerInfer

High-speed Large Language Model Serving for Local Deployment

PowerInfer is a high-performance inference engine designed to run large language models efficiently on personal computers equipped with consumer-grade GPUs. The project improves local AI inference performance by optimizing how neural network computations are distributed between CPU and GPU. Its architecture exploits the observation that only a subset of neurons in a large model is frequently activated: the system preloads these frequently used neurons into GPU memory and computes the less frequently activated ones on the CPU. This hybrid execution strategy significantly reduces memory bottlenecks and improves overall inference speed.
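The CPU/GPU split described above can be illustrated with a greedy placement. The neurons are ranked by how often they activated on profiling data, and the GPU memory budget is filled with the most frequent ones. This is a simplified sketch that assumes per-neuron activation counts and a uniform per-neuron size. `partition_neurons` and `gpu_hit_rate` are hypothetical names, and PowerInfer's own placement policy may differ.

```python
from typing import List, Set, Tuple


def partition_neurons(activation_counts: List[int], bytes_per_neuron: int,
                      gpu_budget_bytes: int) -> Tuple[Set[int], Set[int]]:
    """Greedy split: the most frequently activated neurons go to the GPU until its
    memory budget is exhausted; everything else stays on the CPU."""
    ranked = sorted(range(len(activation_counts)),
                    key=lambda i: activation_counts[i], reverse=True)
    capacity = gpu_budget_bytes // bytes_per_neuron  # how many neurons fit on the GPU
    return set(ranked[:capacity]), set(ranked[capacity:])


def gpu_hit_rate(activation_counts: List[int], gpu_neurons: Set[int]) -> float:
    """Fraction of all observed activations that would be served from GPU memory."""
    total = sum(activation_counts)
    if total == 0:
        return 0.0
    return sum(activation_counts[i] for i in gpu_neurons) / total


# Profiled activation counts for five neurons, with a skewed distribution.
counts = [5, 100, 1, 40, 0]
gpu, cpu = partition_neurons(counts, bytes_per_neuron=10, gpu_budget_bytes=25)
print(sorted(gpu), sorted(cpu), round(gpu_hit_rate(counts, gpu), 3))
```

Here, only two of five neurons fit on the GPU, yet they cover about 96% of the activations. That skew is what makes the hybrid scheme pay off.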

Downloads: 1 This Week

Last Update: 2026-03-04
7

llmfit

157 models, 30 providers, one command to find what runs on your hardware

llmfit is a terminal-based utility that helps developers determine which large language models can realistically run on their local hardware by analyzing system resources and model requirements. The tool automatically detects CPU, RAM, GPU, and VRAM specifications, then ranks available models based on performance factors such as speed, quality, and memory fit. It provides both an interactive terminal user interface and a traditional CLI mode, enabling flexible workflows for different user preferences. llmfit also supports advanced configurations including multi-GPU setups, mixture-of-experts architectures, and dynamic quantization recommendations. ...
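You can approximate memory fit with simple arithmetic. Weight memory is roughly the parameter count times bits per weight divided by 8, plus some headroom for the KV cache and activations. The sketch below assumes a flat 20% overhead and picks the highest-precision quantization that fits. The overhead factor, the bit-width options, and the function names are assumptions, not llmfit's actual scoring model.

```python
from typing import Optional, Tuple


def estimated_vram_gb(params_billion: float, bits_per_weight: int,
                      overhead: float = 1.2) -> float:
    """Rough memory estimate in GB (10**9 bytes).

    Weights take params * bits / 8 bytes; `overhead` is an assumed multiplier
    covering KV cache, activations and runtime buffers.
    """
    return params_billion * bits_per_weight / 8 * overhead


def best_quantization(params_billion: float, vram_gb: float,
                      options: Tuple[int, ...] = (16, 8, 4)) -> Optional[int]:
    """Highest-precision bit width whose estimate fits in `vram_gb`, or None."""
    for bits in sorted(options, reverse=True):
        if estimated_vram_gb(params_billion, bits) <= vram_gb:
            return bits
    return None


for size in (7, 13, 70):
    print(f"{size}B on a 12 GB GPU ->", best_quantization(size, 12.0))
```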

Downloads: 31 This Week

Last Update: 2 days ago
8

CatBoost

High-performance library for gradient boosting on decision trees

...It is a machine learning method with plenty of applications, including ranking, classification, regression, and other machine learning tasks, and is available for Python, R, Java, and C++. CatBoost claims superior performance over other GBDT libraries on many datasets and offers several notable features: best-in-class prediction speed, support for both numerical and categorical features, a fast and scalable GPU version, and built-in visualization tools. CatBoost was developed by Yandex and is used in areas including search, self-driving cars, personal assistants, weather prediction, and more.

Downloads: 2 This Week

Last Update: 2026-02-21
9

Shumai

Fast Differentiable Tensor Library in JavaScript & TypeScript with Bun

...It can automatically leverage GPU acceleration on Linux (via CUDA) and CPU computation on macOS.

Downloads: 1 This Week

Last Update: 3 hours ago
10

HunyuanVideo

HunyuanVideo: A Systematic Framework For Large Video Generation Model

...The framework aims to push the boundaries of video generation quality, incorporating multiple innovative approaches to improve the realism and coherence of the generated content. The release includes FP8 model weights that reduce GPU memory usage and improve efficiency, as well as parallel inference code that speeds up sampling, along with utilities and tests.

1 Review

Downloads: 2 This Week

Last Update: 2025-09-23
11

how-to-optim-algorithm-in-cuda

How to optimize some algorithms in CUDA

...The repository also contains extensive learning notes that summarize CUDA programming concepts, GPU architecture details, and performance engineering strategies.

Downloads: 2 This Week

Last Update: 3 days ago
12

CuPy

A NumPy-compatible array library accelerated by CUDA

CuPy is an open source implementation of a NumPy-compatible multidimensional array accelerated with NVIDIA CUDA. It consists of cupy.ndarray, a core multidimensional array class, and many functions that operate on it. CuPy offers GPU-accelerated computing in Python, using CUDA-related libraries to fully utilize the GPU architecture. According to benchmarks, it can speed up some operations by more than 100×. CuPy is highly compatible with NumPy and serves as a drop-in replacement in most cases. It is easy to install through pip or through precompiled binary packages called wheels for recommended environments. ...

Downloads: 4 This Week

Last Update: 2026-02-20
13

Nvitop

An interactive NVIDIA-GPU process viewer and beyond

nvitop is an interactive NVIDIA device and process monitoring tool. It has a colorful and informative interface that continuously updates the status of the devices and processes. As a resource monitor, it includes many features and options, such as tree-view, environment variable viewing, process filtering, process metrics monitoring, etc. Beyond that, the package also ships a CUDA device selection tool nvisel for deep learning researchers. It also provides handy APIs that allow developers...

Downloads: 2 This Week

Last Update: 2026-01-27
14

LightGBM

Gradient boosting framework based on decision tree algorithms

LightGBM (Light Gradient Boosting Machine) is a high-performance, open source gradient boosting framework based on decision tree algorithms. Compared to other boosting frameworks, LightGBM offers advantages in speed, efficiency, and accuracy. Parallel experiments have shown that, in specific settings, LightGBM can achieve a linear speed-up when training on multiple machines while consuming less memory. LightGBM supports parallel and GPU learning and can handle large-scale data. It has become widely used for ranking, classification, and many other machine learning tasks.

Downloads: 2 This Week

Last Update: 2025-02-15
15

NumPy

The fundamental package for scientific computing with Python

...NumPy offers comprehensive mathematical functions, random number generators, linear algebra routines, Fourier transforms, and more. NumPy supports a wide range of hardware and computing platforms, and plays well with distributed, GPU, and sparse array libraries. The core of NumPy is well-optimized C code. Enjoy the flexibility of Python with the speed of compiled code. NumPy’s high level syntax makes it accessible and productive for programmers from any background or experience level. Distributed under a liberal BSD license, NumPy is developed and maintained publicly on GitHub by a vibrant, responsive, and diverse community. ...

Downloads: 98 This Week

Last Update: 2026-03-29
16

Stats

macOS system monitor in your menu bar

Stats currently supports macOS 10.13 (High Sierra) and later. Stats is an application that lets you monitor your macOS system: CPU utilization, GPU utilization, memory usage, disk utilization, sensor information (temperature/voltage/power), battery level, network usage, fan speed, fan control, and Bluetooth devices. It supports many languages, such as English, Polski, Українська, Русский, and many more. You can help by adding a new language or improving an existing translation.

Downloads: 5 This Week

Last Update: 6 days ago
17

PyTorch

Open source machine learning framework

...PyTorch can be used as a replacement for NumPy, or as a deep learning research platform that provides maximum flexibility and speed.

Downloads: 124 This Week

Last Update: 2026-03-24
18

Faster Whisper

Faster Whisper transcription with CTranslate2

Faster Whisper is an optimized implementation of the Whisper speech recognition model designed to deliver significantly faster inference while maintaining comparable accuracy. It leverages efficient inference engines and optimized computation strategies to reduce latency and resource consumption. The system is particularly useful for real-time or large-scale transcription tasks where performance is critical. It supports multiple model sizes, allowing users to balance speed and accuracy based...

Downloads: 16 This Week

Last Update: 2026-04-06
19

LuxTTS

A high-quality rapid TTS voice cloning model

LuxTTS is an open-source text-to-speech (TTS) system focused on high-quality, rapid voice synthesis and voice cloning that runs fast and efficiently on consumer hardware. It uses a lightweight architecture based on ZipVoice together with optimized sampling techniques, which lets it generate speech at up to roughly 150 times real time on a single GPU and faster than real time on CPU, while producing high-fidelity 48 kHz audio. The project supports...
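The "times real time" figure translates directly into wall-clock time. At 150× real time, synthesizing d seconds of audio takes d/150 seconds. A 60-second clip therefore takes about 0.4 s, and at 48 kHz the generator produces 150 × 48,000 = 7,200,000 samples per second of compute. The small helper below makes that arithmetic explicit. It uses only the figures quoted above, and the function names are hypothetical.

```python
SAMPLE_RATE_HZ = 48_000  # output sample rate quoted in the description


def synthesis_time_s(audio_s: float, x_realtime: float) -> float:
    """Wall-clock seconds needed to generate `audio_s` seconds of speech at `x_realtime` times real time."""
    if x_realtime <= 0:
        raise ValueError("x_realtime must be positive")
    return audio_s / x_realtime


def samples_per_wall_second(x_realtime: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Audio samples produced per second of compute."""
    return x_realtime * sample_rate_hz


print(synthesis_time_s(60, 150))   # a one-minute clip at ~150x real time
print(samples_per_wall_second(150))
```

The reciprocal, 1/150 ≈ 0.0067, is the real-time factor (RTF) that many TTS benchmarks report instead. Lower values mean faster synthesis.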

Downloads: 4 This Week

Last Update: 2026-03-12
20

DeSmuME

DeSmuME is a Nintendo DS emulator

...Also, DeSmuME focuses more on compatibility and features than on speed. Our philosophy is this: You can always mow some extra lawns or babysit some more rugrats to buy upgrades for your computer; but there's nothing you can do to fix compatibility or gain new features. We take care of our side of things, so you should take care of yours. DeSmuME is mostly CPU intensive and less GPU intensive.

Downloads: 33 This Week

Last Update: 2024-08-23
21

Habitat-Sim

A flexible, high-performance 3D simulator for Embodied AI research

...It ships with connectors to popular datasets and scene formats, plus tools for dataset generation and scene replay. Determinism and reproducibility are first-class goals, which is critical for benchmarking agents and comparing algorithms. Thanks to its speed and modular design, Habitat-Sim is widely used to prototype embodied agents, train at scale, and evaluate in standardized environments with consistent metrics.

Downloads: 1 This Week

Last Update: 2025-10-07
22

cuDF

GPU DataFrame Library

...The RAPIDS suite of open-source software libraries aims to enable the execution of end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.

Downloads: 0 This Week

Last Update: 2026-04-08
23

Video-subtitle-extractor

A GUI tool for extracting hard-coded subtitles (hardsubs) from videos

...Uses local OCR, so text recognition is completed entirely on your machine: there is no API to set up or call, and no need to access online OCR services such as Baidu or Ali. Supports GPU acceleration, which gives higher accuracy and faster extraction. (CLI version) Users do not need to set the subtitle area manually; the project detects it automatically with a text detection model. Text outside the subtitle area is filtered out, and watermark (station logo) text is removed.

1 Review

Downloads: 70 This Week

Last Update: 2026-04-05
24

LightLLM

LightLLM is a Python-based LLM (Large Language Model) inference

LightLLM is a high-performance inference and serving framework designed specifically for large language models, focusing on lightweight architecture, scalability, and efficient deployment. The framework enables developers to run and serve modern language models with significantly improved speed and resource efficiency compared to many traditional inference systems. Built primarily in Python, the project integrates optimization techniques and ideas from several leading open-source implementations, including FasterTransformer, vLLM, and FlashAttention, to accelerate token generation and reduce latency. LightLLM is designed to handle large-scale model workloads in production environments, supporting efficient batching and GPU utilization for fast inference across multiple requests. ...

Downloads: 0 This Week

Last Update: 2026-03-05
25

Flash-MoE

Running a big model on a small laptop

...It focuses on accelerating routing and computation by leveraging optimized kernels and memory management techniques, allowing models to dynamically select specialized sub-networks during inference. The project aims to reduce the computational cost typically associated with MoE systems while maintaining or improving performance. It likely includes support for GPU acceleration and parallel processing, enabling it to handle large-scale workloads effectively. The architecture emphasizes speed and efficiency, making it suitable for both research and production environments where performance is critical. It may also provide tools for benchmarking and tuning model behavior. Overall, flash-moe represents a technical advancement in making MoE models more practical and deployable.
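MoE models commonly select sub-networks dynamically through top-k gating. A router scores every expert, only the k highest-scoring experts run, and their outputs are mixed with softmax weights. In the variant sketched here, those weights are computed over the selected scores only. The pure-Python sketch below shows that generic technique. It is not Flash-MoE's implementation, and `top_k_route`, `moe_forward`, and `make_expert` are hypothetical names.

```python
import math
from typing import Callable, List, Sequence, Tuple


def top_k_route(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k experts with the highest router scores and return (index, weight)
    pairs, where the weights are a softmax over the selected scores only."""
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in chosen)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return [(i, exps[i] / total) for i in chosen]


def moe_forward(x: float, experts: Sequence[Callable[[float], float]],
                logits: Sequence[float], k: int = 2) -> float:
    """Run only the routed experts and mix their outputs by routing weight."""
    return sum(w * experts[i](x) for i, w in top_k_route(logits, k))


calls: List[int] = []  # records which experts actually executed


def make_expert(idx: int, fn: Callable[[float], float]) -> Callable[[float], float]:
    def expert(x: float) -> float:
        calls.append(idx)
        return fn(x)
    return expert


experts = [make_expert(0, lambda x: 10 * x), make_expert(1, lambda x: x + 1),
           make_expert(2, lambda x: 0.0), make_expert(3, lambda x: 2 * x)]
router_logits = [0.1, 3.0, 1.0, 2.0]
print(top_k_route(router_logits, k=2), moe_forward(1.0, experts, router_logits), calls)
```

The computational saving comes from the fact that unselected experts (0 and 2 here) are never evaluated. Per-token cost therefore scales with k rather than with the total number of experts.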

Downloads: 1 This Week

Last Update: 2026-04-02