Showing 115 open source projects for "gpu speed"

  • 1
    GPU Hot

    Real-time NVIDIA GPU dashboard

    ...The dashboard collects and displays a wide range of performance metrics including temperature, memory usage, power consumption, clock speeds, fan speed, and active processes. It can scale from monitoring a single GPU workstation to large distributed environments with dozens or even hundreds of GPUs by running lightweight containers on each node and aggregating the data centrally.
    Downloads: 3 This Week
  • 2
    TrafficMonitor

    Floating window used to display current network speed, CPU & memory

    TrafficMonitor is a network monitoring software with floating window feature for Windows. It displays the current internet speed and CPU and RAM usage. There are also other capabilities like an embedded display in the taskbar, changeable display skins, and historical traffic statistics. There are two versions of TrafficMonitor, the standard version and the Lite version. The standard version includes all the functions, while the Lite version does not include hardware monitoring functions such as temperature monitoring, GPU usage, and hard disk usage. ...
    Downloads: 193 This Week
  • 3
    GPUArrays

    Reusable array functionality for Julia's various GPU backends

    Reusable GPU array functionality for Julia's various GPU backends. This package is the counterpart of Julia's AbstractArray interface, but for GPU array types: it provides functionality and tooling to speed up development of new GPU array types. This package is not intended for end users! Instead, you should use one of the packages that builds on GPUArrays.jl, such as CUDA.jl, oneAPI.jl, AMDGPU.jl, or Metal.jl.
    Downloads: 8 This Week
  • 4
    GPU-Z

    Lightweight GPU information and diagnostics tool.

    ...It accurately reports clock speeds, including default, overclocked, 3D, and boost clocks. Furthermore, it provides a detailed analysis of the memory subsystem, including size, type, speed, and bus width. Unique features include a GPU load test to verify PCI-Express configuration, results validation, and the ability to back up your graphics card BIOS. It is portable (requires no installation) and fully supports all modern Windows versions, including Windows 11.
    Downloads: 233 This Week
  • 5
    Fan Control

    Highly customizable fan controlling software for Windows

    Fan Control is a Windows utility designed to give users fine-grained, customizable control over system fans (CPU, GPU, case, etc.) based on temperature and sensor inputs. Rather than relying solely on BIOS fan curves, it allows dynamic adjustment of fan behaviour at the operating-system level — letting you react to real-time load, mix multiple sensors (CPU, GPU, motherboard, drives, etc.), and define custom fan-speed curves for different situations.
    Downloads: 213 This Week
  • 6
    PowerInfer

    High-speed Large Language Model Serving for Local Deployment

    PowerInfer is a high-performance inference engine designed to run large language models efficiently on personal computers equipped with consumer-grade GPUs. The project focuses on improving the performance of local AI inference by optimizing how neural network computations are distributed between CPU and GPU resources. Its architecture exploits the observation that only a subset of neurons in large models are frequently activated, allowing the system to preload frequently used neurons into GPU memory while processing less common activations on the CPU. This hybrid execution strategy significantly reduces memory bottlenecks and improves overall inference speed.
    Downloads: 1 This Week
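
The hot/cold split described for PowerInfer can be illustrated with a small sketch. This is not PowerInfer's code or API; the function names, the per-neuron activation counts, and the `gpu_slots` budget are all hypothetical, and the sketch only shows the general idea of keeping the most frequently activated neurons on the GPU and leaving the rest to the CPU.

```python
from typing import Dict, List, Tuple


def split_hot_cold(activation_counts: Dict[int, int],
                   gpu_slots: int) -> Tuple[List[int], List[int]]:
    """Assign the most frequently activated neurons to the GPU, the rest to the CPU.

    activation_counts maps a (hypothetical) neuron id to how often it fired
    during profiling; gpu_slots is an assumed budget of neurons that fit in VRAM.
    """
    if gpu_slots < 0:
        raise ValueError("gpu_slots must be non-negative")
    # Rank neuron ids by activation count, most frequent first; ties break on id.
    ranked = sorted(activation_counts, key=lambda n: (-activation_counts[n], n))
    return ranked[:gpu_slots], ranked[gpu_slots:]


def gpu_hit_rate(activation_counts: Dict[int, int], hot: List[int]) -> float:
    """Fraction of all recorded activations that land on GPU-resident neurons."""
    total = sum(activation_counts.values())
    if total == 0:
        return 0.0
    return sum(activation_counts[n] for n in hot) / total


# Made-up profiling data: a few neurons account for most activations.
counts = {0: 900, 1: 5, 2: 450, 3: 12, 4: 300, 5: 3}
hot, cold = split_hot_cold(counts, gpu_slots=3)
rate = gpu_hit_rate(counts, hot)
print(hot, cold, round(rate, 3))
```

With a skewed activation distribution like the made-up one above, a small GPU budget already covers most activations, which is the intuition behind the hybrid CPU/GPU strategy in the description.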
  • 7
    llmfit

    157 models, 30 providers, one command to find what runs on hardware

    llmfit is a terminal-based utility that helps developers determine which large language models can realistically run on their local hardware by analyzing system resources and model requirements. The tool automatically detects CPU, RAM, GPU, and VRAM specifications, then ranks available models based on performance factors such as speed, quality, and memory fit. It provides both an interactive terminal user interface and a traditional CLI mode, enabling flexible workflows for different user preferences. llmfit also supports advanced configurations including multi-GPU setups, mixture-of-experts architectures, and dynamic quantization recommendations. ...
    Downloads: 31 This Week
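
A rough sketch of what a "memory fit" check can look like is given below. It is not llmfit's implementation: the model names are invented, the 1.2 overhead factor for runtime buffers is an assumption, and ranking by parameter count is only a stand-in for the speed and quality factors the description mentions. The weight-size estimate itself is the common rule of thumb of parameters times bits per weight.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    name: str              # hypothetical label, not a real catalogue entry
    params_billion: float  # parameter count in billions
    bits_per_weight: int   # e.g. 16 for fp16, 4 for 4-bit quantization


def estimated_gib(model: Model, overhead: float = 1.2) -> float:
    """Estimate memory need in GiB: weight bytes times an assumed overhead factor."""
    weight_bytes = model.params_billion * 1e9 * model.bits_per_weight / 8
    return weight_bytes * overhead / 2**30


def rank_models(models: List[Model], vram_gib: float) -> List[Model]:
    """Keep models whose estimate fits in VRAM, largest first (a crude quality proxy)."""
    fitting = [m for m in models if estimated_gib(m) <= vram_gib]
    return sorted(fitting, key=lambda m: m.params_billion, reverse=True)


catalogue = [
    Model("small-7b-q4", 7, 4),
    Model("small-7b-fp16", 7, 16),
    Model("mid-13b-q4", 13, 4),
    Model("large-70b-q4", 70, 4),
]
ranked = rank_models(catalogue, vram_gib=8.0)
print([m.name for m in ranked])
```

A real tool would also need to detect the hardware and account for context length and multi-GPU layouts; the sketch only covers the arithmetic of the fit check.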
  • 8
    CatBoost

    High-performance library for gradient boosting on decision trees

    ...It is a machine learning method with plenty of applications, including ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. CatBoost offers superior performance over other GBDT libraries on many datasets, and has several superb features. It has best in class prediction speed, supports both numerical and categorical features, has a fast and scalable GPU version, and readily comes with visualization tools. CatBoost was developed by Yandex and is used in various areas including search, self-driving cars, personal assistance, weather prediction and more.
    Downloads: 2 This Week
  • 9
    Shumai

    Fast Differentiable Tensor Library in JavaScript & TypeScript with Bun

    ...It can automatically leverage GPU acceleration on Linux (via CUDA) and CPU computation on macOS.
    Downloads: 1 This Week
  • 10
    HunyuanVideo

    HunyuanVideo: A Systematic Framework For Large Video Generation Model

    ...The framework aims to push the boundaries of video generation quality, incorporating multiple innovative approaches to improve the realism and coherence of the generated content. The project has released FP8 model weights to reduce GPU memory usage and improve efficiency, along with parallel inference code to speed up sampling. Utilities and tests are included.
    Downloads: 2 This Week
  • 11
    how-to-optim-algorithm-in-cuda

    How to optimize algorithms in CUDA

    ...The repository also contains extensive learning notes that summarize CUDA programming concepts, GPU architecture details, and performance engineering strategies.
    Downloads: 2 This Week
  • 12
    CuPy

    A NumPy-compatible array library accelerated by CUDA

    CuPy is an open source implementation of NumPy-compatible multi-dimensional array accelerated with NVIDIA CUDA. It consists of cupy.ndarray, a core multi-dimensional array class and many functions on it. CuPy offers GPU accelerated computing with Python, using CUDA-related libraries to fully utilize the GPU architecture. According to benchmarks, it can even speed up some operations by more than 100X. CuPy is highly compatible with NumPy, serving as a drop-in replacement in most cases. CuPy is very easy to install through pip or through precompiled binary packages called wheels for recommended environments. ...
    Downloads: 4 This Week
  • 13
    Nvitop

    An interactive NVIDIA-GPU process viewer and beyond

    nvitop is an interactive NVIDIA device and process monitoring tool. It has a colorful and informative interface that continuously updates the status of the devices and processes. As a resource monitor, it includes many features and options, such as tree-view, environment variable viewing, process filtering, process metrics monitoring, etc. Beyond that, the package also ships a CUDA device selection tool nvisel for deep learning researchers. It also provides handy APIs that allow developers...
    Downloads: 2 This Week
  • 14

    LightGBM

    Gradient boosting framework based on decision tree algorithms

    LightGBM, or Light Gradient Boosting Machine, is a high-performance, open source gradient boosting framework based on decision tree algorithms. Compared to other boosting frameworks, LightGBM offers several advantages in terms of speed, efficiency and accuracy. Parallel experiments have shown that LightGBM can attain linear speed-up through multiple machines for training in specific settings, all while consuming less memory. LightGBM supports parallel and GPU learning, and can handle large-scale data. It has become widely used for ranking, classification and many other machine learning tasks.
    Downloads: 2 This Week
  • 15
    NumPy

    The fundamental package for scientific computing with Python

    ...NumPy offers comprehensive mathematical functions, random number generators, linear algebra routines, Fourier transforms, and more. NumPy supports a wide range of hardware and computing platforms, and plays well with distributed, GPU, and sparse array libraries. The core of NumPy is well-optimized C code. Enjoy the flexibility of Python with the speed of compiled code. NumPy’s high level syntax makes it accessible and productive for programmers from any background or experience level. Distributed under a liberal BSD license, NumPy is developed and maintained publicly on GitHub by a vibrant, responsive, and diverse community. ...
    Downloads: 98 This Week
  • 16
    Stats

    macOS system monitor in your menu bar

    Stats is currently supported on macOS 10.13 (High Sierra) and higher. Stats is an application that allows you to monitor your macOS system: CPU utilization, GPU utilization, memory usage, disk utilization, sensor information (temperature/voltage/power), battery level, network usage, fan speed, fan control, and Bluetooth devices. It supports many languages, such as English, Polski, Українська, Русский, and many more. You can help by adding a new language or improving an existing translation.
    Downloads: 5 This Week
  • 17
    PyTorch

    Open source machine learning framework

    ...PyTorch can be used as a replacement for NumPy, or as a deep learning research platform that provides optimum flexibility and speed.
    Downloads: 124 This Week
  • 18
    Faster Whisper

    Faster Whisper transcription with CTranslate2

    Faster Whisper is an optimized implementation of the Whisper speech recognition model designed to deliver significantly faster inference while maintaining comparable accuracy. It leverages efficient inference engines and optimized computation strategies to reduce latency and resource consumption. The system is particularly useful for real-time or large-scale transcription tasks where performance is critical. It supports multiple model sizes, allowing users to balance speed and accuracy based...
    Downloads: 16 This Week
  • 19
    LuxTTS

    A high-quality rapid TTS voice cloning model

    LuxTTS is an open-source text-to-speech (TTS) system focused on delivering high-quality, rapid voice synthesis and voice cloning that runs extremely fast and efficiently on consumer hardware. It implements a lightweight architecture based on ZipVoice and optimized sampling techniques so that it can generate speech at speeds up to roughly 150 times real-time on a single GPU and faster than real-time on CPU, all while producing audio at high fidelity with 48 kHz quality. The project supports...
    Downloads: 4 This Week
  • 20
    DeSmuME

    DeSmuME is a Nintendo DS emulator

    ...Also, DeSmuME focuses more on compatibility and features than on speed. Our philosophy is this: You can always mow some extra lawns or babysit some more rugrats to buy upgrades for your computer; but there's nothing you can do to fix compatibility or gain new features. We take care of our side of things, so you should take care of yours. DeSmuME is mostly CPU intensive and less GPU intensive.
    Downloads: 33 This Week
  • 21
    Habitat-Sim

    A flexible, high-performance 3D simulator for Embodied AI research

    ...It ships with connectors to popular datasets and scene formats, plus tools for dataset generation and scene replay. Determinism and reproducibility are first-class goals, which is critical for benchmarking agents and comparing algorithms. Thanks to its speed and modular design, Habitat-Sim is widely used to prototype embodied agents, train at scale, and evaluate in standardized environments with consistent metrics.
    Downloads: 1 This Week
  • 22
    cuDF

    GPU DataFrame Library

    ...The RAPIDS suite of open-source software libraries aims to enable the execution of end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.
    Downloads: 0 This Week
  • 23
    Video-subtitle-extractor

    A GUI tool for extracting hard-coded subtitle (hardsub) from videos

    ...It uses local OCR, so there is no need to set up or call any API, or to access online OCR services such as Baidu and Ali; text recognition is completed locally. GPU acceleration is supported and gives higher accuracy and faster extraction. (CLI version) Users do not need to set the subtitle area manually; the project detects it automatically with a text detection model. It filters out text outside the subtitle area and removes watermark (station logo) text.
    Downloads: 70 This Week
  • 24
    LightLLM

    LightLLM is a Python-based LLM (Large Language Model) inference

    LightLLM is a high-performance inference and serving framework designed specifically for large language models, focusing on lightweight architecture, scalability, and efficient deployment. The framework enables developers to run and serve modern language models with significantly improved speed and resource efficiency compared to many traditional inference systems. Built primarily in Python, the project integrates optimization techniques and ideas from several leading open-source implementations, including FasterTransformer, vLLM, and FlashAttention, to accelerate token generation and reduce latency. LightLLM is designed to handle large-scale model workloads in production environments, supporting efficient batching and GPU utilization for fast inference across multiple requests. ...
    Downloads: 0 This Week
  • 25
    Flash-MoE

    Running a big model on a small laptop

    ...It focuses on accelerating routing and computation by leveraging optimized kernels and memory management techniques, allowing models to dynamically select specialized sub-networks during inference. The project aims to reduce the computational cost typically associated with MoE systems while maintaining or improving performance. It likely includes support for GPU acceleration and parallel processing, enabling it to handle large-scale workloads effectively. The architecture emphasizes speed and efficiency, making it suitable for both research and production environments where performance is critical. It may also provide tools for benchmarking and tuning model behavior. Overall, flash-moe represents a technical advancement in making MoE models more practical and deployable.
    Downloads: 1 This Week