Showing 160 open source projects for "gpu hardware"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Go From AI Idea to AI App Fast Icon
    Go From AI Idea to AI App Fast

    One platform to build, fine-tune, and deploy ML models. No MLOps team required.

    Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.
    Try Free
  • 1
    Humanoid-Gym

    Humanoid-Gym

    Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real

    Humanoid-Gym is a reinforcement learning framework designed to train locomotion and control policies for humanoid robots using high-performance simulation environments. The system is built on top of NVIDIA Isaac Gym, which allows large-scale parallel simulation of robotic environments directly on GPU hardware. Its primary goal is to enable efficient training of humanoid robots in simulation while enabling policies to transfer effectively to real-world hardware without additional training. The framework emphasizes the concept of zero-shot sim-to-real transfer, meaning that behaviors learned in simulation can be deployed directly on physical robots with minimal adjustment. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 2
    Megatron-LM

    Megatron-LM

    Ongoing research training transformer models at scale

    Megatron-LM is a GPU-optimized deep learning framework from NVIDIA designed to train extremely large transformer-based language models efficiently at scale. The repository provides both a reference training implementation and Megatron Core, a composable library of high-performance building blocks for custom large-model pipelines. It supports advanced parallelism strategies including tensor, pipeline, data, expert, and context parallelism, enabling training across massive multi-GPU and multi-node clusters. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    ChatGLM.cpp

    ChatGLM.cpp

    C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)

    ChatGLM.cpp is a C++ implementation of the ChatGLM-6B model, enabling efficient local inference without requiring a Python environment. It is optimized for running on consumer hardware.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 4
    OptiScaler

    OptiScaler

    OptiScaler bridges upscaling/frame gen across GPUs

    ...This makes it possible to swap technologies such as NVIDIA DLSS, AMD FSR, or Intel XeSS even if the game only supports one of them by default. The tool effectively acts as a compatibility layer between the game engine and multiple upscaling frameworks, enabling cross-GPU access to features that might otherwise be restricted to specific hardware ecosystems. In addition to replacing upscalers, OptiScaler can enable frame generation features in titles that do not officially support them, improving frame rates and perceived smoothness during gameplay.
    Downloads: 140 This Week
    Last Update:
    See Project
  • $300 in Free Credit Towards Top Cloud Services Icon
    $300 in Free Credit Towards Top Cloud Services

    Build VMs, containers, AI, databases, storage—all in one place.

    Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.
    Get Started
  • 5
    clone-voice

    clone-voice

    A sound cloning tool with a web interface, using your voice

    Clone-voice is a local voice-cloning tool that lets you synthesize speech in any target voice or convert one recording into another voice using the same timbre. It is built around Coqui’s XTTS-v2 model, so it inherits multilingual support and modern neural TTS quality while wrapping it in a user-friendly desktop workflow. The app is designed to be very easy to use: you download a precompiled package, double-click app.exe, and it launches a browser-based web interface where you control...
    Downloads: 15 This Week
    Last Update:
    See Project
  • 6
    node-llama-cpp

    node-llama-cpp

    Run AI models locally on your machine with node.js bindings for llama

    ...By using native bindings and optimized model execution, the framework allows developers to integrate advanced language model capabilities into desktop applications, server software, and command-line tools. The system automatically detects the available hardware on a machine and selects the most appropriate compute backend, including CPU or GPU acceleration. Developers can use the library to perform tasks such as text generation, conversational chat, embedding generation, and structured output generation. Because it runs models locally, the platform is particularly useful for privacy-sensitive environments or offline AI deployments.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 7
    mpv

    mpv

    Command line video player

    mpv is a free (as in freedom) media player for the command line. It supports a wide variety of media file formats, audio and video codecs, and subtitle types. Powerful scripting capabilities can make the player do almost anything. There is a large selection of user scripts on the wiki. While mpv strives for minimalism and provides no real GUI, it has a small controller on top of the video for basic control. mpv has an OpenGL, Vulkan, and D3D11 based video output that is capable of many...
    Downloads: 93 This Week
    Last Update:
    See Project
  • 8
    LuxTTS

    LuxTTS

    A high-quality rapid TTS voice cloning model

    LuxTTS is an open-source text-to-speech (TTS) system focused on delivering high-quality, rapid voice synthesis and voice cloning that runs extremely fast and efficiently on consumer hardware. It implements a lightweight architecture based on ZipVoice and optimized sampling techniques so that it can generate speech at speeds up to roughly 150 times real-time on a single GPU and faster than real-time on CPU, all while producing audio at high fidelity with 48 kHz quality. The project supports zero-shot voice cloning, meaning it can adapt to a reference speaker’s voice with minimal example data, enabling realistic and personalized synthetic speech. ...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 9
    NVIDIA Isaac Sim

    NVIDIA Isaac Sim

    NVIDIA Isaac Sim is an open-source application on NVIDIA Omniverse

    NVIDIA Isaac Sim is a high-fidelity robotics simulation platform built on NVIDIA Omniverse to develop, test, and validate AI-driven robots in physically accurate virtual environments. It supports a wide array of robotics formats (URDF, MJCF, CAD), includes GPU-accelerated physics, and features immersive RTX rendering and multisensory simulation. Realistic physics via GPU-accelerated engines and RTX ray tracing. Multi-sensor simulation (RGB-D cameras, Lidar, Radar, IMU, contact sensors)....
    Downloads: 1 This Week
    Last Update:
    See Project
  • Enterprise-grade ITSM, for every business Icon
    Enterprise-grade ITSM, for every business

    Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity.

    Freshservice is an intuitive, AI-powered platform that helps IT, operations, and business teams deliver exceptional service without the usual complexity. Automate repetitive tasks, resolve issues faster, and provide seamless support across the organization. From managing incidents and assets to driving smarter decisions, Freshservice makes it easy to stay efficient and scale with confidence.
    Try it Free
  • 10
    Covalent workflow

    Covalent workflow

    Pythonic tool for running machine-learning/high performance workflows

    Covalent is a Pythonic workflow tool for computational scientists, AI/ML software engineers, and anyone who needs to run experiments on limited or expensive computing resources including quantum computers, HPC clusters, GPU arrays, and cloud services. Covalent enables a researcher to run computation tasks on an advanced hardware platform – such as a quantum computer or serverless HPC cluster – using a single line of code. Covalent overcomes computational and operational challenges inherent in AI/ML experimentation.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 11
    PEFT

    PEFT

    State-of-the-art Parameter-Efficient Fine-Tuning

    Parameter-Efficient Fine-Tuning (PEFT) methods enable efficient adaptation of pre-trained language models (PLMs) to various downstream applications without fine-tuning all the model's parameters. Fine-tuning large-scale PLMs is often prohibitively costly. In this regard, PEFT methods only fine-tune a small number of (extra) model parameters, thereby greatly decreasing the computational and storage costs. Recent State-of-the-Art PEFT techniques achieve performance comparable to that of full...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 12
    Colossal-AI

    Colossal-AI

    Making large AI models cheaper, faster and more accessible

    The Transformer architecture has improved the performance of deep learning models in domains such as Computer Vision and Natural Language Processing. Together with better performance come larger model sizes. This imposes challenges to the memory wall of the current accelerator hardware such as GPU. It is never ideal to train large models such as Vision Transformer, BERT, and GPT on a single GPU or a single machine. There is an urgent demand to train models in a distributed environment. However, distributed training, especially model parallelism, often requires domain expertise in computer systems and architecture. ...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 13
    stt

    stt

    Voice Recognition to Text Tool

    ...The project is designed to be easy to deploy: you can run a local Python server that exposes an HTTP API for uploading audio/video files and retrieving transcriptions in different formats. It supports GPU acceleration if available, enabling faster processing on compatible hardware but still offers reliable performance on CPUs alone.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 14
    Bend

    Bend

    A massively parallel, high-level programming language

    Bend is an interactive programming environment (REPL) built on top of the Kotlin language, designed to allow users to explore, experiment, and learn Kotlin in a live, feedback-driven manner. The tool lets you define variables, functions, or values at the prompt and iteratively refine them—immediately seeing output and types—while preserving state across commands. It emphasizes discoverability and experimentation: users can inspect functions, call them on sample inputs, and evolve logic...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    ChatGLM-6B

    ChatGLM-6B

    ChatGLM-6B: An Open Bilingual Dialogue Language Model

    ...The project provides inference code, demos (command line, web, API), quantization support for lower memory deployment, and tools for finetuning (e.g., via P-Tuning v2). It is optimized for dialogue and question answering with a balance between performance and deployability in consumer hardware settings. Support for quantized inference (INT4, INT8) to reduce GPU memory requirements. Automatic mode switching between precision/memory tradeoffs (full/quantized).
    Downloads: 7 This Week
    Last Update:
    See Project
  • 16
    Apollo

    Apollo

    The easiest way to stream with the native resolution of your client

    Apollo is a self-hosted desktop streaming host designed to enable low-latency game streaming from a personal computer to remote clients using protocols compatible with Moonlight and Artemis. It acts as a server that captures, encodes, and streams desktop or game sessions while supporting hardware acceleration across AMD, Intel, and NVIDIA GPUs. The project includes a web-based interface that allows users to configure streaming settings, manage connected clients, and control application...
    Downloads: 19 This Week
    Last Update:
    See Project
  • 17
    uzu

    uzu

    A high-performance inference engine for AI models

    uzu is a high-performance inference engine designed to run artificial intelligence models efficiently on Apple Silicon hardware. Written primarily in Rust and leveraging Apple’s Metal framework, the project focuses on maximizing performance when executing large language models and other AI workloads on devices such as Mac computers with M-series chips. The engine implements a hybrid architecture in which model layers can be executed either as custom GPU kernels or through Apple’s MPSGraph API, allowing it to balance performance and compatibility depending on the workload. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    UCCL

    UCCL

    UCCL is an efficient communication library for GPUs

    UCCL is a high-performance GPU communication library designed to support distributed machine learning workloads and large-scale AI systems. The library focuses on enabling efficient data transfer and collective communication between GPUs during training and inference processes. It supports a variety of communication patterns including collective operations such as all-reduce as well as peer-to-peer transfers that are commonly used in modern machine learning architectures.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    CUDA.jl

    CUDA.jl

    CUDA programming in Julia

    High-performance GPU programming in a high-level language. JuliaGPU is a GitHub organization created to unify the many packages for programming GPUs in Julia. With its high-level syntax and flexible compiler, Julia is well-positioned to productively program hardware accelerators like GPUs without sacrificing performance. The latest development version of CUDA.jl requires Julia 1.8 or higher.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Model Zoo

    Model Zoo

    Please do not feed the models

    ...Each model is organized into its own project folder with pinned package versions, ensuring reproducibility and stability. The examples serve both as educational tools for learning Flux and as practical starting points for building new models. GPU acceleration is supported for most models through CUDA integration, enabling efficient training on compatible hardware. With community contributions encouraged, the Model Zoo acts as a hub for sharing and exploring diverse machine learning applications in Julia.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 21
    HLSL++

    HLSL++

    Math library using HLSL syntax with multiplatform SIMD support

    HLSL++ is a header-only C++ math library designed to replicate the syntax and functionality of the HLSL shading language, making it easier for developers to write CPU-side code that mirrors GPU shader logic. It provides vector, matrix, and math operations with a syntax identical or very similar to HLSL, allowing seamless transition between shader code and application code. The library is optimized for performance and supports SIMD instructions across multiple architectures, including SSE, AVX, AVX2, AVX512, and ARM NEON, ensuring high efficiency on modern hardware.
    Downloads: 3 This Week
    Last Update:
    See Project
  • 22
    Transcoder

    Transcoder

    Hardware-accelerated video transcoding using Android MediaCodec APIs

    Transcoder by DeepMedia is an AI-powered video-to-video speech translation engine that enables fully automated multilingual dubbing. Unlike traditional speech translation systems that rely on multi-stage pipelines, Transcoder directly translates one speaker’s video into another language while preserving facial expressions, lip-sync, and vocal identity. Designed for real-time use and production-grade pipelines, Transcoder combines advanced deep learning models with GPU acceleration to deliver...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 23
    ort

    ort

    Fast ML inference & training for ONNX models in Rust

    ort is a high-performance Rust library that provides bindings to ONNX Runtime, enabling developers to run machine learning inference and training workflows directly within Rust applications using the standardized ONNX model format. It is designed to bridge the gap between modern machine learning frameworks and systems programming by offering a safe, ergonomic API for executing models originally built in ecosystems like PyTorch, TensorFlow, or scikit-learn. The library emphasizes speed and...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 24
    tvm

    tvm

    Open deep learning compiler stack for cpu, gpu, etc.

    Apache TVM is an open source machine learning compiler framework for CPUs, GPUs, and machine learning accelerators. It aims to enable machine learning engineers to optimize and run computations efficiently on any hardware backend. The vision of the Apache TVM Project is to host a diverse community of experts and practitioners in machine learning, compilers, and systems architecture to build an accessible, extensible, and automated open-source framework that optimizes current and emerging...
    Downloads: 1 This Week
    Last Update:
    See Project
  • 25
    QuickViewer

    QuickViewer

    A image/comic viewer application for Windows, Mac and Linux

    QuickViewer is a fast and lightweight image viewer designed to handle large image collections and archive formats with blazing performance. Built using C++ and Qt, QuickViewer is optimized for viewing manga, comics, and large photo folders with instant loading and minimal lag. It includes a streamlined interface, hardware-accelerated rendering, and features tailored for browsing through image series efficiently. With support for compressed formats like ZIP and RAR, QuickViewer is especially...
    Downloads: 19 This Week
    Last Update:
    See Project