Search Results for "gpu hardware" - Page 4

Sort By:

Showing 182 open source projects for "gpu hardware"

View related business solutions

Linux Clear Filters & Widen Search

Gemini 3 and 200+ AI Models on One Platform
Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

Build generative AI apps with Vertex AI. Switch between models without switching platforms.

Start Free
Full-stack observability with actually useful AI | Grafana Cloud
Our generous forever free tier includes the full platform, including the AI Assistant, for 3 users with 10k metrics, 50GB logs, and 50GB traces.

Built on open standards like Prometheus and OpenTelemetry, Grafana Cloud includes Kubernetes Monitoring, Application Observability, Incident Response, plus the AI-powered Grafana Assistant. Get started with our generous free tier today.

Create free account
1

TensorRT LLM

TensorRT LLM provides users with an easy-to-use Python API

TensorRT-LLM is an open-source high-performance inference library specifically designed to optimize and accelerate large language model deployment on NVIDIA GPUs. It provides a Python-based API built on top of PyTorch that allows developers to define, customize, and deploy LLMs efficiently across a variety of hardware configurations, from single GPUs to large multi-node clusters. The library focuses on maximizing throughput and minimizing latency through advanced techniques such as...

Downloads: 4 This Week

Last Update: 2026-03-20
See Project
2

Lemonade

Lemonade helps users run local LLMs with the highest performance

Lemonade is a local LLM runtime that aims to deliver the highest possible performance on your own hardware by auto-configuring state-of-the-art inference engines for both NPUs and GPUs. The project positions itself as a “local LLM server” you can run on laptops and workstations, abstracting away backend differences while giving you a single place to serve and manage models. Its README emphasizes real-world adoption across startups, research groups, and large companies, signaling a focus on...

Downloads: 8 This Week

Last Update: 2026-04-08
See Project
3

wllama

WebAssembly binding for llama.cpp - Enabling on-browser LLM inference

wllama is a WebAssembly-based library that enables large language model inference directly inside a web browser. Built as a binding for the llama.cpp inference engine, the project allows developers to run LLM models locally without requiring a server backend or dedicated GPU hardware. The library leverages WebAssembly SIMD capabilities to achieve efficient execution within modern browsers while maintaining compatibility across platforms. By running models locally on the user’s device, wllama enables privacy-preserving AI applications that do not require sending data to remote servers. The framework provides both high-level APIs for common tasks such as text generation and embeddings, as well as low-level APIs that expose tokenization, sampling controls, and model state management.

Downloads: 2 This Week

Last Update: 2026-03-10
See Project
4

WebLLM

Bringing large-language models and chat to web browsers

WebLLM is a modular, customizable javascript package that directly brings language model chats directly onto web browsers with hardware acceleration. Everything runs inside the browser with no server support and is accelerated with WebGPU. We can bring a lot of fun opportunities to build AI assistants for everyone and enable privacy while enjoying GPU acceleration. WebLLM offers a minimalist and modular interface to access the chatbot in the browser. The WebLLM package itself does not come with UI, and is designed in a modular way to hook to any of the UI components. ...

Downloads: 2 This Week

Last Update: 2026-03-13
See Project
Try Google Cloud Risk-Free With $300 in Credit
No hidden charges. No surprise bills. Cancel anytime.

Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.

Start Free
5

OpenVINO

OpenVINO™ Toolkit repository

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference. Boost deep learning performance in computer vision, automatic speech recognition, natural language processing and other common tasks. Use models trained with popular frameworks like TensorFlow, PyTorch and more. Reduce resource demands and efficiently deploy on a range of Intel® platforms from edge to cloud. This open-source version includes several components: namely Model Optimizer, OpenVINO™ Runtime,...

Downloads: 34 This Week

Last Update: 2026-03-25
See Project
6

gpt-oss

gpt-oss-120b and gpt-oss-20b are two open-weight language models

gpt-oss is OpenAI’s open-weight family of large language models designed for powerful reasoning, agentic workflows, and versatile developer use cases. The series includes two main models: gpt-oss-120b, a 117-billion parameter model optimized for general-purpose, high-reasoning tasks that can run on a single H100 GPU, and gpt-oss-20b, a lighter 21-billion parameter model ideal for low-latency or specialized applications on smaller hardware. Both models use a native MXFP4 quantization for efficient memory use and support OpenAI’s Harmony response format, enabling transparent full chain-of-thought reasoning and advanced tool integrations such as function calling, browsing, and Python code execution. ...

1 Review

Downloads: 19 This Week

Last Update: 2026-01-13
See Project
7

Real-Time Voice Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

Real-Time Voice Cloning is an influential deep-learning repository that demonstrates how to clone a voice from just a few seconds of audio and then generate arbitrary speech in that voice in near real time. It implements the SV2TTS pipeline (“Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis”) in three stages: a speaker encoder, a synthesizer, and a vocoder. In the first stage, short audio clips are converted into a fixed-dimensional speaker embedding that...

Downloads: 11 This Week

Last Update: 2026-03-09
See Project
8

FurMark

GPU stress test OpenGL and Vulkan graphics benchmark Windows/Linux

FurMark is an intensive benchmarking tool designed to evaluate the performance of graphics cards using fur rendering algorithms. This tool is particularly effective in generating high workloads that can significantly increase the temperature of the GPU, making it a useful utility for testing the stability and stress tolerance of graphics cards. By simulating demanding rendering tasks, FurMark serves as a comprehensive test for assessing the robustness and thermal performance of GPUs under...

Downloads: 330 This Week

Last Update: 2024-10-28
See Project
9

CUDA Agent

Large-Scale Agentic RL for High-Performance CUDA Kernel Generation

CUDA Agent is a research-driven agentic reinforcement learning system designed to automatically generate and optimize high-performance CUDA kernels for GPU workloads. The project addresses the long-standing challenge that efficient CUDA programming typically requires deep hardware expertise by training an autonomous coding agent capable of iterative improvement through execution feedback. Its architecture combines large-scale data synthesis, a skill-augmented CUDA development environment, and long-horizon reinforcement learning to build intrinsic optimization capability rather than relying on simple post-hoc tuning. ...

Downloads: 1 This Week

Last Update: 2026-03-03
See Project
Custom VMs From 1 to 96 vCPUs With 99.95% Uptime
General-purpose, compute-optimized, or GPU/TPU-accelerated. Built to your exact specs.

Live migration and automatic failover keep workloads online through maintenance. One free e2-micro VM every month.

Try Free
10

Unsloth-MLX

Bringing the Unsloth experience to Mac users via Apple's MLX framework

...This project removes traditional barriers that prevent Mac users from prototyping and experimenting with LLM training locally by allowing the same code used in cloud GPU environments to run on M-series hardware, improving workflow continuity and reducing iteration costs. It supports loading and training Hugging Face models with fine-tuning strategies like SFT, DPO, ORPO, and GRPO and even handles exporting models to formats like GGUF for downstream use, although some limitations apply with quantized models. ...

Downloads: 1 This Week

Last Update: 2 days ago
See Project
11

JAX Toolbox

Public CI, Docker images for popular JAX libraries

JAX Toolbox is a development toolkit designed to streamline and optimize the use of JAX for machine learning and high-performance computing on NVIDIA GPUs. It provides prebuilt Docker images, continuous integration pipelines, and optimized example implementations that help developers quickly set up and run JAX workloads without complex configuration. The project supports popular JAX-based frameworks and models, including architectures used for large-scale pretraining such as GPT and LLaMA...

Downloads: 0 This Week

Last Update: 6 days ago
See Project
12

Diffrax

Numerical differential equation solvers in JAX

Diffrax is a numerical differential equation solving library built for the JAX ecosystem, with a strong focus on composability, differentiability, and high-performance scientific computing. The project provides tools for solving ordinary differential equations, stochastic differential equations, controlled differential equations, and related systems in a way that fits naturally into modern machine learning and differentiable programming workflows. Because it is written to work closely with...

Downloads: 0 This Week

Last Update: 2026-03-12
See Project
13

CUDA Containers for Edge AI & Robotics

Machine Learning Containers for NVIDIA Jetson and JetPack-L4T

CUDA Containers for Edge AI & Robotics is an open-source project that provides a modular container build system designed for running machine learning and AI workloads on NVIDIA Jetson devices. The repository contains container configurations that package the latest AI frameworks and dependencies optimized for Jetson hardware. These containers simplify the deployment of complex machine learning environments by bundling libraries such as CUDA, TensorRT, and deep learning frameworks into reproducible container images. The project is particularly useful for developers building edge AI and robotics systems that rely on GPU-accelerated inference and real-time computer vision. ...

Downloads: 0 This Week

Last Update: 5 days ago
See Project
14

EPLB

Expert Parallelism Load Balancer

EPLB is DeepSeek’s open implementation of a load balancing algorithm designed for expert parallelism (EP) settings in MoE architectures. In EP, different “experts” are mapped to different GPUs or nodes, so load imbalance becomes a performance bottleneck if certain experts are invoked much more often. EPLB solves this by duplicating heavily used experts (redundancy) and then placing those duplicates across GPUs to even out computational load. It uses policies like hierarchical load balancing...

Downloads: 0 This Week

Last Update: 2025-10-03
See Project
15

Selkies-GStreamer

Open-Source Low-Latency Accelerated Linux WebRTC HTML5 Remote Desktop

selkies-gstreamer is a GStreamer-based media streaming component used in the Selkies project, a cloud-native platform designed for interactive desktop and application streaming. This module acts as a high-performance media pipeline that captures video, encodes it with low latency, and streams it via WebRTC to client browsers. It is optimized for GPU-accelerated encoding and integrates with Kubernetes-based deployments to enable scalable, real-time remote desktop sessions. This component...

Downloads: 0 This Week

Last Update: 2025-03-27
See Project
16

Superposition Benchmark (Unigine)

GPU benchmark testing graphics performance with realistic 3D scenes.

...Widely used by gamers and hardware reviewers, it is proprietary but offers a free edition.

Downloads: 107 This Week

Last Update: 2025-10-07
See Project
17

powerMAX

powerMAX is a CPU and GPU burn-in test

powerMAX is a CPU and GPU burn-in tool designed to push your hardware to its absolute thermal and power limits. It helps users uncover stability issues, cooling weaknesses, and power delivery problems by applying maximum, sustained stress to both the processor and graphics card. The utility supports dedicated CPU tests—SSE or AVX—and a demanding GPU 3D rendering test, with the option to run both simultaneously for full-system power load evaluation.

1 Review

Downloads: 25 This Week

Last Update: 2025-11-22
See Project
18

MegEngine

Easy-to-use deep learning framework with 3 key features

MegEngine is a fast, scalable and easy-to-use deep learning framework with 3 key features. You can represent quantization/dynamic shape/image pre-processing and even derivation in one model. After training, just put everything into your model and inference it on any platform at ease. Speed and precision problems won't bother you anymore due to the same core inside. In training, GPU memory usage could go down to one-third at the cost of only one additional line, which enables the DTR...

Downloads: 2 This Week

Last Update: 2024-04-30
See Project
19

MaxText

A simple, performant and scalable Jax LLM

MaxText is a high-performance, highly scalable open-source framework designed to train and fine-tune large language models using the JAX ecosystem. The project acts as both a reference implementation and a practical training library that demonstrates best practices for building and scaling transformer-based language models on modern accelerator hardware. It is optimized to run efficiently on Google Cloud TPUs and GPUs, enabling researchers and engineers to train models ranging from small...

Downloads: 0 This Week

Last Update: 2026-03-23
See Project
20

TensorFlow Probability

Probabilistic reasoning and statistical analysis in TensorFlow

TensorFlow Probability is a library for probabilistic reasoning and statistical analysis. TensorFlow Probability (TFP) is a Python library built on TensorFlow that makes it easy to combine probabilistic models and deep learning on modern hardware (TPU, GPU). It's for data scientists, statisticians, ML researchers, and practitioners who want to encode domain knowledge to understand data and make predictions. Since TFP inherits the benefits of TensorFlow, you can build, fit, and deploy a model using a single language throughout the lifecycle of model exploration and production. ...

Downloads: 0 This Week

Last Update: 2024-11-08
See Project
21

CosyVoice

Multi-lingual large voice generation model, providing inference

CosyVoice is a multilingual large voice generation model that offers a full-stack solution for training, inference, and deployment of high-quality TTS systems. The model supports multiple languages, including Chinese, English, Japanese, Korean, and a range of Chinese dialects such as Cantonese, Sichuanese, Shanghainese, Tianjinese, and Wuhanese. It is designed for zero-shot voice cloning and cross-lingual or mix-lingual scenarios, so a single reference voice can be used to synthesize speech...

Downloads: 0 This Week

Last Update: 2025-11-30
See Project
22

VCClient

Software that uses AI to perform real-time voice conversion

...It provides both a graphical user interface and API access, making it suitable for casual users as well as developers who want to integrate voice transformation into their own applications. The project also supports GPU acceleration, enabling faster inference and smoother real-time performance on compatible hardware. Additionally, it includes tools for training and managing voice models, giving users the ability to create personalized voice profiles.

Downloads: 18 This Week

Last Update: 2026-03-23
See Project
23

Deep Java Library (DJL)

An engine-agnostic deep learning framework in Java

...Because DJL is deep learning engine agnostic, you don't have to make a choice between engines when creating your projects. You can switch engines at any point. To ensure the best performance, DJL also provides automatic CPU/GPU choice based on hardware configuration.

1 Review

Downloads: 1 This Week

Last Update: 2025-12-15
See Project
24

local-llm

Run LLMs locally on Cloud Workstations

local-llm is a development framework that enables developers to run large language models locally within Google Cloud Workstations or standard environments without requiring GPU hardware. It focuses on making generative AI development more accessible by leveraging quantized models and CPU-based execution, eliminating the dependency on expensive GPU infrastructure. The repository includes tools, Docker configurations, and command-line utilities that simplify the process of downloading, running, and interacting with language models directly on local or cloud-based workstations. ...

Downloads: 0 This Week

Last Update: 2026-03-17
See Project
25

LinuxHardware Info

Hardware Info for Linux portable AppImage + Benchmark

LinuxHardware Info (portable/AppImage) -------------------------------------- Ihr wollt wissen was in Eurem Rechner steckt? Genau das könnt Ihr mit LinuxHardwareInfo, ohne jedes mal die Konsole zu bemühen. Kurz-Info: - Hardware Analyse - einfache Benchmarks (CPU / Grafik/RAM und HDD) - HDD/USB Fake Erkennung - Live Sensoren - Portable AppImage (einfach vom USB-Stick in Live-Image kopieren) - automatische Update Erkennung ab v1.2 - Darkmode - und anderes - im Support-Forum gibt...

1 Review

Downloads: 8 This Week

Last Update: 3 days ago
See Project