GPU hardware free download

Showing 82 open source projects for "gpu hardware"

1

GPU Puzzles

Solve puzzles. Learn CUDA

GPU Puzzles is an educational project designed to teach GPU programming concepts through interactive coding exercises and puzzles. Instead of presenting traditional lecture-style explanations, the project immerses learners directly in hands-on programming tasks that demonstrate how GPU computation works. The exercises are implemented using Python with the Numba CUDA interface, which allows Python code to compile into GPU kernels that run on CUDA-enabled hardware.

Downloads: 0 This Week

Last Update: 2026-03-10
See Project
2

GPU Hot

Real-time NVIDIA GPU dashboard

GPU Hot is an open-source, lightweight monitoring dashboard designed to provide real-time visibility into NVIDIA GPU performance across single machines or entire clusters. The project offers a self-hosted web interface that streams hardware metrics directly from GPU servers, enabling developers, ML engineers, and system administrators to observe GPU utilization and system behavior in real time through a browser.

Downloads: 4 This Week

Last Update: 5 days ago
See Project
3

llmfit

157 models, 30 providers, one command to find what runs on hardware

llmfit is a terminal-based utility that helps developers determine which large language models can realistically run on their local hardware by analyzing system resources and model requirements. The tool automatically detects CPU, RAM, GPU, and VRAM specifications, then ranks available models based on performance factors such as speed, quality, and memory fit. It provides both an interactive terminal user interface and a traditional CLI mode, enabling flexible workflows for different user preferences. llmfit also supports advanced configurations including multi-GPU setups, mixture-of-experts architectures, and dynamic quantization recommendations. ...

Downloads: 48 This Week

Last Update: 2 days ago
See Project
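The memory-fit idea in the llmfit description can be illustrated with a rough back-of-the-envelope check. This is a minimal sketch, not llmfit's actual algorithm: the bytes-per-weight table, the 20% overhead factor, and the names `Model`, `fits`, and `rank_fits` are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Assumed storage cost per weight for common precisions (illustrative only).
BYTES_PER_WEIGHT = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


@dataclass
class Model:
    name: str
    params_billions: float


def weight_gib(model, precision):
    """Approximate size of the weights alone, in GiB."""
    total_bytes = model.params_billions * 1e9 * BYTES_PER_WEIGHT[precision]
    return total_bytes / 2**30


def fits(model, precision, vram_gib, overhead=1.2):
    """True if the weights plus an assumed 20% overhead fit in VRAM."""
    return weight_gib(model, precision) * overhead <= vram_gib


def rank_fits(models, vram_gib):
    """For each model, report the highest precision that fits, if any."""
    results = []
    for m in models:
        for precision in ("fp16", "int8", "int4"):
            if fits(m, precision, vram_gib):
                results.append((m.name, precision))
                break
    return results


# Hypothetical model names and a 12 GiB card.
models = [Model("small-7b", 7), Model("large-70b", 70)]
print(rank_fits(models, vram_gib=12.0))  # [('small-7b', 'int8')]
```

A real tool would also account for KV cache, context length, and CPU offload, none of which are modeled here.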
4

GPUStack

Performance-optimized AI inference on your GPUs

GPUStack is an open-source GPU cluster management platform designed to simplify the deployment and operation of artificial intelligence models across heterogeneous hardware environments. The system aggregates GPU resources from multiple machines into a unified cluster so developers and administrators can run large language models and other AI workloads efficiently across distributed infrastructure.

Downloads: 11 This Week

Last Update: 2026-03-26
See Project
5

FlexLLMGen

Running large language models on a single GPU

FlexLLMGen is an open-source inference engine designed to run large language models efficiently on limited hardware resources such as a single GPU. The system focuses on high-throughput generation workloads where large batches of text must be processed quickly, such as large-scale data extraction or document analysis tasks. Instead of requiring expensive multi-GPU systems, the framework uses techniques such as memory offloading, compression, and optimized batching to run large models on commodity hardware.

Downloads: 0 This Week

Last Update: 2026-03-10
See Project
6

HeavyDB

HeavyDB (formerly MapD/OmniSciDB)

...HeavyDB was originally developed as part of the OmniSci platform (formerly MapD) and is commonly used for large-scale analytics and geospatial data processing. The database compiles queries into optimized machine code that executes efficiently on GPU hardware, significantly accelerating analytical workloads. It supports hybrid deployment environments where queries can run on both CPU and GPU architectures depending on the available resources.

Downloads: 0 This Week

Last Update: 2026-03-11
See Project
7

LocalAI

The free, Open Source alternative to OpenAI, Claude and others

LocalAI is an open-source platform that allows users to run large language models and other AI systems locally on their own hardware. It acts as a drop-in replacement for APIs such as OpenAI, enabling developers to build AI-powered applications without relying on external cloud services. The platform supports a wide range of model types, including text generation, image creation, speech processing, and embeddings. LocalAI can run on consumer-grade hardware and does not necessarily require a GPU, making it accessible for local development and private deployments. ...

Downloads: 36 This Week

Last Update: 2026-04-07
See Project
8

AirLLM

AirLLM 70B inference with single 4GB GPU

AirLLM is an open source Python library that enables extremely large language models to run on consumer hardware with very limited GPU memory. The project addresses one of the main barriers to local LLM experimentation by introducing a memory-efficient inference technique that loads model layers sequentially rather than storing the entire model in GPU memory. This layer-wise inference approach allows models with tens of billions of parameters to run on devices with only a few gigabytes of VRAM. ...

Downloads: 0 This Week

Last Update: 2026-03-10
See Project
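The layer-wise inference approach described for AirLLM can be sketched in plain Python: each layer lives in its own file and is loaded, applied, and released before the next one is read. This is an illustrative toy, not AirLLM's code; the JSON file layout and the helper names (`save_layers`, `layerwise_forward`) are assumptions.

```python
import json
import os
import tempfile


def save_layers(layers, directory):
    """Write each layer's weight matrix to its own file and return the paths."""
    paths = []
    for i, weights in enumerate(layers):
        path = os.path.join(directory, f"layer_{i}.json")
        with open(path, "w") as f:
            json.dump(weights, f)
        paths.append(path)
    return paths


def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    return [max(0.0, v) for v in x]


def layerwise_forward(paths, x):
    """Run the forward pass while holding only one layer in memory at a time."""
    for path in paths:
        with open(path) as f:
            weights = json.load(f)  # load just this layer
        x = relu(matvec(weights, x))
        del weights  # release it before loading the next layer
    return x


layers = [
    [[1.0, 0.0], [0.0, 1.0]],   # identity
    [[2.0, 0.0], [0.0, -1.0]],  # scale; ReLU then zeroes the negative output
]
with tempfile.TemporaryDirectory() as d:
    paths = save_layers(layers, d)
    out = layerwise_forward(paths, [3.0, 4.0])
print(out)  # [6.0, 0.0]
```

The trade-off is that peak memory scales with the largest single layer rather than the whole model, at the cost of reading every layer from storage on each forward pass.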
9

PowerInfer

High-speed Large Language Model Serving for Local Deployment

...PowerInfer incorporates specialized algorithms and sparse operators to manage neuron activation patterns and minimize data transfers between hardware components. As a result, it enables powerful language models to run on consumer hardware while achieving performance comparable to more expensive server-grade systems.

Downloads: 1 This Week

Last Update: 2026-03-04
See Project
10

how-to-optim-algorithm-in-cuda

How to optimize some algorithm in cuda

...Instead of presenting only theoretical explanations, the repository includes hand-written CUDA implementations of fundamental operations such as reductions, element-wise computations, softmax, and attention mechanisms. These examples show how different optimization techniques influence performance on modern GPU hardware and allow readers to experiment with real implementations. The repository also contains extensive learning notes that summarize CUDA programming concepts, GPU architecture details, and performance engineering strategies.

Downloads: 2 This Week

Last Update: 2026-04-09
See Project
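As an illustration of the reduction pattern mentioned in the how-to-optim-algorithm-in-cuda entry, the following Python sketch emulates the halving-stride scheme commonly used for block-level sums on GPUs. It is not taken from the repository; `tree_reduce_sum` is a hypothetical name, and the inner loop stands in for what would be parallel threads separated by a barrier.

```python
def tree_reduce_sum(values):
    """Sum values using a halving-stride (tree) reduction."""
    data = list(values)
    n = len(data)
    # Pad to a power of two so every step pairs elements cleanly.
    size = 1
    while size < n:
        size *= 2
    data += [0] * (size - n)

    stride = size // 2
    while stride > 0:
        # On a GPU, each i would be handled by one thread,
        # with a synchronization barrier after the loop body.
        for i in range(stride):
            data[i] += data[i + stride]
        stride //= 2
    return data[0]


print(tree_reduce_sum(range(1, 11)))  # 55
```

The number of steps grows with log2 of the input size, which is why this shape is a common starting point before applying further optimizations.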
11

Parallax

Parallax is a distributed model serving framework

Parallax is a decentralized inference framework designed to run large language models across distributed computing resources. Instead of relying on centralized GPU clusters in data centers, the system allows multiple heterogeneous machines to collaborate in serving AI inference workloads. Parallax divides model layers across different nodes and dynamically coordinates them to form a complete inference pipeline. A two-stage scheduling architecture determines how model layers are allocated to available hardware and how requests are routed across nodes during execution. ...

Downloads: 4 This Week

Last Update: 2026-03-09
See Project
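The layer-allocation step described for Parallax can be pictured with a simple proportional split. This is a sketch under stated assumptions, not Parallax's scheduler: it assigns contiguous layer ranges in proportion to each node's memory and ignores request routing entirely. The function name and node labels are hypothetical.

```python
def partition_layers(num_layers, node_memory):
    """Split layers 0..num_layers-1 into contiguous ranges sized by memory share.

    node_memory maps a node name to its available memory (any consistent unit).
    """
    total = sum(node_memory.values())
    assignment = {}
    start = 0
    cumulative = 0.0
    for name, mem in node_memory.items():
        cumulative += mem
        # Rounding the cumulative share keeps ranges contiguous and
        # guarantees the last node ends exactly at num_layers.
        end = round(num_layers * cumulative / total)
        assignment[name] = range(start, end)
        start = end
    return assignment


# Hypothetical heterogeneous nodes with 24, 8, and 16 GB of memory.
plan = partition_layers(32, {"node-a": 24, "node-b": 8, "node-c": 16})
for name, layers in plan.items():
    print(name, layers.start, layers.stop)
```

Each node then runs its range in order, passing activations to the next node, which forms the pipeline the description refers to.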
12

GPT4All

Run Local LLMs on Any Device. Open-source

GPT4All is an open-source project that allows users to run large language models (LLMs) locally on their desktops or laptops, eliminating the need for API calls or GPUs. The software provides a simple, user-friendly application that can be downloaded and run on various platforms, including Windows, macOS, and Ubuntu, without requiring specialized hardware. It integrates with the llama.cpp implementation and supports multiple LLMs, allowing users to interact with AI models privately. This...

1 Review

Downloads: 143 This Week

Last Update: 2025-03-17
See Project
13

node-llama-cpp

Run AI models locally on your machine with node.js bindings for llama

...By using native bindings and optimized model execution, the framework allows developers to integrate advanced language model capabilities into desktop applications, server software, and command-line tools. The system automatically detects the available hardware on a machine and selects the most appropriate compute backend, including CPU or GPU acceleration. Developers can use the library to perform tasks such as text generation, conversational chat, embedding generation, and structured output generation. Because it runs models locally, the platform is particularly useful for privacy-sensitive environments or offline AI deployments.

Downloads: 17 This Week

Last Update: 2026-03-17
See Project
14

SkyPilot

SkyPilot: Run AI and batch jobs on any infra

SkyPilot is a framework for running AI and batch workloads on any infrastructure (Kubernetes or 12+ clouds). It offers unified execution, cost savings, and high GPU availability through a simple interface.

Downloads: 0 This Week

Last Update: 2026-03-24
See Project
15

ChatGLM.cpp

C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)

ChatGLM.cpp is a C++ implementation of the ChatGLM-6B model, enabling efficient local inference without requiring a Python environment. It is optimized for running on consumer hardware.

Downloads: 9 This Week

Last Update: 2025-01-21
See Project
16

tt-metal

TT-NN operator library, and TT-Metalium low level kernel programming

tt-metal, also referred to in its documentation as TT-Metalium, is Tenstorrent’s low-level software development kit for programming applications on Tenstorrent AI accelerators. The project is designed for developers who need direct access to the company’s Tensix processor architecture, exposing a programming model that is closer to hardware control than high-level inference frameworks. Instead of following a traditional GPU model centered on massive thread parallelism, the platform is built around a grid of specialized compute nodes called Tensix cores, each with local SRAM, dedicated compute units, and multiple RISC-V control processors. The SDK provides the abstractions and APIs needed to manage data movement, compute kernels, memory coordination, and execution flow across this architecture.

Downloads: 31 This Week

Last Update: 5 days ago
See Project
17

Humanoid-Gym

Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real

Humanoid-Gym is a reinforcement learning framework designed to train locomotion and control policies for humanoid robots using high-performance simulation environments. The system is built on top of NVIDIA Isaac Gym, which allows large-scale parallel simulation of robotic environments directly on GPU hardware. Its primary goal is to enable efficient training of humanoid robots in simulation while enabling policies to transfer effectively to real-world hardware without additional training. The framework emphasizes the concept of zero-shot sim-to-real transfer, meaning that behaviors learned in simulation can be deployed directly on physical robots with minimal adjustment. ...

Downloads: 1 This Week

Last Update: 2026-03-15
See Project
18

LuxTTS

A high-quality rapid TTS voice cloning model

LuxTTS is an open-source text-to-speech (TTS) system focused on delivering high-quality, rapid voice synthesis and voice cloning that runs extremely fast and efficiently on consumer hardware. It implements a lightweight architecture based on ZipVoice and optimized sampling techniques so that it can generate speech at speeds up to roughly 150 times real-time on a single GPU and faster than real-time on CPU, all while producing audio at high fidelity with 48 kHz quality. The project supports zero-shot voice cloning, meaning it can adapt to a reference speaker’s voice with minimal example data, enabling realistic and personalized synthetic speech. ...

Downloads: 4 This Week

Last Update: 2026-03-12
See Project
19

clone-voice

A sound cloning tool with a web interface, using your voice

Clone-voice is a local voice-cloning tool that lets you synthesize speech in any target voice or convert one recording into another voice using the same timbre. It is built around Coqui’s XTTS-v2 model, so it inherits multilingual support and modern neural TTS quality while wrapping it in a user-friendly desktop workflow. The app is designed to be very easy to use: you download a precompiled package, double-click app.exe, and it launches a browser-based web interface where you control...

Downloads: 11 This Week

Last Update: 2025-11-28
See Project
20

PEFT

State-of-the-art Parameter-Efficient Fine-Tuning

Parameter-Efficient Fine-Tuning (PEFT) methods enable efficient adaptation of pre-trained language models (PLMs) to various downstream applications without fine-tuning all the model's parameters. Fine-tuning large-scale PLMs is often prohibitively costly. In this regard, PEFT methods only fine-tune a small number of (extra) model parameters, thereby greatly decreasing the computational and storage costs. Recent State-of-the-Art PEFT techniques achieve performance comparable to that of full...

Downloads: 4 This Week

Last Update: 2 days ago
See Project
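The saving that the PEFT description refers to can be made concrete with a parameter count. The sketch below uses LoRA, one of the methods in the PEFT family, as the example; the layer size and rank are assumed values chosen for illustration, and the function names are hypothetical.

```python
def full_params(d_out, d_in):
    """Trainable parameters when fine-tuning a dense weight matrix directly."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable parameters for two low-rank factors: (d_out x rank) and (rank x d_in)."""
    return rank * (d_out + d_in)


d = 4096  # assumed hidden size
full = full_params(d, d)
lora = lora_params(d, d, rank=8)
print(full, lora, f"{lora / full:.2%}")  # 16777216 65536 0.39%
```

For this one matrix, the low-rank update trains well under 1% of the parameters that full fine-tuning would touch.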
21

stt

Voice Recognition to Text Tool

...The project is designed to be easy to deploy: you can run a local Python server that exposes an HTTP API for uploading audio or video files and retrieving transcriptions in different formats. It uses GPU acceleration when available for faster processing on compatible hardware, while still offering reliable performance on CPUs alone.

Downloads: 3 This Week

Last Update: 2026-02-17
See Project
22

ort

Fast ML inference & training for ONNX models in Rust

ort is a high-performance Rust library that provides bindings to ONNX Runtime, enabling developers to run machine learning inference and training workflows directly within Rust applications using the standardized ONNX model format. It is designed to bridge the gap between modern machine learning frameworks and systems programming by offering a safe, ergonomic API for executing models originally built in ecosystems like PyTorch, TensorFlow, or scikit-learn. The library emphasizes speed and...

Downloads: 6 This Week

Last Update: 2026-03-19
See Project
23

uzu

A high-performance inference engine for AI models

uzu is a high-performance inference engine designed to run artificial intelligence models efficiently on Apple Silicon hardware. Written primarily in Rust and leveraging Apple’s Metal framework, the project focuses on maximizing performance when executing large language models and other AI workloads on devices such as Mac computers with M-series chips. The engine implements a hybrid architecture in which model layers can be executed either as custom GPU kernels or through Apple’s MPSGraph API, allowing it to balance performance and compatibility depending on the workload. ...

Downloads: 0 This Week

Last Update: 2026-03-15
See Project
24

Colossal-AI

Making large AI models cheaper, faster and more accessible

The Transformer architecture has improved the performance of deep learning models in domains such as computer vision and natural language processing. Better performance has come with larger model sizes, which strain the memory capacity of current accelerator hardware such as GPUs. Training large models such as Vision Transformer, BERT, and GPT on a single GPU or a single machine is rarely practical, so there is strong demand for training in a distributed environment. However, distributed training, especially model parallelism, often requires domain expertise in computer systems and architecture. ...

Downloads: 0 This Week

Last Update: 2025-05-28
See Project
25

ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model

...The project provides inference code, demos (command line, web, API), quantization support for lower-memory deployment, and tools for fine-tuning (e.g., via P-Tuning v2). It is optimized for dialogue and question answering, balancing performance against deployability on consumer hardware. It supports quantized inference (INT4, INT8) to reduce GPU memory requirements, and it can switch automatically between full-precision and quantized modes to trade precision for memory.

Downloads: 5 This Week

Last Update: 2025-09-26
See Project
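The memory saving from the INT8 quantization mentioned in the ChatGLM-6B entry can be sketched with a simple symmetric, per-tensor scheme. This is a generic illustration, not the project's quantization code; the function names and sample weights are assumptions.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map the stored integers back to approximate float values."""
    return [i * scale for i in q]


weights = [0.4, -1.0, 0.25, 0.0]  # made-up sample weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Each value is off by at most half a quantization step.
max_error = max(abs(r - w) for r, w in zip(restored, weights))
print(q, max_error <= scale / 2 + 1e-12)
```

Storing one byte per weight instead of two (FP16) roughly halves weight memory, and INT4 halves it again, at the cost of the rounding error bounded above.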