Page 2 | nvidia free download

Showing 106 open source projects for "nvidia"

View related business solutions

Artificial Intelligence Clear Filters & Widen Search

$300 in Free Credit Towards Top Cloud Services
Build VMs, containers, AI, databases, storage—all in one place.

Start your project in minutes. After credits run out, 20+ products include free monthly usage. Only pay when you're ready to scale.

Get Started
AI-powered service management for IT and enterprise teams
Enterprise-grade ITSM, for every business

Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity. Maximize operational efficiency with refreshingly simple, AI-powered Freshservice.

Try it Free
1

FastKoko

Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model

FastKoko is a self-hosted text-to-speech server built around the Kokoro-82M model and exposed through a FastAPI backend. It is designed to be easy to deploy via Docker, with separate CPU and GPU images so that users can choose between pure CPU inference and NVIDIA GPU acceleration. The project exposes an OpenAI-compatible speech endpoint, which means existing code that talks to the OpenAI audio API can often be pointed at a Kokoro-FastAPI instance with minimal changes. It supports multiple languages and voicepacks and allows phoneme based generation for more accurate pronunciation and prosody. ...

Downloads: 3 This Week

Last Update: 2025-12-13
See Project
2

Evo 2

Genome modeling and design across all domains of life

...It supports multiple ways of working with the model, including forward passes, embeddings, generation workflows, notebooks, hosted APIs, and self-hosted deployment through NVIDIA NIM.

Downloads: 1 This Week

Last Update: 4 days ago
See Project
3

TensorRT LLM

TensorRT LLM provides users with an easy-to-use Python API

TensorRT-LLM is an open-source high-performance inference library specifically designed to optimize and accelerate large language model deployment on NVIDIA GPUs. It provides a Python-based API built on top of PyTorch that allows developers to define, customize, and deploy LLMs efficiently across a variety of hardware configurations, from single GPUs to large multi-node clusters. The library focuses on maximizing throughput and minimizing latency through advanced techniques such as quantization, custom attention kernels, and optimized memory management strategies. ...

Downloads: 0 This Week

Last Update: 2026-04-16
See Project
4

Humanoid-Gym

Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real

Humanoid-Gym is a reinforcement learning framework designed to train locomotion and control policies for humanoid robots using high-performance simulation environments. The system is built on top of NVIDIA Isaac Gym, which allows large-scale parallel simulation of robotic environments directly on GPU hardware. Its primary goal is to enable efficient training of humanoid robots in simulation while enabling policies to transfer effectively to real-world hardware without additional training. The framework emphasizes the concept of zero-shot sim-to-real transfer, meaning that behaviors learned in simulation can be deployed directly on physical robots with minimal adjustment. ...

Downloads: 0 This Week

Last Update: 2026-03-15
See Project
Fully Managed MySQL, PostgreSQL, and SQL Server
Automatic backups, patching, replication, and failover. Focus on your app, not your database.

Cloud SQL handles your database ops end to end, so you can focus on your app.

Try Free
5

WanGP

AI video generator optimized for low VRAM and older GPUs use

...It acts as a unified interface for running multiple video, image, and audio generation models, including Wan-based models as well as other systems like Hunyuan Video, Flux, and Qwen. A key focus of the project is reducing VRAM requirements, enabling some workflows to run on as little as 6 GB while still supporting older Nvidia and certain AMD GPUs. Wan2GP provides a full web-based interface that simplifies interaction with complex generative pipelines, making it easier to configure prompts, models, and rendering settings. It also integrates a wide range of utilities such as prompt enhancement, mask editing, motion design, and extraction tools for pose, depth, and flow data to support advanced video workflows.

Downloads: 49 This Week

Last Update: 12 hours ago
See Project
6

TAME LLM

Traditional Mandarin LLMs for Taiwan

...These models are designed to support applications such as conversational AI, knowledge retrieval, and domain-specific reasoning in fields like manufacturing, law, healthcare, and electronics. The training pipeline leverages high-performance computing infrastructure and frameworks such as NVIDIA NeMo and Megatron to enable large-scale model training. Taiwan-LLM aims to improve language understanding and generation for Traditional Mandarin users by incorporating region-specific datasets and evaluation benchmarks.

Downloads: 0 This Week

Last Update: 2026-03-09
See Project
7

FlashMLA

FlashMLA: Efficient Multi-head Latent Attention Kernels

FlashMLA is a high-performance decoding kernel library designed especially for Multi-Head Latent Attention (MLA) workloads, targeting NVIDIA Hopper GPU architectures. It provides optimized kernels for MLA decoding, including support for variable-length sequences, helping reduce latency and increase throughput in model inference systems using that attention style. The library supports both BF16 and FP16 data types, and includes a paged KV cache implementation with a block size of 64 to efficiently manage memory during decoding. ...

Downloads: 0 This Week

Last Update: 2026-03-31
See Project
8

FlashAttention

Fast and memory-efficient exact attention

...It achieves this by using IO-aware algorithms that minimize memory reads and writes, reducing the quadratic memory overhead typically associated with attention operations. The project provides implementations of FlashAttention, FlashAttention-2, and newer iterations optimized for modern GPU architectures such as NVIDIA Hopper and AMD accelerators. By improving both forward and backward pass efficiency, it enables training and inference of large language models with longer sequence lengths and higher throughput. The library integrates with PyTorch and supports various attention configurations, including causal masking, multi-query attention, and rotary embeddings.

Downloads: 23 This Week

Last Update: 2026-03-18
See Project
9

exo

Run your own AI cluster at home with everyday devices

Run your own AI cluster at home with everyday devices. Maintained by exo labs. Forget expensive NVIDIA GPUs, unify your existing devices into one powerful GPU, iPhone, iPad, Android, Mac, Linux, or pretty much any device. Now the default models, run 8B, 70B, and 405B parameter models on your own devices.

Downloads: 2 This Week

Last Update: 22 hours ago
See Project
Go From AI Idea to AI App Fast
One platform to build, fine-tune, and deploy ML models. No MLOps team required.

Access Gemini 3 and 200+ models. Build chatbots, agents, or custom models with built-in monitoring and scaling.

Try Free
10

CV-CUDA

CV-CUDA™ is an open-source, GPU accelerated library

...It uses graphics processing unit (GPU) acceleration to help developers build highly efficient pre- and post-processing pipelines. CV-CUDA originated as a collaborative effort between NVIDIA and ByteDance.

Downloads: 0 This Week

Last Update: 2025-11-15
See Project
11

Jan.ai

Open source alternative to ChatGPT that runs 100% offline

Jan.ai is an open-source, privacy-focused AI assistant that serves as an alternative to ChatGPT, running completely locally on your device. It allows you to download and run LLMs (local language models) offline while also offering optional integration with cloud-based model providers—giving you full control over your data and AI interactions. Download and run LLMs (Llama, Gemma, Qwen, GPT-oss etc.) from HuggingFace. Connect to GPT models via OpenAI, Claude models via Anthropic, Mistral,...

Downloads: 23 This Week

Last Update: 2026-03-23
See Project
12

WhisperLive

A nearly-live implementation of OpenAI's Whisper

...It runs as a server–client system in which the server hosts a Whisper backend and clients stream audio to be transcribed with very low delay. The project supports multiple inference backends, including Faster-Whisper, NVIDIA TensorRT, and OpenVINO, allowing you to target GPUs and different CPU architectures efficiently. It can handle microphone input, pre-recorded audio files, and network streams such as RTSP and HLS, making it flexible for live events, monitoring, or accessibility workflows. Configuration options let you control the number of clients, maximum connection time, and threading behavior so the server can be tuned for different deployment environments. ...

Downloads: 17 This Week

Last Update: 2026-03-17
See Project
13

Lyra 2

Project Lyra: Open Generative 3D World Models

The Lyra 2 project is a research-driven framework developed by NVIDIA that focuses on building open generative 3D world models using advanced diffusion-based techniques. It enables the creation of fully explorable 3D environments from minimal inputs such as a single image or video, leveraging self-distillation methods to generate consistent spatial representations. The system evolves across versions, with newer iterations introducing long-horizon generation and improved 3D consistency across frames. ...

Downloads: 8 This Week

Last Update: 7 days ago
See Project
14

NeMo Retriever Library

Document content and metadata extraction microservice

...It processes various document types by splitting them into components such as text, tables, charts, and images, and then applies OCR and contextual analysis to convert them into structured data formats. The system is built on NVIDIA NIM microservices, enabling high-performance parallel processing and efficient handling of large datasets. It supports multiple extraction strategies for different document formats, balancing accuracy and throughput depending on the use case. Additionally, it can generate embeddings for extracted content and integrate with vector databases like Milvus, making it well-suited for retrieval-augmented generation pipelines.

Downloads: 2 This Week

Last Update: 2026-03-18
See Project
15

SimpleLLM

950 line, minimal, extensible LLM inference engine built from scratch

...It provides the core components of an LLM runtime—such as tokenization, batching, and asynchronous execution—without the abstraction overhead of more complex engines, making it easier for developers and researchers to understand and modify. Designed to run efficiently on high-end GPUs like NVIDIA H100 with support for models such as OpenAI/gpt-oss-120b, Simple-LLM implements continuous batching and event-driven inference loops to maximize hardware utilization and throughput. Its straightforward code structure allows anyone experimenting with custom kernels, new batching strategies, or inference optimizations to trace execution from input to output with minimal cognitive overhead.

Downloads: 2 This Week

Last Update: 2026-01-28
See Project
16

OpenShell

OpenShell is the safe, private runtime for autonomous AI agents.

OpenShell is an open-source runtime designed to safely run autonomous AI agents in isolated environments. Developed by NVIDIA, it provides sandboxed execution spaces that protect system resources, credentials, and data from unauthorized access. Each agent runs inside a containerized sandbox governed by declarative YAML security policies that control network access, file permissions, and process behavior. The platform includes a gateway service that manages sandbox lifecycles and routes AI inference requests through controlled providers. ...

Downloads: 4 This Week

Last Update: 22 hours ago
See Project
17

Porcupine

On-device wake word detection powered by deep learning

...It is using deep neural networks trained in real-world environments. Compact and computationally-efficient. It is perfect for IoT. Cross-platform. Arm Cortex-M, STM32, PSoC, Arduino, and i.MX RT. Raspberry Pi, NVIDIA Jetson Nano, and BeagleBone. Android and iOS. Chrome, Safari, Firefox, and Edge. Linux (x86_64), macOS (x86_64, arm64), and Windows (x86_64). Scalable. It can detect multiple always-listening voice commands with no added runtime footprint. Self-service. Developers can train custom wake word models using Picovoice Console. Porcupine is the right product if you need to detect one or a few static (always-listening) voice commands. ...

Downloads: 4 This Week

Last Update: 2025-12-11
See Project
18

Triton Inference Server

The Triton Inference Server provides an optimized cloud

...Triton enables teams to deploy any AI model from multiple deep learning and machine learning frameworks, including TensorRT, TensorFlow, PyTorch, ONNX, OpenVINO, Python, RAPIDS FIL, and more. Triton supports inference across cloud, data center, edge, and embedded devices on NVIDIA GPUs, x86 and ARM CPU, or AWS Inferentia. Triton delivers optimized performance for many query types, including real-time, batched, ensembles, and audio/video streaming. Provides Backend API that allows adding custom backends and pre/post-processing operations. Model pipelines using Ensembling or Business Logic Scripting (BLS). HTTP/REST and GRPC inference protocols based on the community-developed KServe protocol. ...

Downloads: 4 This Week

Last Update: 2026-04-10
See Project
19

AWS Deep Learning Containers

A set of Docker images for training and serving models in TensorFlow

AWS Deep Learning Containers (DLCs) are a set of Docker images for training and serving models in TensorFlow, TensorFlow 2, PyTorch, and MXNet. Deep Learning Containers provide optimized environments with TensorFlow and MXNet, Nvidia CUDA (for GPU instances), and Intel MKL (for CPU instances) libraries and are available in the Amazon Elastic Container Registry (Amazon ECR). The AWS DLCs are used in Amazon SageMaker as the default vehicles for your SageMaker jobs such as training, inference, transforms etc. They've been tested for machine learning workloads on Amazon EC2, Amazon ECS and Amazon EKS services as well. ...

Downloads: 6 This Week

Last Update: 17 hours ago
See Project
20

oneDNN

oneAPI Deep Neural Network Library (oneDNN)

...The library is optimized for Intel(R) Architecture Processors, Intel Processor Graphics and Xe Architecture graphics. oneDNN has experimental support for the following architectures: Arm* 64-bit Architecture (AArch64), NVIDIA* GPU, OpenPOWER* Power ISA (PPC64), IBMz* (s390x), and RISC-V. oneDNN is intended for deep learning applications and framework developers interested in improving application performance on Intel CPUs and GPUs. Deep learning practitioners should use one of the applications enabled with oneDNN.

Downloads: 4 This Week

Last Update: 7 days ago
See Project
21

InvokeAI

InvokeAI is a leading creative engine for Stable Diffusion models

...InvokeAI offers an industry leading Web Interface, interactive Command Line Interface, and also serves as the foundation for multiple commercial products. This fork is supported across Linux, Windows and Macintosh. Linux users can use either an Nvidia-based card (with CUDA support) or an AMD card (using the ROCm driver). We do not recommend the GTX 1650 or 1660 series video cards. They are unable to run in half-precision mode and do not have sufficient VRAM to render 512x512 images.

1 Review

Downloads: 15 This Week

Last Update: 2026-03-22
See Project
22

Style-Bert-VITS2

Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles

...It includes a full GUI editor to script dialogue, set different styles per line, edit dictionaries, and save/load projects, plus a separate web UI and Colab notebooks for training and experimentation. For those who only need synthesis, the project is published as a Python library (pip install style-bert-vits2) and can run on CPU without an NVIDIA GPU, though training still requires GPU hardware.

Downloads: 4 This Week

Last Update: 2025-11-28
See Project
23

CogAgent

An open sourced end-to-end VLM-based GUI Agent

...The model is designed for agent-style execution rather than freeform chat, maintaining a continuous execution history across steps while requiring a fresh session for each new task. Inference supports BF16 on NVIDIA GPUs, with optional INT8 and INT4 modes available but with noted performance loss at INT4; example CLIs and a web demo illustrate bounding-box outputs and operation categories.

Downloads: 1 This Week

Last Update: 5 days ago
See Project
24

Instant Neural Graphics Primitives

Instant neural graphics primitives: lightning fast NeRF and more

Instant Neural Graphics Primitives, is an open-source research project developed by NVIDIA that enables extremely fast training and rendering of neural graphics representations. The system implements several neural graphics primitives including neural radiance fields, signed distance functions, neural images, and neural volumes. These representations are trained using a compact neural network combined with a multiresolution hash encoding that dramatically accelerates both training and rendering processes. ...

Downloads: 0 This Week

Last Update: 2026-03-10
See Project
25

DALI

A GPU-accelerated library containing highly optimized building blocks

The NVIDIA Data Loading Library (DALI) is a library for data loading and pre-processing to accelerate deep learning applications. It provides a collection of highly optimized building blocks for loading and processing image, video and audio data. It can be used as a portable drop-in replacement for built-in data loaders and data iterators in popular deep learning frameworks.

Downloads: 0 This Week

Last Update: 2026-02-19
See Project