Showing 48 open source projects for "nvidia gpu mod"

  • 1
    NVIDIA cuOpt

    GPU accelerated decision optimization

    NVIDIA cuOpt is a GPU-accelerated optimization engine designed to solve complex mathematical optimization problems at large scale. It supports a range of optimization models including linear programming (LP), mixed integer linear programming (MILP), quadratic programming (QP), and vehicle routing problems (VRP). Built primarily in C++, cuOpt leverages NVIDIA GPUs to deliver near real-time solutions for optimization tasks involving millions of variables and constraints. ...
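    For reference, the LP, MILP, and QP classes named above have the following textbook standard forms. This is a generic formulation, not cuOpt's input format; here I is the set of integer-constrained variables, and the QP is convex when Q is positive semidefinite.

```latex
\begin{aligned}
\text{LP:}   \quad & \min_{x}\; c^{\top}x && \text{s.t. } Ax \le b,\; x \ge 0 \\
\text{MILP:} \quad & \min_{x}\; c^{\top}x && \text{s.t. } Ax \le b,\; x \ge 0,\; x_j \in \mathbb{Z}\ \ \forall j \in I \\
\text{QP:}   \quad & \min_{x}\; \tfrac{1}{2}x^{\top}Qx + c^{\top}x && \text{s.t. } Ax \le b,\; x \ge 0
\end{aligned}
```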
  • 2
    NVIDIA NeMo

    Toolkit for conversational AI

    NVIDIA NeMo, part of the NVIDIA AI platform, is a toolkit for building new state-of-the-art conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS) models. Each collection consists of prebuilt modules that include everything needed to train on your data.
  • 3
    NVIDIA PhysicsNeMo

    Open-source deep-learning framework for building and training physics-ML models

    NVIDIA PhysicsNeMo is an open-source deep learning framework designed for building artificial intelligence models that incorporate physical laws and scientific knowledge into machine learning workflows. The framework focuses on the emerging field of physics-informed machine learning, where neural networks are used alongside physical equations to model complex scientific systems.
  • 4
    NVIDIA NeMo Framework

    Scalable generative AI framework built for researchers and developers

    NVIDIA NeMo is a scalable, cloud-native generative AI framework aimed at researchers and PyTorch developers working on large language models, multimodal models, and speech AI (ASR and TTS), with growing support for computer vision. It provides collections of domain-specific modules and reference implementations that make it easier to pre-train, fine-tune, and deploy very large models on multi-GPU and multi-node infrastructure.
  • 5
    NVIDIA Generative AI Examples

    Generative AI reference workflows

    NVIDIA GenerativeAIExamples is an open-source repository that provides practical reference implementations and example workflows for building generative AI applications using NVIDIA’s software ecosystem. The project is designed to help developers accelerate the development of AI applications by providing ready-to-run pipelines, notebooks, and tools that demonstrate how to integrate large language models into real-world systems.
  • 6
    FastKoko

    Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model

    FastKoko is a self-hosted text-to-speech server built around the Kokoro-82M model and exposed through a FastAPI backend. It is designed to be easy to deploy via Docker, with separate CPU and GPU images so that users can choose between pure CPU inference and NVIDIA GPU acceleration. The project exposes an OpenAI-compatible speech endpoint, which means existing code that talks to the OpenAI audio API can often be pointed at a Kokoro-FastAPI instance with minimal changes. It supports multiple languages and voicepacks and allows phoneme-based generation for more accurate pronunciation and prosody. ...
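    Because the speech endpoint follows the OpenAI audio API, a client only needs to POST a small JSON body to it. This standard-library sketch builds such a request; the host and port (localhost:8880), the model name "kokoro", the voice "af_bella", and the "mp3" format are assumptions to adjust for your deployment, and the field names mirror OpenAI's /v1/audio/speech parameters rather than anything confirmed in the description above.

```python
import json
import urllib.request

# Assumed server location; change to wherever your container is listening.
BASE_URL = "http://localhost:8880"


def build_speech_request(text: str, voice: str = "af_bella",
                         model: str = "kokoro") -> urllib.request.Request:
    """Build (but do not send) a POST to an OpenAI-style /v1/audio/speech endpoint.

    The model and voice defaults are assumptions, not values confirmed by the project.
    """
    payload = {"model": model, "input": text, "voice": voice, "response_format": "mp3"}
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text: str, path: str) -> None:
    """Send the request to a running server and write the returned audio bytes to path."""
    with urllib.request.urlopen(build_speech_request(text)) as resp, open(path, "wb") as out:
        out.write(resp.read())
```

    With a server running, synthesize_to_file("Hello from Kokoro.", "hello.mp3") would save the generated audio; existing OpenAI client code would instead just change its base URL.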
  • 7
    autoresearch-win-rtx

    AI agents running research on single-GPU nanochat training

    ...Experiments are executed within a fixed time budget, ensuring consistent benchmarking across iterations and allowing the agent to focus on incremental improvements. The framework is designed to be lightweight and accessible, making it suitable for developers and researchers working on desktop hardware. It also supports modern GPU acceleration features through PyTorch, enabling efficient experimentation even on limited resources.
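    The fixed-time-budget idea can be sketched as a loop that keeps calling a training step until a wall-clock deadline passes, so experiments are compared at equal compute time rather than equal step counts. The train_step callable is hypothetical; this is an illustration of the technique, not the project's code.

```python
import time
from typing import Callable


def run_with_time_budget(train_step: Callable[[], float], budget_s: float) -> tuple[int, float]:
    """Call train_step repeatedly until budget_s seconds of wall-clock time have elapsed.

    Returns (steps completed, loss from the last step). A step that starts before
    the deadline is allowed to finish, so the total can slightly exceed the budget.
    """
    deadline = time.monotonic() + budget_s
    steps, loss = 0, float("nan")
    while time.monotonic() < deadline:
        loss = train_step()  # hypothetical: one optimizer step returning the loss
        steps += 1
    return steps, loss
```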
  • 8
    clone-voice

    A voice cloning tool with a web interface that can clone your own voice

    ...The app is designed to be very easy to use: you download a precompiled package, double-click app.exe, and it launches a browser-based web interface where you control cloning and synthesis. It does not require an NVIDIA GPU to run basic tasks, although GPU acceleration can be used when available, making it accessible on modest machines. The tool supports around sixteen languages, including Chinese, English, Japanese, Korean, French, German, Italian, and others, and can capture reference voices directly from a microphone or from uploaded audio.
  • 9
    Transformer Engine

    A library for accelerating Transformer models on NVIDIA GPUs

    Transformer Engine (TE) is a library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower memory utilization in both training and inference. TE provides a collection of highly optimized building blocks for popular Transformer architectures and an automatic mixed precision-like API that can be used seamlessly with your framework-specific code. TE also includes a framework-agnostic C++...
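    To make the FP8 idea concrete, here is a pure-Python simulation of per-tensor scaling into the FP8 E4M3 format (3 mantissa bits, largest finite value 448, subnormal spacing 2^-9). It illustrates the general technique only: it is not Transformer Engine's API, and TE's own scaling recipes and hardware FP8 kernels differ in detail.

```python
import math

E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3 value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)
    # Exponent of the binade containing mag; below 2**-6 values are subnormal,
    # so the exponent is clamped and the spacing stays at 2**-9.
    exp = max(math.floor(math.log2(mag)), -6)
    step = 2.0 ** (exp - 3)  # 3 mantissa bits -> 8 representable steps per binade
    return sign * min(round(mag / step) * step, E4M3_MAX)


def fp8_quantize(values: list[float]) -> tuple[list[float], float]:
    """Per-tensor scaling: map the largest |value| onto E4M3_MAX, then round each element."""
    amax = max(abs(v) for v in values)
    scale = E4M3_MAX / amax if amax > 0 else 1.0
    return [round_to_e4m3(v * scale) for v in values], scale


def fp8_dequantize(quantized: list[float], scale: float) -> list[float]:
    """Undo the scaling to recover approximate original values."""
    return [v / scale for v in quantized]
```

    Scaling by the tensor's absolute maximum uses the format's full range, which matters because E4M3 has only eight steps per power of two.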
  • 10
    Humanoid-Gym

    Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real

    Humanoid-Gym is a reinforcement learning framework designed to train locomotion and control policies for humanoid robots using high-performance simulation environments. The system is built on top of NVIDIA Isaac Gym, which allows large-scale parallel simulation of robotic environments directly on GPU hardware. Its primary goal is to train humanoid robots efficiently in simulation so that the learned policies transfer to real-world hardware without additional training. This is the idea of zero-shot sim-to-real transfer: behaviors learned in simulation can be deployed directly on physical robots with minimal adjustment. ...
  • 11
    CUDA Containers for Edge AI & Robotics

    Machine Learning Containers for NVIDIA Jetson and JetPack-L4T

    ...The project is particularly useful for developers building edge AI and robotics systems that rely on GPU-accelerated inference and real-time computer vision. By using containerized environments, developers can ensure that their applications run consistently across different Jetson platforms and JetPack versions. The repository also includes build tools and package management utilities that help automate the process of assembling machine learning environments.
  • 12
    TensorRT LLM

    TensorRT LLM provides users with an easy-to-use Python API

    TensorRT-LLM is an open-source high-performance inference library specifically designed to optimize and accelerate large language model deployment on NVIDIA GPUs. It provides a Python-based API built on top of PyTorch that allows developers to define, customize, and deploy LLMs efficiently across a variety of hardware configurations, from single GPUs to large multi-node clusters. The library focuses on maximizing throughput and minimizing latency through advanced techniques such as...
  • 13
    FlashAttention

    Fast and memory-efficient exact attention

    ...It achieves this by using IO-aware algorithms that minimize memory reads and writes, reducing the quadratic memory overhead typically associated with attention operations. The project provides implementations of FlashAttention, FlashAttention-2, and newer iterations optimized for modern GPU architectures such as NVIDIA Hopper and AMD accelerators. By improving both forward and backward pass efficiency, it enables training and inference of large language models with longer sequence lengths and higher throughput. The library integrates with PyTorch and supports various attention configurations, including causal masking, multi-query attention, and rotary embeddings.
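    The core trick that lets attention be computed block by block while staying exact is the online softmax: keep a running maximum, a running normalizer, and a running output, and rescale them whenever a new block raises the maximum. This pure-Python sketch handles a single query vector and checks itself against the naive formula; the real library does this inside fused GPU kernels (tiling queries as well), so the sketch shows the arithmetic, not the memory-traffic savings.

```python
import math


def naive_attention(q, keys, values):
    """Reference softmax(q . k / sqrt(d)) weighted sum of values, with all scores materialized."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]


def tiled_attention(q, keys, values, block_size=2):
    """Same result, visiting keys/values one block at a time with an online softmax."""
    d = len(q)
    running_max = -math.inf
    running_sum = 0.0
    acc = [0.0] * len(values[0])
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale everything accumulated under the old max (exp(-inf) == 0 on the first block).
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vj for a, vj in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]
```

    Only one block of scores exists at a time, which is why the full sequence-by-sequence score matrix never has to be stored.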
  • 14
    CogVideo

    Text- and image-to-video generation: CogVideoX (2024) and CogVideo

    CogVideo is an open-source text-/image-/video-to-video generation project that hosts the CogVideoX family of diffusion-transformer models and end-to-end tooling. The repo includes SAT and Diffusers implementations, turnkey demos, and fine-tuning pipelines (including LoRA) designed to run across a wide range of NVIDIA GPUs, from desktop cards (e.g., RTX 3060) to data-center hardware (A100/H100). Current releases cover CogVideoX-2B, CogVideoX-5B, and the upgraded CogVideoX1.5-5B variants, plus image-to-video (I2V) models, with BF16/FP16/FP32 options, plus INT8 quantized inference via TorchAO for memory-constrained setups. The codebase emphasizes practical deployment: prompt-optimization utilities (LLM-assisted long-prompt expansion), Colab notebooks, a Gradio web app, and multiple performance knobs (tiling/slicing, CPU offload, torch.compile, multi-GPU, and FA3 backends via partner projects).
  • 15
    exo

    Run your own AI cluster at home with everyday devices

    Run your own AI cluster at home with everyday devices. Maintained by exo labs. Forget expensive NVIDIA GPUs: unify your existing devices (iPhone, iPad, Android, Mac, Linux, or pretty much any device) into one powerful GPU. It can now run 8B, 70B, and 405B parameter models on your own devices.
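    The description does not say how exo splits a model across devices. One common approach for pipeline-style sharding, sketched here purely as an illustration with hypothetical device names and memory sizes, is to give each device a contiguous block of transformer layers in proportion to its available memory.

```python
def partition_layers(num_layers: int, device_memory_gb: dict[str, float]) -> dict[str, range]:
    """Assign contiguous layer ranges to devices, proportional to each device's memory.

    Boundaries come from rounding the cumulative memory fraction, so the ranges
    are contiguous, non-overlapping, and together cover every layer exactly once.
    """
    total = sum(device_memory_gb.values())
    ranges: dict[str, range] = {}
    start = 0
    cumulative = 0.0
    for name, mem in device_memory_gb.items():
        cumulative += mem
        end = round(num_layers * cumulative / total)
        ranges[name] = range(start, end)
        start = end
    return ranges
```

    For example, a 32-layer model over a hypothetical 16 GB laptop and two 8 GB devices would give the laptop layers 0 to 15 and each smaller device 8 layers.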
  • 16
    Simple StyleGan2 for Pytorch

    Simplest working implementation of Stylegan2

    A simple PyTorch implementation of StyleGAN2 that can be trained entirely from the command line, with no coding needed. You will need a machine with a GPU and CUDA installed. You can also specify where intermediate results and model checkpoints should be stored. You can increase the network capacity (which defaults to 16) to improve generation results, at the cost of more memory. By default, if training is interrupted, it automatically resumes from the last checkpoint. ...
  • 17
    TAME LLM

    Traditional Mandarin LLMs for Taiwan

    TAME LLM is an open-source initiative focused on building and releasing large language models optimized for Traditional Mandarin and the linguistic context of Taiwan. The project includes models such as Llama-3-Taiwan-70B, which are fine-tuned versions of large transformer architectures trained on extensive corpora containing both Traditional Mandarin and English text. These models are designed to support applications such as conversational AI, knowledge retrieval, and domain-specific...
  • 18
    WanGP

    AI video generator optimized for low-VRAM and older GPUs

    Wan2GP is an open source AI video generation toolkit designed to make modern generative models accessible on consumer-grade hardware with limited GPU memory. It acts as a unified interface for running multiple video, image, and audio generation models, including Wan-based models as well as other systems like Hunyuan Video, Flux, and Qwen. A key focus of the project is reducing VRAM requirements, enabling some workflows to run on as little as 6 GB while still supporting older Nvidia and certain AMD GPUs. ...
  • 19
    Style-Bert-VITS2

    Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles

    ...It includes a full GUI editor to script dialogue, set different styles per line, edit dictionaries, and save/load projects, plus a separate web UI and Colab notebooks for training and experimentation. For those who only need synthesis, the project is published as a Python library (pip install style-bert-vits2) and can run on CPU without an NVIDIA GPU, though training still requires GPU hardware.
  • 20
    AWS Deep Learning Containers

    A set of Docker images for training and serving models in TensorFlow

    AWS Deep Learning Containers (DLCs) are a set of Docker images for training and serving models in TensorFlow, TensorFlow 2, PyTorch, and MXNet. Deep Learning Containers provide optimized environments with TensorFlow and MXNet, Nvidia CUDA (for GPU instances), and Intel MKL (for CPU instances) libraries and are available in the Amazon Elastic Container Registry (Amazon ECR). The AWS DLCs are used in Amazon SageMaker as the default vehicles for your SageMaker jobs such as training, inference, transforms etc. They've been tested for machine learning workloads on Amazon EC2, Amazon ECS and Amazon EKS services as well. ...
  • 21
    SimpleLLM

    950-line, minimal, extensible LLM inference engine built from scratch

    SimpleLLM is a minimal, extensible large language model inference engine implemented in roughly 950 lines of code, built from scratch to serve both as a learning tool and a research platform for novel inference techniques. It provides the core components of an LLM runtime—such as tokenization, batching, and asynchronous execution—without the abstraction overhead of more complex engines, making it easier for developers and researchers to understand and modify. Designed to run efficiently on...
  • 22
    WhisperLive

    A nearly-live implementation of OpenAI's Whisper

    WhisperLive is a “nearly live” implementation of OpenAI’s Whisper model focused on real-time transcription. It runs as a server–client system in which the server hosts a Whisper backend and clients stream audio to be transcribed with very low delay. The project supports multiple inference backends, including Faster-Whisper, NVIDIA TensorRT, and OpenVINO, allowing you to target GPUs and different CPU architectures efficiently. It can handle microphone input, pre-recorded audio files, and...
  • 23
    CogAgent

    An open-source, end-to-end VLM-based GUI agent

    CogAgent is a 9B-parameter bilingual vision-language GUI agent model based on GLM-4V-9B, trained with staged data curation, optimization, and strategy upgrades to improve perception, action prediction, and generalization across tasks. It focuses on operating real user interfaces from screenshots plus text, and follows a strict input–output format that returns structured actions, grounded operations, and optional sensitivity annotations. The model is designed for agent-style execution rather...
  • 24
    InvokeAI

    InvokeAI is a leading creative engine for Stable Diffusion models

    ...This fork is supported across Linux, Windows and Macintosh. Linux users can use either an Nvidia-based card (with CUDA support) or an AMD card (using the ROCm driver). We do not recommend the GTX 1650 or 1660 series video cards. They are unable to run in half-precision mode and do not have sufficient VRAM to render 512x512 images.
  • 25
    FastChat

    Open platform for training, serving, and evaluating language models

    ...If you do not have enough memory, you can enable 8-bit compression by adding --load-8bit to the commands above. This can reduce memory usage by around half with slightly degraded model quality. It is compatible with the CPU, GPU, and Metal backends. Vicuna-13B with 8-bit compression can run on a single GPU such as an NVIDIA RTX 3090, RTX 4080, T4, or V100 (16 GB). In addition, you can add --cpu-offloading to the commands above to offload weights that don't fit on your GPU onto CPU memory. This requires 8-bit compression to be enabled and the bitsandbytes package to be installed, which is only available on Linux.
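    A back-of-the-envelope estimate shows why 8-bit compression lets Vicuna-13B fit on a 16 GB card. The sketch counts model weights only and assumes 2 bytes per parameter for FP16 and 1 byte for 8-bit; activations, the KV cache, and quantization overhead are ignored, so real usage is somewhat higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


params = 13e9                          # Vicuna-13B parameter count
fp16 = weight_memory_gb(params, 16)    # 26.0 GB: too large for a 16 GB GPU
int8 = weight_memory_gb(params, 8)     # 13.0 GB: fits, with a little headroom
```

    Halving the bits per weight halves the weight memory, which matches the "around half" reduction quoted above.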