nvidia gpu mod free download

NVIDIA NeMo

Toolkit for conversational AI

NVIDIA NeMo, part of the NVIDIA AI platform, is a toolkit for building new state-of-the-art conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS) models. Each collection consists of prebuilt modules that include everything needed to train on your data.

Downloads: 4 This Week

Last Update: 2026-01-09


NVIDIA NeMo Framework

Scalable generative AI framework built for researchers and developers

NVIDIA NeMo is a scalable, cloud-native generative AI framework aimed at researchers and PyTorch developers working on large language models, multimodal models, and speech AI (ASR and TTS), with growing support for computer vision. It provides collections of domain-specific modules and reference implementations that make it easier to pre-train, fine-tune, and deploy very large models on multi-GPU and multi-node infrastructure.

Downloads: 0 This Week

Last Update: 2026-01-09


clone-voice

A sound cloning tool with a web interface, using your voice

...The app is designed to be very easy to use: you download a precompiled package, double-click app.exe, and it launches a browser-based web interface where you control cloning and synthesis. It does not require an NVIDIA GPU to run basic tasks, although GPU acceleration can be used when available, making it accessible on modest machines. The tool supports around sixteen languages, including Chinese, English, Japanese, Korean, French, German, Italian, and others, and can capture reference voices directly from a microphone or from uploaded audio.

Downloads: 17 This Week

Last Update: 2025-11-28


FastKoko

Dockerized FastAPI wrapper for Kokoro-82M text-to-speech model

FastKoko is a self-hosted text-to-speech server built around the Kokoro-82M model and exposed through a FastAPI backend. It is designed to be easy to deploy via Docker, with separate CPU and GPU images so that users can choose between pure CPU inference and NVIDIA GPU acceleration. The project exposes an OpenAI-compatible speech endpoint, which means existing code that talks to the OpenAI audio API can often be pointed at a Kokoro-FastAPI instance with minimal changes. It supports multiple languages and voicepacks and allows phoneme-based generation for more accurate pronunciation and prosody. ...
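As a rough illustration of what "OpenAI-compatible" can mean in practice, the sketch below builds a POST request in the shape of the OpenAI audio speech API (`/v1/audio/speech` with `model`, `input`, and `voice` fields). The helper name `build_speech_request`, the base URL and port, and the model and voice names are assumptions for illustration only; check the project's own documentation for the values your instance expects.

```python
import json
import urllib.request


def build_speech_request(text, base_url="http://localhost:8880",
                         model="kokoro", voice="af_bella"):
    """Build (but do not send) an OpenAI-style text-to-speech request.

    base_url, model, and voice are placeholder assumptions, not values
    confirmed by the listing above.
    """
    payload = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": "mp3",
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_speech_request("Hello from a self-hosted TTS server.")
    print(req.get_method(), req.full_url)
    # Sending requires a running server, so it is left commented out:
    # with urllib.request.urlopen(req) as resp, open("speech.mp3", "wb") as out:
    #     out.write(resp.read())
```

Because only the base URL differs from a hosted endpoint, switching an existing client over is often a one-line configuration change, which is the point the description makes.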

Downloads: 2 This Week

Last Update: 2025-12-13


WhisperLive

A nearly-live implementation of OpenAI's Whisper

...It runs as a server–client system in which the server hosts a Whisper backend and clients stream audio to be transcribed with very low delay. The project supports multiple inference backends, including Faster-Whisper, NVIDIA TensorRT, and OpenVINO, allowing you to target GPUs and different CPU architectures efficiently. It can handle microphone input, pre-recorded audio files, and network streams such as RTSP and HLS, making it flexible for live events, monitoring, or accessibility workflows. Configuration options let you control the number of clients, maximum connection time, and threading behavior so the server can be tuned for different deployment environments. ...

Downloads: 9 This Week

Last Update: 2025-11-28


OuteTTS

Interface for OuteTTS models

...The project supports multiple backends including llama.cpp (Python bindings and server), Hugging Face Transformers, ExLlamaV2, vLLM, and a JavaScript interface via Transformers.js, allowing it to run on CPUs, NVIDIA CUDA GPUs, AMD ROCm, Vulkan-capable GPUs, and Apple Metal. It also includes a notion of speaker profiles: you can create a speaker from a short audio sample, save it as JSON, and reuse it for consistent voice identity across generations and sessions. For best quality, the model is designed to work with a reference speaker clip and will inherit emotion, style, and accent from that reference.

Downloads: 1 This Week

Last Update: 2025-11-28


Style-Bert-VITS2

Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles

...It includes a full GUI editor to script dialogue, set different styles per line, edit dictionaries, and save/load projects, plus a separate web UI and Colab notebooks for training and experimentation. For those who only need synthesis, the project is published as a Python library (pip install style-bert-vits2) and can run on CPU without an NVIDIA GPU, though training still requires GPU hardware.

Downloads: 0 This Week

Last Update: 2025-11-28


Parallel WaveGAN

Unofficial Parallel WaveGAN

...Its main goal is to provide a real-time neural vocoder that can turn mel spectrograms into high-quality speech audio efficiently. The repository is designed to work hand-in-hand with ESPnet-TTS and NVIDIA Tacotron2-style front ends, so you can build complete TTS or singing voice synthesis pipelines. It includes a large collection of “Kaldi-style” recipes for many datasets such as LJSpeech, LibriTTS, VCTK, JSUT, CMU Arctic, and multiple singing voice corpora in Japanese, Mandarin, Korean, and more. The project provides pre-trained models, Colab demos, and example configurations, allowing researchers to quickly evaluate vocoder quality or adapt models to new datasets.

Downloads: 0 This Week

Last Update: 2025-11-28


HiFi-GAN

Generative Adversarial Networks for Efficient and High Fidelity Speech

...In experiments on LJSpeech, HiFi-GAN was shown to achieve mean opinion scores close to human recordings while synthesizing 22.05 kHz audio up to ~168× faster than real time on an NVIDIA V100 GPU. A smaller configuration trades a bit of quality for even higher speed and can run more than 13× faster than real time on CPU, making it suitable for deployment scenarios without powerful GPUs.
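To put the "faster than real time" figures in concrete terms, the short sketch below converts a speedup factor into synthesis time and sample throughput. It uses only the numbers quoted above (22.05 kHz, ~168× on GPU, 13× on CPU); the function names are illustrative, and because the quoted speedups are approximate ("up to" and "more than"), so are the results.

```python
SAMPLE_RATE_HZ = 22_050   # 22.05 kHz audio, as quoted above
GPU_SPEEDUP = 168         # approximate upper figure on an NVIDIA V100
CPU_SPEEDUP = 13          # lower bound for the smaller configuration on CPU


def synthesis_seconds(audio_seconds, speedup):
    """Wall-clock time needed to generate `audio_seconds` of audio."""
    return audio_seconds / speedup


def samples_per_second(sample_rate_hz, speedup):
    """Audio samples generated per wall-clock second."""
    return sample_rate_hz * speedup


if __name__ == "__main__":
    for name, speedup in (("GPU", GPU_SPEEDUP), ("CPU", CPU_SPEEDUP)):
        seconds = synthesis_seconds(10.0, speedup)
        rate = samples_per_second(SAMPLE_RATE_HZ, speedup)
        print(f"{name}: 10 s of audio in {seconds:.3f} s, {rate:,} samples/s")
```

At 168× real time, ten seconds of speech takes roughly 0.06 s to generate (about 3.7 million samples per second); at 13× it takes roughly 0.77 s.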

Downloads: 0 This Week

Last Update: 2025-11-28


Bangla TTS

Bangla text to speech synthesis in python

...Installation
--------------------------------------

* Install Anaconda
* conda create -n new_virtual_env python==3.6.8
* conda activate new_virtual_env
* pip install -r requirements.txt
* On the first run, keep your internet connection on so the weights of the speech synthesis models (>500 MB) can be downloaded
* For fast inference, you must install tensorflow-gpu and have an NVIDIA GPU.

Project link: https://github.com/zabir-nabil/bangla-tts

Downloads: 8 This Week

Last Update: 2020-09-03


OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition

...It supports multi-GPU and multi-node data-parallel training, and integrates with Horovod to scale out across large GPU clusters. Mixed-precision support (float16) is optimized for NVIDIA Volta and Turing GPUs, allowing significant speedups and memory savings without sacrificing model quality. The project comes with configuration-driven training scripts, documentation, and examples that demonstrate how to set up pipelines for tasks.

Downloads: 0 This Week

Last Update: 2025-11-28


Search Results for "nvidia gpu mod"

Showing 11 open source projects for "nvidia gpu mod"


