Page 2 | which free download

NVIDIA NeMo Framework

Scalable generative AI framework built for researchers and developers

...The framework builds on PyTorch Lightning–style modular abstractions, so training scripts are composed from reusable components for data loading, models, optimizers, and schedulers, which simplifies experimentation and adaptation. NeMo is designed to scale: with tools like NeMo-Run, users can orchestrate large-scale experiments across thousands of GPUs.

Downloads: 0 This Week

Last Update: 5 days ago

See Project

Mice MX OS speech to text Voice Control

Mice speech to text with MX Cinnamon OS ISO

Note about this image This image contains a system based on Linux MX, which was created to improve accessibility within the Linux environment. The distribution uses the Cinnamon desktop interface, which is configured to be operated using voice commands and outputs. The user interface and the control of your own devices and home automation systems can be customized and extended. The voice control program MiceStTM.py was developed to enable easy adaptation to other languages. ...

Downloads: 0 This Week

Last Update: 2025-05-14

See Project

DiffSinger

Singing Voice Synthesis via Shallow Diffusion Mechanism

...This avoids some of the typical problems of prior SVS models — like over-smoothing or unstable GAN training — and produces more realistic, expressive, and natural-sounding singing. The method introduces a “shallow diffusion” mechanism: instead of diffusing over many steps, generation begins at a shallow step determined adaptively, which leverages prior knowledge learned by a simple mel-spectrogram decoder and speeds up inference.

Downloads: 30 This Week

Last Update: 2025-11-28

See Project

VoiceFixer

General Speech Restoration

...Unlike many single-purpose noise reduction tools, VoiceFixer targets a “general speech restoration” problem (GSR), capable of handling multiple types of distortions at once, which makes it suitable for old recordings, phone-call audio, amateur voice recordings, or archival media. Evaluations show that VoiceFixer significantly improves both objective and subjective audio quality compared to baseline speech-enhancement methods.

Downloads: 12 This Week

Last Update: 2025-11-28

See Project

TensorFlowTTS

Real-Time State-of-the-art Speech Synthesis for Tensorflow 2

...It offers a variety of architectures for text-to-speech, including classic and modern models such as Tacotron‑2, FastSpeech / FastSpeech2, and neural vocoders like MelGAN and Multiband‑MelGAN. Because it’s based on TensorFlow 2, it can leverage optimizations such as fake-quantization aware training and pruning — which allow models to run faster than real time and to be deployable on mobile or embedded platforms. The library supports multiple languages (English, French, Korean, Chinese, German, etc.) and is relatively easy to adapt to new languages. With integrated vocoder + mel-spectrogram generation pipelines, pre-trained models, and fairly flexible architecture, TensorFlowTTS is a great off-the-shelf and extensible TTS engine for applications ranging from voice assistants to content generation or accessibility tools.

Downloads: 0 This Week

Last Update: 2025-11-28

See Project

VITS

Conditional Variational Autoencoder with Adversarial Learning

...The repository provides training and inference pipelines for common datasets such as LJ Speech (single-speaker) and VCTK (multi-speaker), including filelists, configs, and preprocessing scripts. It also includes monotonic alignment search code and g2p preprocessing, which are crucial components for aligning text and speech in an end-to-end setup.

Downloads: 0 This Week

Last Update: 2025-11-28

See Project

PaddlePaddle models

Pre-trained and Reproduced Deep Learning Models

Pre-trained and Reproduced Deep Learning Models ("Flying Paddle" official model library, including a variety of academic frontier and industrial scene verification of deep learning models) Flying Paddle's industrial-level model library includes a large number of mainstream models that have been polished by industrial practice for a long time and models that have won championships in international competitions; it provides many scenarios for semantic understanding, image classification,...

Downloads: 0 This Week

Last Update: 2022-08-01

See Project

Bangla TTS

Bangla text to speech synthesis in python

Bangla text to speech Multilingual (Bangla, English) real-time ([almost] in a GPU) speech synthesis library. Installation -------------------------------------- * Install Anaconda * conda create -n new_virtual_env python==3.6.8 * conda activate new_virtual_env * pip install -r requirements.txt * While running for the first time, keep your internet connection on to download the weights of the speech synthesis models (>500 MB) * For...

Downloads: 0 This Week

Last Update: 2020-09-03

See Project

Tacotron-2

DeepMind's Tacotron-2 Tensorflow implementation

Tacotron-2 is a TensorFlow implementation of DeepMind’s Tacotron-2 end-to-end text-to-speech architecture, which predicts mel spectrograms from raw text and then feeds them to a neural vocoder such as WaveNet. It reproduces the original paper’s hyperparameters exactly via paper_hparams.py, while also offering a tuned hparams.py with extra improvements that often yield better audio quality in practice. The repository is structured as a full training pipeline: dataset preparation, preprocessing into spectrograms, Tacotron training, WaveNet (or Griffin-Lim) vocoder training, and final waveform synthesis. ...

Downloads: 0 This Week

Last Update: 2025-11-28

See Project

DC-TTS

TensorFlow Implementation of DC-TTS: yet another text-to-speech model

...It follows the “Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention” paper, but the author adapts and extends the design to make it practical for real experiments. The model is split into two networks: Text2Mel, which maps text to mel-spectrograms, and SSRN (spectrogram super-resolution network), which converts low-resolution mel-spectrograms into high-resolution magnitude spectrograms suitable for waveform synthesis. Training scripts, data loaders, and hyperparameter configurations are provided to reproduce results on several datasets, including LJ Speech for English, a Korean single-speaker dataset, and audiobook data from Nick Offerman and Kate Winslet.

Downloads: 0 This Week

Last Update: 2025-11-28

See Project

Pythia

Pythia is a natural language question answering system, which uses Speech Recognition and Text To Speech technologies to communicate with the user.

Downloads: 0 This Week

Last Update: 2016-07-29

See Project

Search Results for "which" - Page 2

Showing 36 open source projects for "which"

NVIDIA NeMo Framework

Mice MX OS speech to text Voice Control

DiffSinger

VoiceFixer

TensorFlowTTS

VITS

PaddlePaddle models

Bangla TTS

Tacotron-2

DC-TTS

Pythia

Search Results for "which" - Page 2

Showing 36 open source projects for "which"

NVIDIA NeMo Framework

Mice MX OS speech to text Voice Control

DiffSinger

VoiceFixer

TensorFlowTTS

VITS

PaddlePaddle models

Bangla TTS

Tacotron-2

DC-TTS

Pythia

Related Searches

Related Categories