singing voice synthesizer free download

Real-Time Voice Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

Real-Time Voice Cloning is an influential deep-learning repository that demonstrates how to clone a voice from just a few seconds of audio and then generate arbitrary speech in that voice in near real time. It implements the SV2TTS pipeline (“Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis”) in three stages: a speaker encoder, a synthesizer, and a vocoder.

Downloads: 11 This Week

Last Update: 2026-03-09

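The three-stage SV2TTS data flow described for Real-Time Voice Cloning can be sketched roughly as follows. This is only an illustration of how the stages connect, not the repository's code or API: the function names, the constants, and the arithmetic inside each stage are hypothetical stand-ins for what are neural networks in the real project.

```python
import math
from typing import List

# All constants below are hypothetical toy values chosen for illustration.
EMBED_DIM = 8   # size of the speaker embedding
N_MELS = 4      # mel channels per frame
HOP = 16        # waveform samples produced per mel frame


def speaker_encoder(reference_audio: List[float]) -> List[float]:
    """Stage 1 stand-in: map audio of any length to a fixed-size, unit-norm vector."""
    chunk = max(1, len(reference_audio) // EMBED_DIM)
    raw = [
        sum(abs(s) for s in reference_audio[i * chunk:(i + 1) * chunk])
        for i in range(EMBED_DIM)
    ]
    norm = math.sqrt(sum(v * v for v in raw)) or 1.0  # avoid dividing by zero
    return [v / norm for v in raw]


def synthesizer(text: str, embedding: List[float]) -> List[List[float]]:
    """Stage 2 stand-in: one mel frame per character, conditioned on the embedding."""
    frames = []
    for ch in text:
        base = (ord(ch) % 32) / 32.0
        frames.append([base + embedding[m % EMBED_DIM] for m in range(N_MELS)])
    return frames


def vocoder(mel: List[List[float]]) -> List[float]:
    """Stage 3 stand-in: expand each mel frame into HOP waveform samples."""
    samples: List[float] = []
    for frame in mel:
        energy = sum(frame) / len(frame)
        samples.extend(energy * math.sin(2 * math.pi * n / HOP) for n in range(HOP))
    return samples


def clone_and_speak(reference_audio: List[float], text: str) -> List[float]:
    """Chain the stages: reference audio -> embedding -> mel frames -> waveform."""
    embedding = speaker_encoder(reference_audio)
    mel = synthesizer(text, embedding)
    return vocoder(mel)
```

The point of the sketch is the interface between stages: the encoder runs once per reference clip, and the resulting embedding is the only speaker information passed to the synthesizer, which is why a few seconds of audio can be enough to condition arbitrary text.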

Applio

A simple, high-quality voice conversion tool focused on ease of use

Applio is a high-quality voice conversion toolkit designed to make modern RVC/VITS-based voice cloning accessible to non-experts. It focuses strongly on ease of use: installation scripts for Windows, Linux, and macOS set up dependencies and then launch a browser-based Gradio interface. Within that interface, users can train and run voice conversion models for tasks like singing conversion, speech-to-speech transformation, and voice cloning.

Downloads: 93 This Week

Last Update: 2026-02-18


Qwen-Audio

Chat & pretrained large audio language model proposed by Alibaba Cloud

...It includes features such as flexible multi-turn chat, audio understanding and reasoning, music appreciation, and tool usage (e.g. voice editing).

Downloads: 2 This Week

Last Update: 2025-09-23


Step-Audio

Open-source framework for intelligent speech interaction

...The design moves beyond traditional separate-component pipelines (ASR → text model → TTS), instead offering a multimodal model that ingests speech or audio and produces speech accordingly, enabling natural dialogue, voice cloning, and expressive speech synthesis. Through its architecture, Step-Audio supports multilingual interaction, dialects, emotional tones (joy, sadness, etc.), and even more creative speech styles (like rap or singing), while allowing dynamic control over speech characteristics. It also provides a “generative data engine,” which can produce synthetic speech data (cloning voices, varying style) to support TTS training.

Downloads: 0 This Week

Last Update: 2026-03-16


SoftVC VITS Singing Voice Conversion

SoftVC VITS Singing Voice Conversion

SoftVC VITS Singing Voice Conversion is a deep learning project focused on singing voice conversion, allowing users to transform one voice into another while preserving melody and timing. Unlike traditional text-to-speech systems, it specializes specifically in singing scenarios and does not provide general TTS functionality. The project leverages neural network architectures derived from VITS and SoftVC research to achieve high-quality voice transformation. ...

Downloads: 5 This Week

Last Update: 2026-03-02


Parallel WaveGAN

Unofficial Parallel WaveGAN

...Its main goal is to provide a real-time neural vocoder that can turn mel spectrograms into high-quality speech audio efficiently. The repository is designed to work hand-in-hand with ESPnet-TTS and NVIDIA Tacotron2-style front ends, so you can build complete TTS or singing voice synthesis pipelines. It includes a large collection of “Kaldi-style” recipes for many datasets such as LJSpeech, LibriTTS, VCTK, JSUT, CMU Arctic, and multiple singing voice corpora in Japanese, Mandarin, Korean, and more. The project provides pre-trained models, Colab demos, and example configurations, allowing researchers to quickly evaluate vocoder quality or adapt models to new datasets.

Downloads: 0 This Week

Last Update: 2025-11-28


lora-svc

Singing voice conversion based on Whisper, with LoRA for singing voice cloning

Singing voice conversion based on Whisper, with LoRA for singing voice cloning. You will feel the beauty of the code in this project. The Uni-SVC main branch is for singing voice cloning based on Whisper, with a speaker encoder and a speaker adapter. Uni-SVC's main goal is to develop LoRA for SVC. With LoRA, cloning a singer may need only 10 sentences and 10 minutes of training.

Downloads: 0 This Week

Last Update: 2023-06-12


Amiga Memories

A walk along memory lane

...The generator itself is implemented in Squirrel, and the 3D rendering is done with GameStart 3D. An Amiga Memories video is mostly based on a narrative. The purpose of the script is to define the spoken and written content: the spoken text is read by a voice synthesizer (text-to-speech, or TTS), while the written text is simply drawn on the image as subtitles. In addition to the spoken and written narration, the script controls the camera movements and the LED activity of the computer. Amiga Memories' video images are computed by the GameStart 3D engine (pre-HARFANG 3D). Although the 3D assets are designed to be played back in real time with a variable framerate, the engine can also break the video sequence down into frames of 1/30 or 1/60 of a second, saved as TGA files.

Downloads: 0 This Week

Last Update: 2023-03-22


DiffSinger

Singing Voice Synthesis via Shallow Diffusion Mechanism

DiffSinger is an open-source PyTorch implementation of a diffusion-based acoustic model for singing voice synthesis (SVS), with a related variant for text-to-speech (TTS). The core idea is to treat generation of a sung voice (as a mel-spectrogram) as a diffusion process: starting from noise, the model iteratively denoises while conditioned on a music score (lyrics, pitch, musical timing). This avoids some typical problems of prior SVS models, such as over-smoothing or unstable GAN training, and produces more realistic, expressive, and natural-sounding singing. ...

Downloads: 47 This Week

Last Update: 2025-11-28

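The "start from noise and iteratively denoise, conditioned on a score" idea in the DiffSinger description can be illustrated with a generic DDPM-style sampling loop. This is a minimal sketch under stated assumptions, not DiffSinger's code: the noise schedule values are arbitrary, `score_to_target` is a hypothetical conditioning that maps MIDI pitches to one number each, and `predict_noise` is an oracle that stands in for the trained denoising network. It also does not implement the shallow diffusion mechanism named in the project's tagline.

```python
import math
import random
from typing import List, Tuple


def make_schedule(steps: int, beta_start: float = 1e-4,
                  beta_end: float = 0.05) -> Tuple[List[float], List[float], List[float]]:
    """Linear beta schedule plus alpha_t = 1 - beta_t and their cumulative products."""
    betas = [beta_start + (beta_end - beta_start) * t / (steps - 1) for t in range(steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, running = [], 1.0
    for a in alphas:
        running *= a
        alpha_bars.append(running)
    return betas, alphas, alpha_bars


def score_to_target(score_pitches: List[int]) -> List[float]:
    """Hypothetical conditioning: scale MIDI pitches into [0, 1]."""
    return [p / 127.0 for p in score_pitches]


def predict_noise(x_t: List[float], t: int, target: List[float],
                  alpha_bars: List[float]) -> List[float]:
    """Oracle stand-in for a trained network eps(x_t, t, score)."""
    a_bar = alpha_bars[t]
    return [(x - math.sqrt(a_bar) * y) / math.sqrt(1.0 - a_bar)
            for x, y in zip(x_t, target)]


def sample(score_pitches: List[int], steps: int = 50, seed: int = 0) -> List[float]:
    """Reverse diffusion: begin with Gaussian noise and denoise step by step."""
    rng = random.Random(seed)
    betas, alphas, alpha_bars = make_schedule(steps)
    target = score_to_target(score_pitches)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # pure noise at the last timestep
    for t in reversed(range(steps)):
        eps = predict_noise(x, t, target, alpha_bars)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * e) / math.sqrt(alphas[t]) for xi, e in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # one common choice of step variance
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added on the final step
    return x
```

Because the stand-in predictor is an oracle, the loop lands on the conditioning target; a real model only approximates the noise, and the quality of that approximation is what determines how natural the output sounds.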

Mocking Bird

Clone a voice in 5 seconds to generate arbitrary speech in real-time

...The codebase is implemented in Python (with PyTorch) and includes modules for encoder, synthesizer, vocoder, preprocessing, and inference, as well as demo scripts and a web-server interface for easier experimentation or deployment. MockingBird supports both using pretrained models and training your own synthesizer (with custom datasets), giving flexibility for voice-cloning or custom-voice synthesis depending on your needs.

1 Review

Downloads: 6 This Week

Last Update: 2023-03-23


Spleeter

Deezer source separation library including pretrained models

Spleeter is Deezer's source separation library, written in Python and using TensorFlow, with pretrained models. It makes it easy to train music source separation models (assuming you have a dataset of isolated sources) and provides already-trained, state-of-the-art models for various flavours of separation. The 2-stem and 4-stem models achieve state-of-the-art performance on the musdb dataset. Spleeter is also very fast as it can perform separation of audio files to 4 stems 100x...

1 Review

Downloads: 77 This Week

Last Update: 2021-09-03


Search Results for "singing voice synthesizer"

Showing 11 open source projects for "singing voice synthesizer"


