Best Open Source Voice Cloning Software 2026


Browse free open source Voice Cloning software and projects below.

1

GPT-SoVITS

1 min voice data can also be used to train a good TTS model

GPT‑SoVITS is a state-of-the-art voice conversion and TTS system that enables zero‑shot and few‑shot synthesis based on a short vocal sample (e.g., 5 seconds). It supports cross‑lingual speech synthesis across English, Chinese, Japanese, Korean, Cantonese, and more. It's powered by VITS architecture enhanced for few‑sample adaptation and real‑time usability.

Downloads: 70 This Week

Last Update: 2025-07-29
See Project
2

Lyrebird

Simple and powerful voice changer for Linux, written with Python & GTK

Downloads: 59 This Week

Last Update: 2024-06-27
See Project
3

Coqui TTS

A deep learning toolkit for Text-to-Speech, battle-tested in research

TTS is a library for advanced Text-to-Speech generation. It is built on the latest research and was designed to achieve the best trade-off among ease of training, speed, and quality. TTS comes with pre-trained models and tools for measuring dataset quality, and is already used in 20+ languages for products and research projects. Features include: high-performance deep learning models for Text2Speech tasks; Text2Spec models (Tacotron, Tacotron2, Glow-TTS, SpeedySpeech); a speaker encoder to compute speaker embeddings efficiently; vocoder models (MelGAN, Multiband-MelGAN, GAN-TTS, ParallelWaveGAN, WaveGrad, WaveRNN); fast and efficient model training; detailed training logs on the terminal and TensorBoard; support for multi-speaker TTS; an efficient, flexible, lightweight but feature-complete Trainer API; released, ready-to-use models; tools to curate Text2Speech datasets under dataset_analysis; and utilities to use and test your models.

Downloads: 55 This Week

Last Update: 2023-12-12
See Project
4

OpenVoice

Instant voice cloning by MIT and MyShell. Audio foundation model

OpenVoice is a versatile instant voice cloning system that can replicate a speaker’s tone color from just a short audio clip and then generate speech in multiple languages. It is designed not only to match the timbre of the reference voice, but also to give granular control over style parameters such as emotion, accent, rhythm, pauses, and intonation. The model supports cross-lingual and even zero-shot cross-lingual voice cloning, so a speaker recorded in one language can be made to speak naturally in others. Architecturally, OpenVoice separates “tone color” cloning from style control, which makes it easier to keep a consistent identity while flexibly changing prosody or language. The project provides open-weight models, inference code, and examples, making it suitable both for research and for building production voice experiences. It is actively developed by MyShell, which also integrates OpenVoice into broader agent and entertainment workflows.

Downloads: 15 This Week

Last Update: 2025-11-28
See Project
5

Real-Time Voice Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

Real-Time Voice Cloning is an influential deep-learning repository that demonstrates how to clone a voice from just a few seconds of audio and then generate arbitrary speech in that voice in near real time. It implements the SV2TTS pipeline (“Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis”) in three stages: a speaker encoder, a synthesizer, and a vocoder. In the first stage, short audio clips are converted into a fixed-dimensional speaker embedding that captures voice characteristics; this embedding is then used by a Tacotron-style synthesizer to generate spectrograms from text, which a WaveRNN-based vocoder finally turns into audio. The repo includes both a command-line demo and a graphical “toolbox” application where you can load reference voices, type text, and hear the synthesized results interactively. It also provides scripts for preprocessing datasets (such as LibriSpeech) and for training each of the three components.

Downloads: 11 This Week

Last Update: 2026-03-09
See Project
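
The three-stage data flow described for Real-Time Voice Cloning above can be illustrated with a toy sketch. This is not the repository's code: the function names (encode_speaker, synthesize_spectrogram, vocode, clone_and_speak) are hypothetical, and each stage is a trivial numeric stand-in for the neural network it represents. It only shows how a fixed-size speaker embedding conditions spectrogram generation, and how the spectrogram is then turned into samples.

```python
from typing import List


def encode_speaker(reference_audio: List[float], dim: int = 4) -> List[float]:
    """Stage 1 stand-in: reduce a clip of any length to a fixed-size embedding.

    Here the "embedding" is just the mean of `dim` consecutive chunks;
    samples left over after the last chunk are ignored.
    """
    chunk = max(1, len(reference_audio) // dim)
    embedding = []
    for i in range(dim):
        part = reference_audio[i * chunk:(i + 1) * chunk] or [0.0]
        embedding.append(sum(part) / len(part))
    return embedding


def synthesize_spectrogram(text: str, embedding: List[float],
                           n_bins: int = 8) -> List[List[float]]:
    """Stage 2 stand-in: one spectrogram frame per character.

    The speaker embedding shifts every frame, mimicking conditioning.
    """
    bias = sum(embedding) / len(embedding)
    return [
        [(1.0 if ord(ch) % n_bins == b else 0.0) + bias for b in range(n_bins)]
        for ch in text
    ]


def vocode(spectrogram: List[List[float]], samples_per_frame: int = 4) -> List[float]:
    """Stage 3 stand-in: expand each frame into a run of audio samples."""
    waveform: List[float] = []
    for frame in spectrogram:
        level = sum(frame) / len(frame)
        waveform.extend([level] * samples_per_frame)
    return waveform


def clone_and_speak(reference_audio: List[float], text: str) -> List[float]:
    """Chain the three stages: encoder -> synthesizer -> vocoder."""
    embedding = encode_speaker(reference_audio)
    spectrogram = synthesize_spectrogram(text, embedding)
    return vocode(spectrogram)
```

The point of the structure is that only the first stage sees the reference audio; the later stages work from text plus the embedding, which is why a few seconds of audio can be enough at inference time.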
6

Mocking Bird

Clone a voice in 5 seconds to generate arbitrary speech in real-time

MockingBird is an open-source voice cloning and real-time speech generation toolkit that lets you clone a speaker’s voice from a short audio sample (reportedly as little as 5 seconds) and then synthesize arbitrary speech in that voice. It builds on deep-learning based TTS / voice-cloning technology (in the lineage of projects such as Real-Time-Voice-Cloning), but extends it with support for Mandarin Chinese and multiple Chinese speech datasets — broadening its applicability beyond English. The codebase is implemented in Python (with PyTorch) and includes modules for encoder, synthesizer, vocoder, preprocessing, and inference, as well as demo scripts and a web-server interface for easier experimentation or deployment. MockingBird supports both using pretrained models and training your own synthesizer (with custom datasets), giving flexibility for voice-cloning or custom-voice synthesis depending on your needs.

1 Review

Downloads: 6 This Week

Last Update: 2023-03-23
See Project
7

elevenlabs-api

elevenlabs-api is an open source Java wrapper around the ElevenLabs

Elevenlabs-api is an open-source Java wrapper around the ElevenLabs Voice Synthesis and Cloning Web API. Compiled JARs are available via the Releases tab. To find your ElevenLabs API key, go to the official website and open the 'Profile' tab, where your xi-API-key is shown. Before use, you must register the key with the ElevenLabsAPI Java API. If your repository is public, store your API key in an environment variable or otherwise outside your source code. The most realistic and versatile AI speech software, ever. Eleven brings the most compelling, rich and lifelike voices to creators and publishers seeking the ultimate tools for storytelling. Generate top-quality spoken audio in any voice and style with the most advanced and multipurpose AI speech tool out there. Our deep learning model renders human intonation and inflections with unprecedented fidelity and adjusts delivery based on context.

Downloads: 6 This Week

Last Update: 2023-12-25
See Project
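
The advice in the elevenlabs-api description above, to keep the API key out of source code, is language-independent. The wrapper itself is Java; the sketch below uses Python only to show the pattern. The variable name ELEVENLABS_API_KEY and the helper load_api_key are assumptions for illustration, not names defined by the project.

```python
import os


def load_api_key(var_name: str = "ELEVENLABS_API_KEY") -> str:
    """Read an API key from the environment instead of hard-coding it.

    `var_name` is a hypothetical variable name; use whatever your
    deployment defines.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before running."
        )
    return key
```

Failing loudly when the variable is missing is a deliberate choice here: it avoids sending requests with an empty key and makes misconfiguration obvious at startup.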
8

Parakeet

PAddle PARAllel text-to-speech toolKIT

PAddle PARAllel text-to-speech toolKIT (supporting Tacotron2, Transformer TTS, FastSpeech2/FastPitch, SpeedySpeech, WaveFlow and Parallel WaveGAN). Parakeet aims to provide a flexible, efficient and state-of-the-art text-to-speech toolkit for the open-source community. It is built on the PaddlePaddle dynamic graph and includes many influential TTS models. To make it easier to use existing TTS models directly and to develop new ones, Parakeet selects typical models and provides their reference implementations in PaddlePaddle. Furthermore, Parakeet abstracts the TTS pipeline and standardizes data preprocessing, common module sharing, model configuration, and the process of training and synthesis. The models supported here include Text FrontEnd, end-to-end Acoustic models and Vocoders.

Downloads: 2 This Week

Last Update: 2023-03-24
See Project
9

Multilingual Speech Synthesis

An implementation of Tacotron 2 that supports multilingual experiments

This repository provides synthesized samples, training and evaluation data, source code, and parameters for the paper "One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech". It contains an implementation of Tacotron 2 that supports multilingual experiments and that implements different approaches to encoder parameter sharing. It presents a model combining ideas from "Learning to speak fluently in a foreign language: Multilingual speech synthesis and cross-language voice cloning", "End-to-End Code-Switched TTS with Mix of Monolingual Recordings", and "Contextual Parameter Generation for Universal Neural Machine Translation". We provide data for comparison of three multilingual text-to-speech models. The first shares the whole encoder and uses an adversarial classifier to remove speaker-dependent information from the encoder. The second has separate encoders for each language.

Downloads: 0 This Week

Last Update: 2023-03-24
See Project
10

PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model

PaddleSpeech is an open-source toolkit on the PaddlePaddle platform for a variety of critical tasks in speech and audio, with state-of-the-art and influential models. Through an easy-to-use, efficient, flexible and scalable implementation, our vision is to empower both industrial applications and academic research, including training, inference and testing modules, and the deployment process. The barrier to installation is low, and a CLI, Server, and Streaming Server are available to quick-start your journey. We provide high-speed and ultra-lightweight models, as well as cutting-edge technology. We provide production-ready streaming ASR and streaming TTS systems. Our frontend contains Text Normalization and Grapheme-to-Phoneme (G2P, including Polyphone and Tone Sandhi). Moreover, we use self-defined linguistic rules to adapt to the Chinese context.

Downloads: 0 This Week

Last Update: 2025-03-04
See Project
11

Voice Cloning App

A Python/Pytorch app for easily synthesising human voices

A Python/PyTorch app for easily synthesizing human voices. If you are using a language other than English, you can add it to the app. First, you'll need to find a DeepSpeech model for your language from Coqui. You'll then need to download the model.pbmm and alphabet.txt files for your language. Requires Windows 10 or Ubuntu 20.04+, 5GB+ of disk space, and (optionally) an NVIDIA GPU with at least 4GB of memory and driver version 456.38+. Features: automatic dataset generation (with support for subtitles and audiobooks), additional language support, local and remote training, easy train start/stop, and data importing/exporting.

Downloads: 0 This Week

Last Update: 2023-03-24
See Project
12

Voice-Cloning-App

A Python/Pytorch app for easily synthesising human voices

Downloads: 0 This Week

Last Update: 2021-08-13
See Project
13

VoiceOver

VoiceOver is a web application that allows you to transcribe audio

VoiceOver is a web application that allows you to transcribe English audio and listen to it in another voice. Choose a source: an audio file (.wav), in English only. Transcribe the audio; several algorithms take care of it. Then listen to the generated transcription in a man's or a woman's voice; it's up to you!

1 Review

Downloads: 0 This Week

Last Update: 2023-03-24
See Project
14

VoiceSmith

[WIP] VoiceSmith makes training text to speech models easy

VoiceSmith makes it possible to train and infer on both single and multispeaker models without any coding experience. It fine-tunes a pretty solid two-stage text to speech pipeline, based on a modified version of DelightfulTTS and UnivNet, on your dataset. Both models were pretrained on a proprietary 5000 speaker dataset. It also provides some tools for dataset preprocessing, such as automatic text normalization. It runs on Windows (only CPU supported currently) or any Linux-based operating system. If you want to run it on macOS, you have to follow the build-from-source steps in order to create the installer. This is untested since I don't currently own a Mac. An NVIDIA GPU with CUDA support is highly recommended; you can train on CPU otherwise, but it will take days if not weeks.

Downloads: 0 This Week

Last Update: 2023-03-24
See Project
15

lora-svc

Singing voice change based on whisper, lora for singing voice clone

Singing voice change based on Whisper, and LoRA for singing voice clone. You will feel the beauty of the code from this project. The Uni-SVC main branch is for singing voice cloning based on Whisper with a speaker encoder and speaker adapter. The main target of Uni-SVC is to develop LoRA for SVC. With LoRA, cloning a singer may need only 10 sentences and about 10 minutes of training. Each singer is a plug-in of the base model.

Downloads: 0 This Week

Last Update: 2023-06-12
See Project
16

vocoder_chung

vocoder chung is a small educational vocoder using the discrete Fourier transform (FFT) spectrum, written in easy, fast, compiled FreeBASIC. (24/12/2019) Uses the fast and accurate FFTdll.dll. (28/03/2020) Algorithmic voice cloning / change / morphing experiment added.

Downloads: 0 This Week

Last Update: 2020-06-03
See Project


Guide to Open Source Voice Cloning Software

Open source voice cloning software is a type of technology that lets users synthesize speech that imitates a particular speaker's voice, usually starting from a short recording of that speaker. This type of software has multiple uses, from creating personalized audio experiences for video games to supporting text-to-speech applications. Open source voice cloning software can also be used for speech synthesis, lip sync dubbing, virtual reality avatars, and many other applications.

One popular program used by open source developers is the Multispeech Speech Synthesis System (MSSS). MSSS provides components such as an acoustic model, text processor, pronunciation dictionary and a parameter set, which allow developers to quickly produce high-quality recordings. It also has built-in tools, such as audio manipulation functions, which give users further control over their recordings. Other programs include TTS Engine Builder and the Festival Speech Synthesis System, which provide powerful features for building custom voice sets and support various languages.

Open source voice cloning software is becoming increasingly popular due to its versatility and its ease of use for developers. Because it can produce customizable voice sets suited to almost any application or purpose, there are many possibilities for what can be created with this toolset. It is also worth noting that many commercial applications use open source code when possible; companies often choose these freely available resources over expensive licensed technologies because of their cost-effectiveness and the wide range of capabilities they offer.

Features Offered by Open Source Voice Cloning Software

Text-to-Speech (TTS) Conversion: Open source voice cloning software offers the ability to convert written text into audio. This process is usually handled by an artificial neural network that understands how words are spoken in different contexts and then synthesizes them. The quality of the output depends on the accuracy of the algorithm used.
Speech Recognition: Open source voice cloning software can recognize speech from a variety of sources, including microphone recordings, recordings from telephones, and files in various formats (such as MP3). It can also be used to create transcripts of conversations or lectures for further analysis.
Voice Synthesis: This feature allows users to manipulate existing recordings or combine elements from multiple sources together in order to create new vocal performances. For example, users can take snippets from a singer's performance and add background music or effects in order to create an entirely distinctive sound.
Unit Selection Synthesis: This feature enables open source voice cloning software to generate natural-sounding voices using preselected units taken from a database of recorded speech that has been accumulated over time through crowdsourcing efforts or manual labor such as digitizing old radio broadcasts.
Deep Learning Based Models: Advanced open source voice cloning software uses deep learning models, such as convolutional neural networks with recurrent layers (CNN+RNN), trained on large datasets containing thousands of utterances in order to generate better results than unit selection synthesis alone. By modeling fundamental frequency and spectral features alongside linguistic structure, these models produce more realistic output than older methods, although they typically need more computation to train and run than traditional speech synthesis techniques.
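
To make the unit selection idea from the list above concrete, here is a minimal sketch, assuming a toy database in which every recorded unit carries a single pitch value. The Unit class, the select_units function, and the two cost terms (distance from a target pitch, and pitch jump between neighboring units) are illustrative assumptions; real systems use much richer features. The search itself is a standard dynamic-programming pass over the candidates.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Unit:
    label: str    # the phone or word this recording covers
    pitch: float  # average pitch in Hz (toy feature)
    clip_id: str  # hypothetical identifier of the stored recording


def select_units(targets: List[str],
                 database: Dict[str, List[Unit]],
                 target_pitch: float) -> List[Unit]:
    """Pick one recorded unit per target label, minimizing total cost.

    Target cost: |unit pitch - target pitch|.
    Join cost:   |pitch difference between consecutive units|.
    Assumes every target label has at least one candidate in the database.
    """
    if not targets:
        return []
    # best[i] = (cost so far, chosen path) ending in candidate i
    best: List[Tuple[float, List[Unit]]] = [
        (abs(u.pitch - target_pitch), [u]) for u in database[targets[0]]
    ]
    for label in targets[1:]:
        extended = []
        for u in database[label]:
            # Cheapest way to reach candidate u from any previous path
            cost, path = min(
                ((c + abs(p[-1].pitch - u.pitch), p) for c, p in best),
                key=lambda item: item[0],
            )
            extended.append((cost + abs(u.pitch - target_pitch), path + [u]))
        best = extended
    return min(best, key=lambda item: item[0])[1]
```

The selected clips would then be concatenated to form the output audio; the join cost is what keeps the concatenation from sounding choppy.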

What Are the Different Types of Open Source Voice Cloning Software?

Text-To-Speech (TTS): TTS is a type of open source voice cloning software that takes written text as input and converts it into speech. It is commonly used for creating audiobooks, in digital assistants like Siri or Alexa, and in automated customer service systems.
Speech Synthesis Markup Language (SSML): SSML is a markup language for describing synthesized speech for computer generated voices. It allows developers to customize the vocal characteristics of the output audio by manipulating parameters such as pitch, rate, and volume.
Voice Conversion: This type of voice cloning software can take one person's voice and turn it into another person's while preserving what was said. It can be useful when trying to generate similar sounding audio from different speakers with minimal effort.
Voice Cloning: Voice cloning involves taking recordings of a user’s speech and then generating new synthetic voices that are similar to the original speaker’s voice. This is useful in applications such as virtual assistants, and for providing audible customizations such as accents or languages for certain products or services.
Speaker Recognition/Verification: This type of open source software specializes in using machine learning algorithms to recognize a person's speaking style and analyze it against previously recorded audio clips. This method can be used for automated verification processes such as security checks on phone calls or logins into banking systems which require personal identification numbers (PINs) entered out loud over the phone.
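
The speaker verification entry above amounts to comparing a new voice sample against a stored one. A minimal sketch, assuming both samples have already been converted to fixed-length embedding vectors by some speaker encoder (not shown), is a cosine-similarity check against a threshold. The function names and the 0.75 threshold are assumptions for illustration; real systems tune the threshold on labeled data.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no speaker information
    return dot / (norm_a * norm_b)


def same_speaker(enrolled: Sequence[float],
                 probe: Sequence[float],
                 threshold: float = 0.75) -> bool:
    """Accept the probe if it is close enough to the enrolled embedding.

    The threshold value is an arbitrary placeholder.
    """
    return cosine_similarity(enrolled, probe) >= threshold
```

Raising the threshold makes false accepts rarer at the cost of more false rejects, which is the main trade-off when such a check guards a login or phone verification step.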

Benefits Provided by Open Source Voice Cloning Software

Cost-Effectiveness: By being open source, users can download the software and use it at no cost. This makes it ideal for those with smaller budgets who still want access to effective voice cloning technology.
Customization Options: Experienced programmers can work directly with open source code, which allows for a wide range of customization options. With this flexibility, users are able to adjust the program's settings to best suit their individual needs.
Advanced Features and Capabilities: Open source voice cloning software is often ahead of its proprietary counterparts when it comes to features and capabilities. This makes it a great option for more advanced users who may need something a bit more sophisticated than what is typically available on the market.
Reusability: Once an open source program has been developed, it can be reused by anyone without having to worry about copyright infringement or paying additional fees associated with proprietary solutions.
Improved Security and Quality Standards: Open source solutions tend to have higher security standards and improved quality control compared to closed-source alternatives, as they undergo extensive review by developers before release. Additionally, because they are updated and reviewed by experts on an ongoing basis, vulnerabilities are addressed quickly, meaning less downtime when bugs arise or changes need to be made.

What Types of Users Use Open Source Voice Cloning Software?

Creative Professionals: These are software developers, animators, sound engineers and other individuals who use open source voice cloning software to create or enhance their works. They can apply it to films, video games and other multimedia applications.
Researchers: These are scientific professionals who use open source voice cloning technology to study the properties of human speech. It is used in medical research, linguistics and more.
Educators: These include teachers at universities and colleges who may incorporate open source voice cloning into their classes to teach students about artificial intelligence (AI) systems or how programs process audio signals.
Home Users: Anyone with a microphone and a computer can access this technology for personal use in creating podcasts, videos or other interesting projects.
Businesses: Many businesses are now utilizing open source voice cloning software to develop interactive customer service solutions such as automated phone operators or virtual assistants.

How Much Does Open Source Voice Cloning Software Cost?

Open source voice cloning software is free to use, so there is no cost associated with using it. However, depending on the type of open source software you choose, there may be other costs involved. For instance, if you need to purchase additional hardware such as microphones or audio interfaces in order to use your chosen software effectively, that could add up over time. Additionally, if you want more than basic voice cloning capabilities and need access to advanced features like text-to-speech or speech recognition, there will likely be a premium version of the same software available for purchase that includes these features. Lastly, if you are looking for dedicated support from the developers who created the open source software (e.g., technical assistance with installation and usage), this could incur additional fees based on their terms and conditions. All in all, though, open source voice cloning technology remains an affordable solution compared to more traditional methods of creating artificial voices.

What Software Does Open Source Voice Cloning Software Integrate With?

Open source voice cloning software can integrate with a variety of different types of software. It is most commonly used in conjunction with digital audio workstations, which allow users to edit and create audio. Text-to-speech applications are also often connected to open source voice cloning software, so that text input can be converted into speech output. Video editing programs such as Adobe Premiere Pro or Final Cut Pro may also be used in combination with these systems for the purpose of creating lip sync animations. Additionally, some machine learning frameworks may be integrated for tasks such as natural language understanding and automatic speech recognition (ASR). All of this software serves to supplement the capabilities of the open source voice cloning platform and provides users with a comprehensive suite of tools for producing realistic synthesized voices.

Recent Trends Related to Open Source Voice Cloning Software

Open source voice cloning software is becoming increasingly popular, as it provides a cost-effective way to generate realistic synthetic voices.
The use of open source voice cloning software has grown rapidly in recent years due to advances in artificial intelligence (AI) technology and the falling cost of data storage and computing power.
Many organizations are turning to open source voice cloning software for their speech synthesis needs, as it offers greater flexibility than proprietary solutions.
Open source voice cloning software can be used for various applications, such as creating speech synthesis systems for virtual assistants, robots, or video games.
Open source voice cloning software is also being used to create digital avatars that can speak with realistic voices and can be used for virtual meetings or remote customer service.
Open source voice cloning software can also be used to create custom voices that can be used to generate audio recordings for marketing purposes or for voiceovers in videos.
As the technology continues to evolve and new applications are developed, the use of open source voice cloning software is expected to continue to grow.

How Users Can Get Started With Open Source Voice Cloning Software

Getting started with open source voice cloning software is a relatively straightforward process.

First, create an account on a platform or website that has the open source software available for download. Many platforms also have tutorials and sample projects to help users learn how to use the software. Download the files from the platform onto your computer, being sure to choose the latest version. Once it's downloaded, unzip the file and place it in a location on your computer so you can easily find it later.

Next, set up any necessary dependencies, such as Python and a neural network library like TensorFlow or PyTorch. If you need extra guidance with this step, many websites offer detailed instructions on how to install these components correctly.
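
As a quick way to confirm the dependency step, the sketch below checks the running Python version and looks for an installed deep learning framework without importing it. The helper name check_environment and the minimum version (3, 8) are assumptions for illustration; each project documents its own requirements. The module names torch and tensorflow are the import names of PyTorch and TensorFlow.

```python
import importlib.util
import sys
from typing import Dict, Sequence, Tuple


def check_environment(min_python: Tuple[int, int] = (3, 8),
                      candidates: Sequence[str] = ("torch", "tensorflow")) -> Dict:
    """Report whether Python is new enough and which frameworks are installed.

    `min_python` is a placeholder; check the project's own documentation.
    find_spec() locates a module without importing it, so the check stays fast.
    """
    python_ok = sys.version_info >= min_python
    found = [name for name in candidates
             if importlib.util.find_spec(name) is not None]
    return {"python_ok": python_ok, "frameworks_found": found}


if __name__ == "__main__":
    print(check_environment())
```

An empty frameworks_found list means neither library is visible to the current interpreter, which usually points to a missing install or the wrong virtual environment being active.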

Once everything is properly installed, you can start training your model using data sets containing audio recordings of speech and text transcripts of what was said in each recording. You should make sure that these recordings are clear and from different speakers who each produce distinct vocal characteristics since this will help you achieve better results when training your model.
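
Training data of the kind described above pairs each recording with its transcript, and it helps to verify the pairing before training starts. A minimal sketch follows, assuming a flat folder in which every .wav file has a same-named .txt transcript next to it. That layout and the helper name build_manifest are assumptions; individual projects define their own dataset formats.

```python
from pathlib import Path
from typing import List, Tuple


def build_manifest(dataset_dir: str) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Pair each .wav file with the text of its same-named .txt transcript.

    Returns (pairs, missing): `pairs` holds (wav file name, transcript) tuples,
    `missing` lists recordings whose transcript is absent or empty.
    """
    root = Path(dataset_dir)
    pairs: List[Tuple[str, str]] = []
    missing: List[str] = []
    for wav in sorted(root.glob("*.wav")):
        transcript = wav.with_suffix(".txt")
        text = ""
        if transcript.is_file():
            text = transcript.read_text(encoding="utf-8").strip()
        if text:
            pairs.append((wav.name, text))
        else:
            missing.append(wav.name)
    return pairs, missing
```

Recordings reported in the missing list can be transcribed or dropped before training, so that the model never sees audio without matching text.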

Finally, once your data set is prepared and loaded, start the training run so that the model can learn how the voices sound. Training may take several hours depending on the size of the data set, but it can be sped up by using multiple processors in parallel or by using cloud computing services if needed.

By following these steps closely, users should be able to get started using open source voice cloning software quickly and effectively.
