speech text free download

OpenVoice

Instant voice cloning by MIT and MyShell. Audio foundation model

OpenVoice is a versatile instant voice cloning system that can replicate a speaker’s tone color from just a short audio clip and then generate speech in multiple languages. It is designed not only to match the timbre of the reference voice, but also to give granular control over style parameters such as emotion, accent, rhythm, pauses, and intonation. The model supports cross-lingual and even zero-shot cross-lingual voice cloning, so a speaker recorded in one language can be made to speak...

Downloads: 23 This Week

Last Update: 2025-11-28

See Project

GPT-SoVITS

1 min voice data can also be used to train a good TTS model

GPT‑SoVITS is a state-of-the-art voice conversion and TTS system that enables zero‑shot and few‑shot synthesis based on a short vocal sample (e.g., 5 seconds). It supports cross‑lingual speech synthesis across English, Chinese, Japanese, Korean, Cantonese, and more. It's powered by VITS architecture enhanced for few‑sample adaptation and real‑time usability.

Downloads: 35 This Week

Last Update: 2025-07-29

See Project

Real-Time Voice Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

Real-Time Voice Cloning is an influential deep-learning repository that demonstrates how to clone a voice from just a few seconds of audio and then generate arbitrary speech in that voice in near real time. It implements the SV2TTS pipeline (“Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis”) in three stages: a speaker encoder, a synthesizer, and a vocoder. In the first stage, short audio clips are converted into a fixed-dimensional speaker embedding that captures voice characteristics; this embedding is then used by a Tacotron-style synthesizer to generate spectrograms from text, which a WaveRNN-based vocoder finally turns into audio. ...

Downloads: 2 This Week

Last Update: 2026-03-09

See Project

PaddleSpeech

Easy-to-use Speech Toolkit including Self-Supervised Learning model

...We provide high-speed and ultra-lightweight models, and also cutting-edge technology. We provide production ready streaming asr and streaming tts system. Our frontend contains Text Normalization and Grapheme-to-Phoneme (G2P, including Polyphone and Tone Sandhi). Moreover, we use self-defined linguistic rules to adapt Chinese context.

Downloads: 0 This Week

Last Update: 2025-03-04

See Project

Coqui TTS

A deep learning toolkit for Text-to-Speech, battle-tested in research

TTS is a library for advanced Text-to-Speech generation. It's built on the latest research, was designed to achieve the best trade-off among ease-of-training, speed and quality. TTS comes with pre-trained models, tools for measuring dataset quality and is already used in 20+ languages for products and research projects. High-performance Deep Learning models for Text2Speech tasks.

Downloads: 20 This Week

Last Update: 2023-12-12

See Project

VoiceSmith

[WIP] VoiceSmith makes training text to speech models easy

VoiceSmith makes it possible to train and infer on both single and multispeaker models without any coding experience. It fine-tunes a pretty solid text to speech pipeline based on a modified version of DelightfulTTS and UnivNet on your dataset. Both models were pretrained on a proprietary 5000 speaker dataset. It also provides some tools for dataset preprocessing like automatic text normalization. Windows (only CPU supported currently) or any Linux based operating system. If you want to run this on macOS you have to follow the steps in build from source in order to create the installer. ...

Downloads: 0 This Week

Last Update: 2023-03-24

See Project

Mocking Bird

Clone a voice in 5 seconds to generate arbitrary speech in real-time

MockingBird is an open-source voice cloning and real-time speech generation toolkit that lets you clone a speaker’s voice from a short audio sample (reportedly as little as 5 seconds) and then synthesize arbitrary speech in that voice. It builds on deep-learning based TTS / voice-cloning technology (in the lineage of projects such as Real-Time-Voice-Cloning), but extends it with support for Mandarin Chinese and multiple Chinese speech datasets — broadening its applicability beyond English....

1 Review

Downloads: 0 This Week

Last Update: 2023-03-23

See Project

Parakeet

PAddle PARAllel text-to-speech toolKIT

PAddle PARAllel text-to-speech toolKIT (supporting Tacotron2, Transformer TTS, FastSpeech2/FastPitch, SpeedySpeech, WaveFlow and Parallel WaveGAN) Parakeet aims to provide a flexible, efficient and state-of-the-art text-to-speech toolkit for the open-source community. It is built on PaddlePaddle dynamic graph and includes many influential TTS models.

Downloads: 3 This Week

Last Update: 2023-03-24

See Project

Multilingual Speech Synthesis

An implementation of Tacotron 2 that supports multilingual experiments

...We provide data for comparison of three multilingual text-to-speech models. The first shares the whole encoder and uses an adversarial classifier to remove speaker-dependent information from the encoder. The second has separate encoders for each language.

Downloads: 0 This Week

Last Update: 2023-03-24

See Project

Search Results for "speech text"

Showing 9 open source projects for "speech text"

OpenVoice

GPT-SoVITS

Real-Time Voice Cloning

PaddleSpeech

Coqui TTS

VoiceSmith

Mocking Bird

Parakeet

Multilingual Speech Synthesis

Search Results for "speech text"

Showing 9 open source projects for "speech text"

OpenVoice

GPT-SoVITS

Real-Time Voice Cloning

PaddleSpeech

Coqui TTS

VoiceSmith

Mocking Bird

Parakeet

Multilingual Speech Synthesis

Related Searches

Related Categories