GPT-SoVITS

GPT-SoVITS is a voice conversion and text-to-speech (TTS) system that supports zero-shot and few-shot synthesis from a short vocal sample (e.g., 5 seconds). It supports cross-lingual speech synthesis across English, Chinese, Japanese, Korean, Cantonese, and more. It is built on the VITS architecture, adapted for few-sample training and real-time use.

Features

Zero‑shot TTS: generate speech from a 5‑second voice sample
Few-shot fine-tuning: 1 minute of voice data for improved voice likeness
Cross-lingual support across multiple languages
Web UI for inference and batch generation
Open-source with pretrained model weights
Active community and publication‑grade performance
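
The sample lengths quoted in the feature list (5 seconds for zero-shot, 1 minute for few-shot) can be checked before any audio is handed to the tool. The following is a minimal sketch in Python using only the standard library. It does not call GPT-SoVITS itself, and the names `wav_duration_seconds`, `suggest_mode`, and both threshold constants are hypothetical helpers introduced here for illustration, with the thresholds taken from the list above rather than from the project's own documentation.

```python
import wave

# Assumed thresholds, copied from the feature list above (not from the project docs).
ZERO_SHOT_MIN_SECONDS = 5.0   # "5-second voice sample"
FEW_SHOT_MIN_SECONDS = 60.0   # "1 minute of data"


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def suggest_mode(total_seconds):
    """Map the amount of available audio to the mode named in the feature list."""
    if total_seconds >= FEW_SHOT_MIN_SECONDS:
        return "few-shot fine-tuning"
    if total_seconds >= ZERO_SHOT_MIN_SECONDS:
        return "zero-shot"
    return "too short"
```

For example, `suggest_mode(wav_duration_seconds("reference.wav"))` would report which mode a clip is long enough for, where `reference.wav` is a placeholder path.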

License

MIT License

Additional Project Details

Operating Systems

Linux, Mac, Windows

Programming Language

Python

Related Categories

Python Voice Cloning Software, Python Text-to-Speech (TTS) Models

Registered

2025-07-29

As little as 1 minute of voice data can be used to train a good TTS model.
