ⓍTTS-v2 (XTTS-v2) by Coqui is a powerful multilingual text-to-speech model capable of cloning voices from a short 6-second audio sample. It supports 17 languages and enables high-quality voice generation with emotion, style transfer, and cross-language synthesis.

The model introduces major improvements over ⓍTTS-v1, including better prosody, stability, and support for Hungarian and Korean. ⓍTTS-v2 allows interpolation between multiple voice references and generates speech at a 24 kHz sampling rate. It's suited to both inference and fine-tuning, with a Python API and command-line tools available.

The model powers Coqui Studio and the Coqui API, and can be run locally using Python or through Hugging Face Spaces. Licensed under the Coqui Public Model License, it balances open access with responsible use of generative voice technology.
Features
- Voice cloning from a 6-second audio clip
- Supports 17 languages including Arabic, Chinese, Hindi, and Japanese
- Emotion and style transfer capabilities
- Cross-lingual voice generation with speaker consistency
- 24 kHz audio output for high sound quality
- Improved prosody and speaker conditioning over v1
- Fine-tuning and interpolation between multiple voice samples
- Usable from Python, the command line, and Hugging Face Spaces
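The local Python workflow mentioned above can be sketched as follows. This is a minimal example based on the Coqui TTS package (`pip install TTS`); the `clone_voice` helper and its argument defaults are illustrative, not part of the library. Note that constructing the model triggers a large one-time download.

```python
# Hugging Face / Coqui model identifier for XTTS-v2
MODEL_NAME = "tts_models/multilingual/multi-dataset/xtts_v2"

def clone_voice(text, speaker_wav, language="en", out_path="output.wav"):
    """Synthesize `text` in the voice of `speaker_wav` (a ~6 s reference clip).

    Illustrative wrapper around the Coqui TTS API; the import is deferred
    because loading the package pulls in the full model stack.
    """
    from TTS.api import TTS  # requires `pip install TTS`; downloads the model on first use

    tts = TTS(MODEL_NAME)
    tts.tts_to_file(
        text=text,
        speaker_wav=speaker_wav,  # one reference clip; a list of clips can be passed for interpolation
        language=language,        # one of the 17 supported language codes, e.g. "en", "hu", "ko"
        file_path=out_path,
    )
    return out_path
```

The same synthesis is available from the CLI shipped with the package, along the lines of `tts --model_name tts_models/multilingual/multi-dataset/xtts_v2 --text "..." --speaker_wav ref.wav --language_idx en --out_path out.wav`.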