XTTS-v2

XTTS-v2 is an open-source multilingual text-to-speech and voice cloning model developed by Coqui. It performs zero-shot voice cloning from as little as six seconds of reference audio, so users can generate speech that closely matches a target speaker without any additional training. The model supports 17 languages and cross-language voice cloning: a voice recorded in one language can be used to synthesize speech in another while preserving speaker identity. Compared with the original XTTS architecture, XTTS-v2 offers better speaker conditioning, support for multiple reference clips, improved prosody, higher audio quality, and more stable inference. It generates speech at a 24 kHz sampling rate and carries over the emotion and speaking style of the reference audio as part of cloning. The model can run entirely offline and supports both inference and fine-tuning. Typical uses include AI assistants, content creation, dubbing, accessibility tools, and multilingual voice applications.

Features

Zero-shot voice cloning from as little as 6 seconds of audio
Supports 17 languages including English and Spanish
Cross-language voice cloning while preserving speaker identity
Emotion and speaking style transfer through voice cloning
24 kHz audio generation for high-quality speech output
Supports multiple speaker reference clips and interpolation
Can run locally without requiring cloud APIs
Supports both inference and fine-tuning workflows
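
The features above can be sketched in a few lines of Python. This is a minimal, non-authoritative example that assumes the Coqui `TTS` Python package is installed and that the model is published under the name `tts_models/multilingual/multi-dataset/xtts_v2`. The helper names (`clip_duration_seconds`, `check_reference_clips`, `clone_voice`) are hypothetical and introduced only for illustration, and the six-second check is a guideline taken from the feature list, not a limit enforced by the model.

```python
import wave
from pathlib import Path

# Guideline from the feature list above; an assumption, not a hard model limit.
MIN_REFERENCE_SECONDS = 6.0


def clip_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds (standard library only)."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference_clips(paths, min_seconds=MIN_REFERENCE_SECONDS):
    """Hypothetical helper: return the clips that are shorter than min_seconds."""
    return [str(p) for p in paths if clip_duration_seconds(p) < min_seconds]


def clone_voice(text, reference_clips, language, out_path="output.wav"):
    """Synthesize `text` in `language` using the voice from `reference_clips`."""
    too_short = check_reference_clips(reference_clips)
    if too_short:
        raise ValueError(
            f"reference clips shorter than {MIN_REFERENCE_SECONDS} s: {too_short}"
        )

    # Imported lazily so the helpers above work without the TTS package.
    from TTS.api import TTS

    # Assumed model identifier; weights are fetched on first use if not cached.
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
    tts.tts_to_file(
        text=text,
        speaker_wav=[str(p) for p in reference_clips],  # one or more clips
        language=language,  # e.g. "en" or "es"
        file_path=str(out_path),
    )
    return Path(out_path)
```

For cross-language cloning under these assumptions, the reference clips can be recorded in one language while `language` names another, for example `clone_voice("Hola, ¿qué tal?", ["speaker_en.wav"], "es")`, where `speaker_en.wav` is a placeholder file name.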

Additional Project Details

Registered

2026-06-08

Similar Business Software

Voxtral TTS

Voxtral TTS is a state-of-the-art, multilingual text-to-speech model designed to generate highly realistic and emotionally expressive speech from text, combining strong contextual understanding with advanced speaker modeling to produce natural, human-like audio output. Built as a lightweight...

Chatterbox

Chatterbox is a free, open source voice cloning AI model developed by Resemble AI, licensed under MIT. It enables zero-shot voice cloning using just 5 seconds of reference audio, eliminating the need for training. The model offers expressive speech synthesis with unique emotion control, allowing...

Inworld TTS

Inworld TTS is a state-of-the-art text-to-speech platform designed to deliver ultra-realistic, context-aware speech synthesis and precise voice-cloning capabilities at a radically accessible price. The flagship model, TTS-1, is optimized for real-time applications and supports low-latency...

MiniMax Audio

MiniMax Audio is an AI-driven audio generation platform that transforms text into realistic speech across 50+ languages, offering over 300 expressive voices, including regional accents like American, Cantonese, Dutch, German, Czech, Japanese, and more, while supporting advanced features such as...

Qwen3-TTS

Qwen3-TTS is an open source series of advanced text-to-speech models developed by the Qwen team at Alibaba Cloud under the Apache-2.0 license, offering stable, expressive, and real-time speech generation with features such as voice cloning, voice design, and fine-grained control of prosody and...

Fish Audio

Fish Audio provides innovative AI-powered solutions for text-to-speech (TTS), voice cloning, and speech-to-text (STT) technologies. The platform is designed for businesses and developers looking to integrate high-quality, realistic voice synthesis into their applications. Fish Audio offers voice...