ⓍTTS-v2
Multilingual voice cloning TTS model with 6-second sample support
ⓍTTS-v2 (XTTS-v2) by Coqui is a multilingual text-to-speech model that can clone a voice from a single audio sample of about 6 seconds. It supports 17 languages and provides emotion and style transfer through cloning, as well as cross-language synthesis, where the cloned voice speaks a language other than the one in the reference sample. Compared with ⓍTTS-v1, the model improves prosody and stability and adds support for Hungarian and Korean. ⓍTTS-v2 allows interpolation between multiple voice references and generates speech at a 24 kHz sampling...
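The cloning workflow described above can be sketched in Python as follows. This is a minimal sketch rather than an official recipe, and it rests on several assumptions: the Coqui `TTS` package is installed, the model is published under the identifier `tts_models/multilingual/multi-dataset/xtts_v2`, and the language codes match those listed on the Coqui model card (verify them against your installed version). The helpers `wav_duration_seconds`, `check_request`, and `clone_voice` are hypothetical names introduced for illustration, and treating 6 seconds as a minimum is a convention of this sketch, not a limit enforced by the model.

```python
import wave
from pathlib import Path

# Language codes as listed on the Coqui XTTS-v2 model card (assumption:
# confirm against the version you have installed).
XTTS_V2_LANGUAGES = (
    "en", "es", "fr", "de", "it", "pt", "pl", "tr", "ru",
    "nl", "cs", "ar", "zh-cn", "ja", "hu", "ko", "hi",
)

# The description mentions a 6-second sample; this sketch treats that as a
# rule-of-thumb minimum for the combined reference audio.
MIN_REFERENCE_SECONDS = 6.0


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_request(reference_wavs, language):
    """Validate a cloning request; return the total reference duration."""
    if language not in XTTS_V2_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    if not reference_wavs:
        raise ValueError("at least one reference clip is required")
    total = sum(wav_duration_seconds(p) for p in reference_wavs)
    if total < MIN_REFERENCE_SECONDS:
        raise ValueError(f"reference audio is only {total:.1f} s long")
    return total


def clone_voice(text, reference_wavs, language, out_path="output.wav"):
    """Synthesize `text` in the voice of the reference clips."""
    check_request(reference_wavs, language)
    # Imported lazily so the validation helpers work without the package.
    from TTS.api import TTS  # requires `pip install TTS`

    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
    tts.tts_to_file(
        text=text,
        speaker_wav=[str(p) for p in reference_wavs],  # one or more clips
        language=language,
        file_path=str(out_path),
    )
    return Path(out_path)
```

A call such as `clone_voice("Jó napot!", ["speaker.wav"], "hu")` would then illustrate cross-language synthesis when `speaker.wav` (a placeholder file name) contains speech in another language. The first call downloads the model weights, so expect it to take a while.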