VALL-E X
Open source implementation of Microsoft's VALL-E X zero-shot TTS model
...VALL-E-X supports zero-shot cross-lingual synthesis, meaning a monolingual speaker’s voice can be used to speak other languages without additional training. It also preserves aspects of the acoustic environment, such as background noise or reverb, making the generated audio feel more like it came from the same setting as the prompt. The repository includes Python APIs, sample scripts, ready-to-use voice presets, and demos hosted on Hugging Face Spaces and Google Colab so users can try it.