A fast TTS architecture with conditional flow matching
...Users can train on standard datasets like LJSpeech or plug in their own corpora, with helper tools for computing dataset statistics, extracting phoneme durations, and running multi-GPU training.