Scalable generative AI framework built for researchers and developers
Toolkit for conversational AI
End-to-end speech processing toolkit
Spark-TTS Inference Code
Long-form streaming TTS system for multi-speaker dialogue generation
A fast TTS architecture with conditional flow matching
State-of-the-art discrete acoustic codec models at 40 or 75 tokens per second
Synchronized Translation for Videos
Unofficial Parallel WaveGAN
Singing Voice Synthesis via Shallow Diffusion Mechanism
WaveRNN Vocoder + TTS
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Conditional Variational Autoencoder with Adversarial Learning
Deep learning for text to speech