Long-form streaming TTS system for multi-speaker dialogue generation
Open-source multi-speaker long-form text-to-speech model
Super-expressive prompting model based on ltx2.3
MOSS‑TTS Family: open‑source speech and sound generation models
Multi-modal large language model designed for audio understanding
Robust Speech Recognition Across Languages, Dialects
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)
Dia-1.6B generates lifelike English dialogue and vocal expressions