Industrial-level controllable zero-shot text-to-speech system
Foundational Models for State-of-the-Art Speech and Text Translation
A Conversational Speech Generation Model
VibeVoice: Open-source multi-speaker long-form text-to-speech model
CTC-based forced aligner for audio and text in 158 languages
Dia-1.6B generates lifelike English dialogue and vocal expressions