Dia
A TTS model capable of generating ultra-realistic dialogue
...Instead of focusing on isolated sentences or flat narration, it is optimized for conversational audio, complete with natural turn-taking, prosody, and pacing. The model can be conditioned on a reference audio sample, allowing you to control emotion, tone, and other stylistic aspects of the speech. It can also produce nonverbal vocalizations like laughter, coughs, clearing the throat, and similar sounds, which are crucial for making synthetic conversations feel human. Dia is released with pretrained checkpoints and inference code, with weights hosted on Hugging Face, so researchers and developers can quickly try it or integrate it into pipelines. ...