Multi-modal large language model designed for audio understanding
AI tool converting video/audio into structured documents instantly
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Industrial-level controllable zero-shot text-to-speech system
Synchronized Translation for Videos
Fast multimodal LLM for real-time voice interaction and AI apps
Taming Stable Diffusion for Lip Sync
A Python tool that uses GPT-4, FFmpeg, and OpenCV
A simple native web interface that uses ChatTTS to synthesize text
A Telegram bot that integrates with OpenAI's official ChatGPT APIs
Towards Human-Level Text-to-Speech through Style Diffusion
A Conversational Speech Generation Model
A deep learning toolkit for Text-to-Speech, battle-tested in research
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)
Just Another Speech Recognition and Text-to-Speech software