Open-source multi-speaker long-form text-to-speech model
Capable of understanding text, audio, vision, video
Interface for OuteTTS models
Comprehensive Gradio WebUI for audio processing
A Systematic Framework for Interactive World Modeling
Data manipulation and transformation for audio signal processing
Instant voice cloning by MIT and MyShell. Audio foundation model
Robust Speech Recognition via Large-Scale Weak Supervision
SOTA Open Source TTS
Generate blog articles from video or audio
Multimodal-Driven Architecture for Customized Video Generation
The most powerful and modular diffusion model GUI, API and backend
MARS5 speech model (TTS) from CAMB.AI
A high-quality rapid TTS voice cloning model
Towards Human-Sounding Speech
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Use Microsoft Edge's online text-to-speech service from Python
Oobabooga - The definitive Web UI for local AI, with powerful features
A sound cloning tool with a web interface, using your voice
The official Python SDK for the ElevenLabs API
Sample code and notebooks for Generative AI on Google Cloud
High-quality multi-lingual text-to-speech library by MyShell.ai
One-click deployment (including offline integration package)
A TTS model capable of generating ultra-realistic dialogue
Automatically translates the text of a video based on a subtitle file