Multi-modal large language model designed for audio understanding
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Interface for OuteTTS models
Open-source multi-speaker long-form text-to-speech model
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Automatic Speech Recognition with Word-level Timestamps
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
A Web UI for easy subtitle generation using the Whisper model
One-click deployment (including offline integration package)
Official PyTorch Implementation
An Open Source implementation of Notebook LM with more flexibility
Synchronized Translation for Videos
Instant voice cloning by MIT and MyShell. Audio foundation model
High-Quality Voice Cloning TTS for 600+ Languages
Translate videos from one language to another and embed dubbing
MARS5 speech model (TTS) from CAMB.AI
A Python library for audio
Award-Winning Open Source Video Editing Software
Speech recognition module for Python
Audiocraft is a library for audio processing and generation
Foundational model for human-like, expressive TTS
Mopidy is an extensible music server written in Python
The most powerful and modular diffusion model GUI, API, and backend
Download videos from websites like YouTube and many others
GenAI Processors is a lightweight Python library