Taming Stable Diffusion for Lip Sync
SOTA discrete acoustic codec models with 40/75 tokens per second
A Python library for audio data augmentation
Generate audiobooks from e-books, voice cloning & 1107+ languages
One-click deployment (including offline integration package)
AudioMuse-AI is an Open Source Dockerized environment
A nearly-live implementation of OpenAI's Whisper
Clone a voice in 5 seconds to generate arbitrary speech in real-time
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
Instant voice cloning by MIT and MyShell. Audio foundation model
Capable of understanding text, audio, vision, video
SOTA Open Source TTS
Free, high-quality text-to-speech API endpoint to replace OpenAI
Fast multimodal LLM for real-time voice interaction and AI apps
Oobabooga - The definitive Web UI for local AI, with powerful features
Automatically translates the text of a video based on a subtitle file
Offline text-to-speech synthesis for Python
Sample code and notebooks for Generative AI on Google Cloud
Open speech-to-speech models and pipelines by Hugging Face
ChatGPT interface with better UI
Implementation of the AudioLM audio generation model in PyTorch
AI video generator optimized for low VRAM and older GPUs
Unofficial Python API and agentic skill for Google NotebookLM
Robust Speech Recognition via Large-Scale Weak Supervision
Open source AI model for generating full songs from lyrics prompts