An open-source music player with simple UI
Instant voice cloning by MIT and MyShell. Audio foundation model
Open speech-to-speech models and pipelines toolkit by Hugging Face
Download videos from almost any website
SOTA discrete acoustic codec models with 40/75 tokens per second
Automated Music Discovery and Collection Manager
SOTA Open Source TTS
Tencent Hunyuan multimodal diffusion transformer (MM-DiT) model
AI video generator optimized for low-VRAM and older GPUs
Download videos from websites like YouTube and many others
Comprehensive Gradio WebUI for audio processing
Oobabooga: the definitive Web UI for local AI, with powerful features
Multimodal Diffusion with Representation Alignment
The music player of today
The official Python SDK for the ElevenLabs API
AI tool converting video/audio into structured documents instantly
Multimodal model capable of understanding text, audio, images, and video
Data manipulation and transformation for audio signal processing
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
Open source AI model for generating full songs from lyrics prompts
A general fine-tuning kit geared toward image, video, and audio diffusion models
A Web UI for easy subtitle generation using the Whisper model
Interface for OuteTTS models
Robust Speech Recognition via Large-Scale Weak Supervision
Unofficial Python API and agentic skill for Google NotebookLM