An open-source music player with a simple UI
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
A Python library for audio data augmentation
Taming Stable Diffusion for Lip Sync
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Generate audiobooks from e-books, voice cloning & 1107+ languages
A nearly-live implementation of OpenAI's Whisper
48 kHz stereo neural audio codec for general audio
SOTA discrete acoustic codec models with 40/75 tokens per second
Multimodal Diffusion with Representation Alignment
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
Implementation of AudioLM audio generation model in PyTorch
Video player for improving quality of hand-drawn images
Free, high-quality text-to-speech API endpoint to replace OpenAI
Capable of understanding text, audio, vision, video
Open-source multi-speaker long-form text-to-speech model
Trying to be a robust, user-friendly and hackable music player
Interface for OuteTTS models
Instant voice cloning by MIT and MyShell. Audio foundation model
Data manipulation and transformation for audio signal processing
A Systematic Framework for Interactive World Modeling
Comprehensive Gradio WebUI for audio processing
Robust Speech Recognition via Large-Scale Weak Supervision
The official Python SDK for the ElevenLabs API
Generate blog articles from video or audio