A lightweight audio-to-MIDI converter with pitch bend detection
48 kHz stereo neural audio codec for general audio
Implementation of AudioLM audio generation model in PyTorch
Dumb downloader that scrapes the web
Open-source multi-speaker long-form text-to-speech model
Swing Music is a beautiful, self-hosted music player
PersonaPlex code
Multilingual speech recognition and audio understanding model
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
Extract audio and video content and organize it into a Markdown note
A speech-text foundation model for real-time dialogue
Generate audiobooks from e-books, voice cloning & 1107+ languages
Fast multimodal LLM for real-time voice interaction and AI apps
A nearly-live implementation of OpenAI's Whisper
Musician-oriented Linux distro
Cross-platform GUI tool for downloading videos from Bilibili sites
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Taming Stable Diffusion for Lip Sync
Synchronized Translation for Videos
One-click deployment (including offline integration package)
AudioMuse-AI is an open-source Dockerized environment
Speakr is a personal, self-hosted web application
Streaming Real-time Audio-Driven Avatar Generation
Trying to be a robust, user-friendly and hackable music player
Clone a voice in 5 seconds to generate arbitrary speech in real time