Instant voice cloning by MIT and MyShell. Audio foundation model
TTS model capable of streaming conversational audio in real time
48 kHz stereo neural audio codec for general audio
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
Extract audio and video content and organize it into a Markdown note
Music player and music library manager for Linux, Windows, and macOS
A lightning fast audio upsampler
A lightweight audio-to-MIDI converter with pitch bend detection
Cross-platform GUI tool for downloading videos from Bilibili sites
MOSS-TTS-Nano is an open-source, multilingual, tiny speech generation model
Speakr is a personal, self-hosted web application
SOTA discrete acoustic codec models with 40/75 tokens per second
Automated Music Discovery and Collection Manager
Generate audiobooks from e-books, voice cloning & 1107+ languages
Swing Music is a beautiful, self-hosted music player
Dumb downloader that scrapes the web
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
AudioMuse-AI is an Open Source Dockerized environment
Automatic subtitle synchronization tool
A nearly-live implementation of OpenAI's Whisper
Transforming Multimodal Content into Captivating Multilingual Audio
A speech-text foundation model for real-time dialogue
Automatic Speech Recognition with Word-level Timestamps
Comprehensive Gradio WebUI for audio processing
Automagically synchronize subtitles with video