A text-to-speech, speech-to-text and speech-to-speech library
Chat & pretrained large audio language model proposed by Alibaba Cloud
Audio foundation model excelling in audio understanding
Repo of Qwen2-Audio chat & pretrained large audio language model
GUI for a vocal remover that uses deep neural networks
A Python library for audio
A Family of Open Sourced Music Foundation Models
Audiocraft is a library for audio processing and generation
Generate audiobooks from EPUBs, PDFs and text with captions
Code for openai.fm, a demo for the OpenAI Speech API
Synchronized Translation for Videos
A library for audio and music analysis and feature extraction
Qwen3-omni is a natively end-to-end, omni-modal LLM
A lightweight audio-to-MIDI converter with pitch bend detection
Transcribe any audio to text, translate and edit subtitles 100% locally
A Python library for audio data augmentation
A nearly live implementation of OpenAI's Whisper
Taming Stable Diffusion for Lip Sync
Toolkit for audio, music, and speech generation
Free, high-quality text-to-speech API endpoint to replace OpenAI's TTS API
48 kHz stereo neural audio codec for general audio
SOTA discrete acoustic codec models with 40/75 tokens per second
Captcha solver extension for humans
Generate audiobooks from e-books, voice cloning & 1107+ languages
Tencent Hunyuan multimodal diffusion transformer (MM-DiT) model