A text-to-speech, speech-to-text and speech-to-speech library
Chat & pretrained large audio language model proposed by Alibaba Cloud
Audio foundation model excelling in audio understanding
Repo of Qwen2-Audio chat & pretrained large audio language model
GUI for a Vocal Remover that uses Deep Neural Networks
A Python library for audio
A Family of Open Sourced Music Foundation Models
Audiocraft is a library for audio processing and generation
Speech recognition module for Python
Generate audiobooks from EPUBs, PDFs and text with captions
Code for openai.fm, a demo for the OpenAI Speech API
A library for audio and music analysis and feature extraction
Synchronized Translation for Videos
Qwen3-omni is a natively end-to-end, omni-modal LLM
A lightweight audio-to-MIDI converter with pitch bend detection
Transcribe any audio to text, translate and edit subtitles 100% locally
A Python library for audio data augmentation
Taming Stable Diffusion for Lip Sync
A nearly-live implementation of OpenAI's Whisper
Toolkit for audio, music, and speech generation
Free, high-quality text-to-speech API endpoint to replace OpenAI
48 kHz stereo neural audio codec for general audio
SOTA discrete acoustic codec models with 40/75 tokens per second
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
Captcha solver extension for humans