A text-to-speech, speech-to-text and speech-to-speech library
Chat & pretrained large audio language model proposed by Alibaba Cloud
Audio foundation model excelling in audio understanding
Repo of Qwen2-Audio chat & pretrained large audio language model
GUI for a Vocal Remover that uses Deep Neural Networks
A Python library for audio
A Family of Open Sourced Music Foundation Models
AudioCraft is a library for audio processing and generation
Speech recognition module for Python
Generate audiobooks from EPUBs, PDFs and text with captions
Synchronized Translation for Videos
Qwen3-Omni is a natively end-to-end, omni-modal LLM
A lightweight audio-to-MIDI converter with pitch bend detection
A Python library for audio data augmentation
A nearly-live implementation of OpenAI's Whisper
Taming Stable Diffusion for Lip Sync
Toolkit for audio, music, and speech generation
Free, high-quality text-to-speech API endpoint to replace OpenAI
48 kHz stereo neural audio codec for general audio
SOTA discrete acoustic codec models with 40/75 tokens per second
Generate audiobooks from e-books, with voice cloning and support for 1107+ languages
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
Multimodal Diffusion with Representation Alignment
Implementation of the AudioLM audio generation model in PyTorch
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX