A text-to-speech, speech-to-text and speech-to-speech library
Audio foundation model excelling in audio understanding
Open-source framework for intelligent speech interaction
Repo of Qwen2-Audio chat & pretrained large audio language model
Chat & pretrained large audio language model proposed by Alibaba Cloud
Large Audio Language Model built for natural interactions
LLM-based Reinforcement Learning audio edit model
Multi-modal large language model designed for audio understanding
Audiocraft is a library for audio processing and generation
GUI for a Vocal Remover that uses Deep Neural Networks
Official Python inference and LoRA trainer package
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
A Python library for audio
Create UIs for your machine learning model in Python in 3 minutes
A Family of Open-Source Music Foundation Models
Speech recognition module for Python
Python Audio Analysis Library: Feature Extraction, Classification
The official Python client for the Hugging Face Hub
Multilingual speech recognition and audio understanding model
A lightweight audio-to-MIDI converter with pitch bend detection
Taming Stable Diffusion for Lip Sync
Synchronized Translation for Videos
Capable of understanding text, audio, vision, video
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Open-source infrastructure for Computer-Use Agents. Sandboxes