A text-to-speech, speech-to-text and speech-to-speech library
Audio foundation model excelling in audio understanding
Open-source framework for intelligent speech interaction
Repo of Qwen2-Audio chat & pretrained large audio language model
Chat & pretrained large audio language model proposed by Alibaba Cloud
LLM-based audio editing model trained with reinforcement learning
Large Audio Language Model built for natural interactions
Multi-modal large language model designed for audio understanding
GUI for a Vocal Remover that uses Deep Neural Networks
Official Python inference and LoRA trainer package
A Python library for audio
Python Audio Analysis Library: Feature Extraction, Classification
Audiocraft is a library for audio processing and generation
A Python library for audio data augmentation
Speech recognition module for Python
A Family of Open-Source Music Foundation Models
48 kHz stereo neural audio codec for general audio
Generate audiobooks from EPUBs, PDFs and text with captions
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
A lightweight audio-to-MIDI converter with pitch bend detection
Open-source multi-speaker long-form text-to-speech model
Implementation of AudioLM audio generation model in PyTorch
Fast multimodal LLM for real-time voice interaction and AI apps
Taming Stable Diffusion for Lip Sync
Generate audiobooks from e-books, voice cloning & 1107+ languages