A text-to-speech, speech-to-text and speech-to-speech library
Audio foundation model excelling in audio understanding
Repo of Qwen2-Audio chat & pretrained large audio language model
Open-source framework for intelligent speech interaction
Chat & pretrained large audio language model proposed by Alibaba Cloud
Large Audio Language Model built for natural interactions
LLM-based reinforcement learning audio editing model
Multi-modal large language model designed for audio understanding
GUI for a Vocal Remover that uses Deep Neural Networks
Official Python inference and LoRA trainer package
A Python library for audio
Python Audio Analysis Library: Feature Extraction, Classification
Audiocraft is a library for audio processing and generation
Speech recognition module for Python
A Python library for audio data augmentation
A Family of Open-Source Music Foundation Models
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Generate audiobooks from EPUBs, PDFs and text with captions
A lightweight audio-to-MIDI converter with pitch bend detection
48 kHz stereo neural audio codec for general audio
Implementation of AudioLM audio generation model in PyTorch
Open-source multi-speaker long-form text-to-speech model
Multilingual speech recognition and audio understanding model
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
Generate audiobooks from e-books, voice cloning & 1107+ languages