A text-to-speech, speech-to-text and speech-to-speech library
Audio foundation model excelling in audio understanding
Repo of Qwen2-Audio chat & pretrained large audio language model
Open-source framework for intelligent speech interaction
Chat & pretrained large audio language model proposed by Alibaba Cloud
Large Audio Language Model built for natural interactions
LLM-based reinforcement learning audio editing model
Multi-modal large language model designed for audio understanding
GUI for a Vocal Remover that uses Deep Neural Networks
Audio Plugin for Audio to MIDI transcription using deep learning
Official Python inference and LoRA trainer package
Audiocraft is a library for audio processing and generation
A Python library for audio
Python Audio Analysis Library: Feature Extraction, Classification
Speech recognition module for Python
A Python library for audio data augmentation
A Family of Open Sourced Music Foundation Models
Code for openai.fm, a demo for the OpenAI Speech API
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
48 kHz stereo neural audio codec for general audio
Generate audiobooks from EPUBs, PDFs and text with captions
A lightweight audio-to-MIDI converter with pitch bend detection
Speech-to-text, text-to-speech, and speaker recognition
Multilingual speech recognition and audio understanding model
The open-source voice synthesis studio powered by Qwen3-TTS