A text-to-speech, speech-to-text and speech-to-speech library
Chat & pretrained large audio language model proposed by Alibaba Cloud
Repo of Qwen2-Audio chat & pretrained large audio language model
Audio foundation model excelling in audio understanding
Open-source framework for intelligent speech interaction
LLM-based reinforcement learning audio editing model
Oobabooga - The definitive Web UI for local AI, with powerful features
Multi-modal large language model designed for audio understanding
AudioCraft is a library for audio processing and generation
Speech-to-text, text-to-speech, and speaker recognition
LilyPond sheet music text editor
Tokenizer-Free TTS for Multilingual Speech Generation
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Code for openai.fm, a demo for the OpenAI Speech API
A Family of Open-Source Music Foundation Models
Generate audiobooks from EPUBs, PDFs and text with captions
Comprehensive Gradio WebUI for audio processing
A free, open source, and extensible speech-to-text application
Extract audio and video content and organize it into a Markdown note
Speech recognition module for Python
Official Python inference and LoRA trainer package
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
Free, high-quality text-to-speech API endpoint to replace OpenAI
Clone a voice in 5 seconds to generate arbitrary speech in real-time