Fast and accurate automatic speech recognition (ASR) for edge devices
Open-source framework for intelligent speech interaction
Comprehensive Gradio WebUI for audio processing
Repo of Qwen2-Audio chat & pretrained large audio language model
A sound cloning tool with a web interface that works from your own voice
Large Audio Language Model built for natural interactions
A text-to-speech, speech-to-text and speech-to-speech library
Chat & pretrained large audio language model proposed by Alibaba Cloud
The open-source voice synthesis studio powered by Qwen3-TTS
Clone a voice in 5 seconds to generate arbitrary speech in real time
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Multi-modal large language model designed for audio understanding
Instant voice cloning by MIT and MyShell. Audio foundation model
Tokenizer-Free TTS for Multilingual Speech Generation
A native macOS menu bar app for managing audio device priorities
Framework for building real-time voice and multimodal AI agents
Generate audiobooks from e-books, voice cloning & 1107+ languages
LLM-based reinforcement learning audio editing model
Free, high-quality text-to-speech API endpoint to replace OpenAI
High-Quality Voice Cloning TTS for 600+ Languages
The missing YouTube Music macOS app
A set of AI-enabled effects, generators, and analyzers for Audacity
Code for openai.fm, a demo for the OpenAI Speech API
Fast multimodal LLM for real-time voice interaction and AI apps
PersonaPlex code