A text-to-speech, speech-to-text and speech-to-speech library
State-of-the-art open-source TTS
Audio foundation model excelling in audio understanding
Open speech-to-speech models and pipelines by Hugging Face
Open-source framework for intelligent speech interaction
Repo of Qwen2-Audio chat & pretrained large audio language model
Speech Note: Linux app for note taking, reading and translating
LLM-based reinforcement learning audio editing model
Code for openai.fm, a demo for the OpenAI Speech API
Speech-to-text, text-to-speech, and speaker recognition
Speech recognition module for Python
Multi-modal large language model designed for audio understanding
Generate audiobooks from EPUBs, PDFs and text with captions
Chat & pretrained large audio language model proposed by Alibaba Cloud
Tokenizer-Free TTS for Multilingual Speech Generation
A free, open source, and extensible speech-to-text application
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Comprehensive Gradio WebUI for audio processing
Robust Speech Recognition via Large-Scale Weak Supervision
Open Source Speech Language Model
Capable of understanding text, audio, vision, video
PersonaPlex code
Free open source speech synthesizer for Russian and other languages
Fast multimodal LLM for real-time voice interaction and AI apps
Speech-AI-Forge is a project developed around TTS generation models