SOTA Open Source TTS
Open-source framework for intelligent speech interaction
Multilingual speech recognition and audio understanding model
Audio foundation model excelling in audio understanding
LLM-based reinforcement learning audio editing model
Repo of Qwen2-Audio chat & pretrained large audio language model
Towards Human-Sounding Speech
Tokenizer-Free TTS for Multilingual Speech Generation
Controllable & emotion-expressive zero-shot TTS
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
A TTS model capable of generating ultra-realistic dialogue
Instant voice cloning by MIT and MyShell. Audio foundation model
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles
Interface for OuteTTS models
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
SoTA open-source TTS
Multi-modal large language model designed for audio understanding
Maimaibot, a (more focused) multi-platform intelligent agent
Amica is an open-source interface for interactive communication
VITS2 backbone with multilingual-bert
Open source implementation of Microsoft's VALL-E X zero-shot TTS model
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)