Controllable and fast Text-to-Speech for over 7000 languages
A simple native web interface that uses ChatTTS to synthesize speech from text
SOTA discrete acoustic codec models with 40/75 tokens per second
Open-source industrial-grade ASR models
Management of Yandex Station and other smart home devices
A TTS model capable of generating ultra-realistic dialogue
Underthesea - Vietnamese NLP Toolkit
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles
Bailing is a voice dialogue robot similar to GPT-4o
NLP Cloud serves high-performance pre-trained or custom models for NER
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Framework for building neural networks
LLM-based reinforcement learning audio editing model
An opinionated CLI to transcribe audio files with Whisper on-device
First-class Sublime Text AI assistant with GPT-5, Opus 4.6, and Gemini 3
Interface for OuteTTS models
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
Audio foundation model excelling in audio understanding
Repository of the Qwen2-Audio chat and pretrained large audio language models
A GPT-4o-level MLLM for vision, speech, and multimodal live streaming
Chat with it via text and voice
A sound cloning tool with a web interface that can clone your own voice
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Instant voice cloning by MIT and MyShell. Audio foundation model
Automatically translates the text of a video based on a subtitle file