A TTS model capable of generating ultra-realistic dialogue
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles
Underthesea - Vietnamese NLP Toolkit
Bailing is a voice dialogue robot similar to GPT-4o
NLP Cloud serves high-performance pre-trained or custom models for NER
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Framework for building neural networks
An opinionated CLI to transcribe audio files with Whisper on-device
An LLM-based audio editing model trained with reinforcement learning
Interface for OuteTTS models
Audio foundation model excelling in audio understanding
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
Repository for the Qwen2-Audio chat and pretrained large audio language models
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Chat with it via text and voice
A sound cloning tool with a web interface that can clone your voice
Uses Qwen3-ASR, a local LLM, Whisper, and TEN-VAD
Automatically translates the text of a video based on a subtitle file
Instant voice cloning by MIT and MyShell; an audio foundation model
A web UI for easy subtitle generation using the Whisper model
Generate audiobooks from e-books
Official PyTorch Implementation
Han Language Processing
Book sources for the Reading app
Framework for building AI-powered interactive digital humans and agents