Multi-lingual large voice generation model, providing inference
A TTS model capable of generating ultra-realistic dialogue
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
Open-source framework for intelligent speech interaction
Qwen3-ASR is an open-source series of ASR models
Generate audiobooks from e-books
Real-time voice interactive digital human
Underthesea - Vietnamese NLP Toolkit
NLP Cloud serves high-performance pre-trained or custom models for NER
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Framework for building neural networks
Open-source industrial-grade ASR models
Repo of Qwen2-Audio chat & pretrained large audio language model
SoTA open-source TTS
Management of Yandex Station and other smart home devices
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles
LLM-based reinforcement learning audio editing model
Instant voice cloning by MIT and MyShell. Audio foundation model
Bailing is a voice dialogue robot similar to GPT-4o
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Automatically translates the text of a video based on a subtitle file
Reading book source
Official PyTorch Implementation
A web UI for easy subtitling using the Whisper model
Open source AI VTuber platform with voice chat and Live2D avatars