Interface for OuteTTS models
A TTS model capable of generating ultra-realistic dialogue
A sound cloning tool with a web interface, using your voice
Underthesea - Vietnamese NLP Toolkit
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
Management of Yandex Station and other smart home devices
LLM-based reinforcement learning audio editing model
NLP Cloud serves high-performance pre-trained or custom models for NER
High-quality multi-lingual text-to-speech library by MyShell.ai
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Framework for building neural networks
Open-source industrial-grade ASR models
Repository for the Qwen2-Audio chat and pretrained large audio language models
Bailing is a voice dialogue robot similar to GPT-4o
A GPT-4o-level MLLM for vision, speech, and multimodal live streaming
State-of-the-art open-source TTS
Instant voice cloning by MIT and MyShell. Audio foundation model
Multi-lingual large voice generation model, providing inference
An opinionated CLI to transcribe audio files with Whisper on-device
Automatically translates the text of a video based on a subtitle file
Generate audiobooks from e-books
Scalable generative AI framework built for researchers and developers
Reading book source
Chat with it via text and voice
Han Language Processing