State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
LLM-based Reinforcement Learning audio editing model
Management of Yandex Station and other smart home devices
NLP Cloud serves high-performance pre-trained or custom models for NER
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Open-source industrial-grade ASR models
Framework for building neural networks
High-quality multi-lingual text-to-speech library by MyShell.ai
Repo of Qwen2-Audio, a chat & pretrained large audio language model
Bailing is a voice dialogue robot similar to GPT-4o
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
SoTA open-source TTS
Instant voice cloning by MIT and MyShell. Audio foundation model
Multi-lingual large voice generation model, providing inference
An opinionated CLI to transcribe audio files with Whisper on-device
Automatically translates the text of a video based on a subtitle file
Generate audiobooks from e-books
Scalable generative AI framework built for researchers and developers
Reading book source
Chat with it via text and voice
Official PyTorch Implementation
Han Language Processing
Open source AI VTuber platform with voice chat and Live2D avatars
A Web UI for easy subtitle generation using the Whisper model
Towards Human-Level Text-to-Speech through Style Diffusion