Qwen3-ASR is an open-source series of ASR models
EPUB to audiobook converter, optimized for Audiobookshelf
An opinionated CLI to transcribe audio files with Whisper on-device
Real-time voice interactive digital human
Underthesea - Vietnamese NLP Toolkit
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
NLP Cloud serves high performance pre-trained or custom models for NER
Framework for building neural networks
Open-source industrial-grade ASR models
Repo of Qwen2-Audio chat & pretrained large audio language model
Management of Yandex Station and other smart home devices
Generate audiobooks from e-books
SoTA open-source TTS
Provides line-oriented text file editing capabilities
Automatically translates the text of a video based on a subtitle file
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles
Instant voice cloning by MIT and MyShell. Audio foundation model
LLM-based Reinforcement Learning audio edit model
Bailing is a voice dialogue robot similar to GPT-4o
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
Reading book source
Chat with it via text and voice
A Web UI for easy subtitle generation using the Whisper model
Official PyTorch Implementation