Robust Speech Recognition via Large-Scale Weak Supervision
Open speech-to-speech models and pipelines by Hugging Face
Multilingual speech recognition and audio understanding model
Audio foundation model excelling in audio understanding
Port of OpenAI's Whisper model in C/C++
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
StreamSpeech is a seamless model for offline speech recognition
Fast multimodal LLM for real-time voice interaction and AI apps
Run local models such as Llama, DeepSeek, and Kokoro inside your browser
Open source AI VTuber platform with voice chat and Live2D avatars
Framework for building real-time voice and multimodal AI agents
End-to-end speech processing toolkit
Real-time voice interactive digital human
Large language model for a livestream sales host
Realtime AI Voice Agents with SoTA Multimodal AI models on Arduino ESP
Large Audio Language Model built for natural interactions
Qwen3-ASR is an open-source series of ASR models
Textream is a free macOS teleprompter app for streamers, interviewers
Production ready toolkit to run AI locally
Framework for building neural networks
AI-powered tool for generating, optimizing, and translating subtitles
Foundational Models for State-of-the-Art Speech and Text Translation
The media player for language learning, with dual subtitles
A Web UI for easy subtitle generation using the Whisper model
Bailing is a voice dialogue robot similar to GPT-4o