Qwen3-ASR is an open-source series of ASR models
Omnilingual ASR: Open-Source Multilingual Speech Recognition
Robust Speech Recognition Across Languages, Dialects
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
kaldi-asr/kaldi is the official location of the Kaldi project
Bailing is a voice dialogue robot similar to GPT-4o
StreamSpeech is a seamless model for offline speech recognition
Audio foundation model excelling in audio understanding
Video translation and dubbing tool powered by LLMs
Real-time voice interactive digital human
Easy-to-use Speech Toolkit including Self-Supervised Learning model
Port of OpenAI's Whisper model in C/C++
Speech-AI-Forge is a project developed around TTS generation model
Open-source framework for intelligent speech interaction
End-to-end speech processing toolkit
Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML
Conversational voice AI agents
Scalable generative AI framework built for researchers and developers
Repo of Qwen2-Audio chat & pretrained large audio language model
Fast and accurate automatic speech recognition (ASR) for edge devices
Open source AI VTuber platform with voice chat and Live2D avatars
A library for audio and music analysis, feature extraction
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Speech Note Linux app. Note taking, reading and translating
Framework for building AI-powered interactive digital humans and agents