Qwen3-ASR is an open-source series of ASR models
Omnilingual ASR: Open-Source Multilingual Speech Recognition
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
kaldi-asr/kaldi is the official location of the Kaldi project
Bailing is a voice dialogue robot similar to GPT-4o
StreamSpeech is a seamless model for offline speech recognition
Audio foundation model excelling in audio understanding
Video translation and dubbing tool powered by LLMs
Real-time voice interactive digital human
Port of OpenAI's Whisper model in C/C++
Easy-to-use Speech Toolkit including Self-Supervised Learning models
Speech-AI-Forge is a project developed around TTS generation models
End-to-end speech processing toolkit
Open-source framework for intelligent speech interaction
Scalable generative AI framework built for researchers and developers
Repo of Qwen2-Audio chat & pretrained large audio language model
Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML models
Fast and accurate automatic speech recognition (ASR) for edge devices
Conversational voice AI agents
Open source AI VTuber platform with voice chat and Live2D avatars
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
A library for audio and music analysis, feature extraction
Speech Note Linux app. Note taking, reading and translating
In-app assistant SDK for building multimodal conversational UX for websites
Framework for building AI-powered interactive digital humans and agents