Qwen3-ASR is an open-source series of ASR models
Omnilingual ASR: Open-Source Multilingual Speech Recognition
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
kaldi-asr/kaldi is the official location of the Kaldi project
Bailing is a voice dialogue robot similar to GPT-4o
StreamSpeech is a seamless model for offline speech recognition
Audio foundation model excelling in audio understanding
Real-time voice interactive digital human
Video translation and dubbing tool powered by LLMs
Easy-to-use Speech Toolkit including Self-Supervised Learning model
Port of OpenAI's Whisper model in C/C++
Open-source framework for intelligent speech interaction
End-to-end speech processing toolkit
Speech-AI-Forge is a project developed around TTS generation models
Scalable generative AI framework built for researchers and developers
Repo of Qwen2-Audio chat & pretrained large audio language model
Conversational voice AI agents
Nexa SDK is a comprehensive toolkit for supporting ONNX and GGML
Fast and accurate automatic speech recognition (ASR) for edge devices
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Speech Note Linux app. Note taking, reading and translating
A library for audio and music analysis, feature extraction
Open source AI VTuber platform with voice chat and Live2D avatars
In-app assistant SDK to build a multimodal conversational UX for websites
Framework for building AI-powered interactive digital humans and agents