Open speech-to-speech models and pipelines by Hugging Face
Robust Speech Recognition via Large-Scale Weak Supervision
Speech recognition module for Python
Multilingual speech recognition and audio understanding model
Audio foundation model excelling in audio understanding
Open-source industrial-grade ASR models
kaldi-asr/kaldi is the official location of the Kaldi project
A PyTorch-based Speech Toolkit
Automatic Speech Recognition with Word-level Timestamps
Multilingual Automatic Speech Recognition with word-level timestamps
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Robust Speech Recognition Across Languages, Dialects
Faster Whisper transcription with CTranslate2
StreamSpeech is a seamless model for offline speech recognition
Toolkit for conversational AI
Underthesea - Vietnamese NLP Toolkit
Repo of Qwen2-Audio chat & pretrained large audio language model
Fast multimodal LLM for real-time voice interaction and AI apps
Translate the video from one language to another and embed dubbing
Voice Recognition to Text Tool
The behavior guidance framework for customer-facing LLM agents
Training data (data labeling, annotation, workflow) for all data types
Open source AI VTuber platform with voice chat and Live2D avatars
Framework for building real-time voice and multimodal AI agents
Omnilingual ASR: Open-Source Multilingual Speech Recognition