Audio foundation model excelling in audio understanding
Repository of the Qwen2-Audio chat and pretrained large audio language models
Large Audio Language Model built for natural interactions
Multilingual speech recognition and audio understanding model
Speech recognition module for Python
Robust Speech Recognition via Large-Scale Weak Supervision
Multi-modal large language model designed for audio understanding
Speech-to-text, text-to-speech, and speaker recognition
Fast and accurate automatic speech recognition (ASR) for edge devices
Uses Qwen3-ASR, a local LLM, Whisper, and TEN-VAD
Captcha solver extension for humans
Speech recognition for your site
HTML5 JS recording in MP3, WAV, OGG, WebM, and AMR formats
Automatic Speech Recognition with Word-level Timestamps
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
A free, open source, and extensible speech-to-text application
Fast multimodal LLM for real-time voice interaction and AI apps
Framework for building real-time voice and multimodal AI agents
Open speech-to-speech models and pipelines by Hugging Face
Python Audio Analysis Library: Feature Extraction, Classification
Voice Recognition to Text Tool
A gallery that showcases on-device ML/GenAI use cases
Capable of understanding text, audio, vision, and video
A library for audio and music analysis, feature extraction
Omnilingual ASR: Open-Source Multilingual Speech Recognition