Audio foundation model excelling in audio understanding
Repository for the Qwen2-Audio chat and pretrained large audio language models
Large Audio Language Model built for natural interactions
Speech recognition module for Python
Multilingual speech recognition and audio understanding model
Robust Speech Recognition via Large-Scale Weak Supervision
Multi-modal large language model designed for audio understanding
Speech-to-text, text-to-speech, and speaker recognition
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Captcha solver extension for humans
Fast and accurate automatic speech recognition (ASR) for edge devices
Automatic Speech Recognition with Word-level Timestamps
Speech recognition for your site
HTML5 JavaScript audio recording in MP3, WAV, OGG, WebM, and AMR formats
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
A free, open source, and extensible speech-to-text application
Fast multimodal LLM for real-time voice interaction and AI apps
Open speech-to-speech models and pipelines from Hugging Face
Python Audio Analysis Library: Feature Extraction, Classification
Capable of understanding text, audio, vision, and video
Voice Recognition to Text Tool
Framework for building real-time voice and multimodal AI agents
Data manipulation and transformation for audio signal processing
Self-hosted AI audio transcription
A gallery that showcases on-device ML/GenAI use cases