Audio foundation model excelling in audio understanding
Repository of Qwen2-Audio, the chat and pretrained large audio language model
Large Audio Language Model built for natural interactions
Speech recognition module for Python
Multilingual speech recognition and audio understanding model
Robust Speech Recognition via Large-Scale Weak Supervision
Multi-modal large language model designed for audio understanding
Uses Qwen3-ASR, a local LLM, Whisper, and TEN-VAD
Automatic Speech Recognition with Word-level Timestamps
State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX
Fast multimodal LLM for real-time voice interaction and AI apps
Open speech-to-speech models and pipelines by Hugging Face
Python Audio Analysis Library: Feature Extraction, Classification
Capable of understanding text, audio, vision, video
Voice Recognition to Text Tool
Framework for building real-time voice and multimodal AI agents
Data manipulation and transformation for audio signal processing
StreamSpeech is a seamless model for offline and simultaneous speech recognition, speech translation, and speech synthesis
Omnilingual ASR: Open-Source Multilingual Speech Recognition
Qwen3-Omni is a natively end-to-end, omni-modal LLM
A web UI for easy subtitle generation using the Whisper model
Translate videos from one language to another and embed dubbing
Qwen3-ASR is an open-source series of ASR models
AI-powered tool for generating, optimizing, and translating subtitles
VMZ: Model Zoo for Video Modeling