Open speech-to-speech models and pipelines by Hugging Face
Robust Speech Recognition via Large-Scale Weak Supervision
Translate the video from one language to another and embed dubbing
End-to-end speech processing toolkit
Automatic Speech Recognition with Word-level Timestamps
Toolkit for conversational AI
Comprehensive Gradio WebUI for audio processing
Underthesea - Vietnamese NLP Toolkit
Persian NLP Toolkit
Generate audiobooks from EPUBs, PDFs and text with captions
Han Language Processing
Faster Whisper transcription with CTranslate2
Open Source Speech Language Model
Fast multimodal LLM for real-time voice interaction and AI apps
Use Microsoft Edge's online text-to-speech service from Python
Audio foundation model excelling in audio understanding
Open-source multi-speaker long-form text-to-speech model
AI-powered tool for generating, optimizing, and translating subtitles
Training data (data labeling, annotation, workflow) for all data types
Voice Recognition to Text Tool
Self-host the powerful Chatterbox TTS model
Uses Qwen3-ASR, a local LLM, Whisper, and TEN-VAD
The Classical Language Toolkit
Framework for building real-time multimodal voice AI agent apps
Towards Human-Sounding Speech