Audio foundation model excelling in audio understanding
Large Audio Language Model built for natural interactions
Multi-modal large language model designed for audio understanding
GUI for a Vocal Remover that uses Deep Neural Networks
Python Audio Analysis Library: Feature Extraction, Classification
Audiocraft is a library for audio processing and generation
Robust Speech Recognition via Large-Scale Weak Supervision
AI tool converting video/audio into structured documents instantly
Data manipulation and transformation for audio signal processing
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Open speech-to-speech models and pipelines by Hugging Face
Comprehensive Gradio WebUI for audio processing
Open-source multi-speaker long-form text-to-speech model
AudioMuse-AI is an open-source Dockerized environment
Automatic Speech Recognition with Word-level Timestamps
An opinionated CLI to transcribe audio files with Whisper on-device
Generate audiobooks from EPUBs, PDFs and text with captions
Official repository for LTX-Video
Fast multimodal LLM for real-time voice interaction and AI apps
AI-powered tool for generating, optimizing, and translating subtitles
Hub of ready-to-use datasets for ML models
Towards Human-Sounding Speech
Edit videos with Claude Code
Use Microsoft Edge's online text-to-speech service from Python
Instill Core is a full-stack AI infrastructure tool for data