Audio foundation model excelling in audio understanding
Large Audio Language Model built for natural interactions
Audio Plugin for Audio to MIDI transcription using deep learning
Official Python inference and LoRA trainer package
Audiocraft is a library for audio processing and generation
Python Audio Analysis Library: Feature Extraction, Classification
A Family of Open Sourced Music Foundation Models
Code for openai.fm, a demo for the OpenAI Speech API
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Multilingual speech recognition and audio understanding model
Implementation of AudioLM audio generation model in PyTorch
Open-source multi-speaker long-form text-to-speech model
A nearly-live implementation of OpenAI's Whisper
Stable Diffusion for real-time music generation (web app)
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
Fast multimodal LLM for real-time voice interaction and AI apps
Synchronized Translation for Videos
AudioMuse-AI is an open-source Dockerized environment
One-click deployment (including offline integration package)
Self-hosted AI audio transcription
Open speech-to-speech models and pipelines by Hugging Face
SOTA discrete acoustic codec models with 40/75 tokens per second
Convert files and web content into clean, usable Markdown easily
Multimodal Diffusion with Representation Alignment
AI video generator optimized for low-VRAM and older GPUs