Audio foundation model excelling in audio understanding
Large Audio Language Model built for natural interactions
Audio Plugin for Audio to MIDI transcription using deep learning
Official Python inference and LoRA trainer package
Audiocraft is a library for audio processing and generation
Python Audio Analysis Library: Feature Extraction, Classification
A Family of Open Sourced Music Foundation Models
Code for openai.fm, a demo for the OpenAI Speech API
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Multilingual speech recognition and audio understanding model
Open-source multi-speaker long-form text-to-speech model
Implementation of AudioLM audio generation model in Pytorch
A nearly-live implementation of OpenAI's Whisper
Stable diffusion for real-time music generation (web app)
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
Fast multimodal LLM for real-time voice interaction and AI apps
Synchronized Translation for Videos
AudioMuse-AI is an Open Source Dockerized environment
One-click deployment (including offline integration package)
Self-hosted AI audio transcription
Open speech-to-speech models and pipelines by Hugging Face
SOTA discrete acoustic codec models with 40/75 tokens per second
Convert files and web content into clean, usable Markdown easily
AI video generator optimized for low VRAM and older GPUs
Multimodal Diffusion with Representation Alignment