A text-to-speech, speech-to-text and speech-to-speech library
Open-source framework for intelligent speech interaction
Oobabooga - The definitive Web UI for local AI, with powerful features
Multi-modal large language model designed for audio understanding
Official Python inference and LoRA trainer package
Large Audio Language Model built for natural interactions
A Family of Open Sourced Music Foundation Models
Transforming Multimodal Content into Captivating Multilingual Audio
Streaming Real-time Audio-Driven Avatar Generation
Toolkit for audio, music, and speech generation
Audiocraft is a library for audio processing and generation
Implementation of AudioLM audio generation model in PyTorch
Taming Stable Diffusion for Lip Sync
Multimodal Diffusion with Representation Alignment
A Python library for audio data augmentation
48 kHz stereo neural audio codec for general audio
AudioMuse-AI is an open-source Dockerized environment
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Generate audiobooks from EPUBs, PDFs and text with captions
Uses Qwen3-ASR, local LLM, Whisper, TEN-VAD
Generate high-definition story short videos with one click using AI
A speech-text foundation model for real-time dialogue
Automated Music Discovery and Collection Manager
Open source AI model for generating full songs from lyrics prompts
ComfyUI integration for Microsoft's VibeVoice text-to-speech model