A text-to-speech, speech-to-text and speech-to-speech library
Open-source framework for intelligent speech interaction
Oobabooga - The definitive Web UI for local AI, with powerful features
Multi-modal large language model designed for audio understanding
Official Python inference and LoRA trainer package
Large Audio Language Model built for natural interactions
A Family of Open-Source Music Foundation Models
Audiocraft is a library for audio processing and generation
Implementation of AudioLM audio generation model in Pytorch
Toolkit for audio, music, and speech generation
Multimodal Diffusion with Representation Alignment
Taming Stable Diffusion for Lip Sync
Open source AI model for generating full songs from lyrics prompts
48 kHz stereo neural audio codec for general audio
HunyuanVideo: A Systematic Framework For Large Video Generation Model
A Python library for audio data augmentation
Uses Qwen3-ASR, a local LLM, Whisper, and TEN-VAD
Generate high-definition short story videos with one click using AI
Generate audiobooks from EPUBs, PDFs and text with captions
AI video generator optimized for low-VRAM and older GPUs
ComfyUI integration for Microsoft's VibeVoice text-to-speech model
One-click deployment (including an offline integration package)
TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Generate audiobooks from e-books, with voice cloning and support for 1107+ languages
AI tool converting video/audio into structured documents instantly