Multi-modal large language model designed for audio understanding
GUI for a Vocal Remover that uses Deep Neural Networks
Audiocraft is a library for audio processing and generation
Data manipulation and transformation for audio signal processing
Robust Speech Recognition via Large-Scale Weak Supervision
Generate audiobooks from EPUBs, PDFs and text with captions
Comprehensive Gradio WebUI for audio processing
Hub of ready-to-use datasets for ML models
A library for audio and music analysis and feature extraction
A GPU-accelerated library containing highly optimized building blocks
A private, local meeting notes assistant
Official repository for LTX-Video
Open-source multi-speaker long-form text-to-speech model
A free, open source, and extensible speech-to-text application
Build AI-powered semantic search applications
Video translation and dubbing tool powered by LLMs
A suite of advanced multi-modal LLMs
AI app store powered by 24/7 desktop history. Open source
A React-based starter app for using the Live API over WebSockets
A sound cloning tool with a web interface, using your voice
Use Microsoft Edge's online text-to-speech service from Python
AI Multi-Agent Framework in .NET
The Python library for real-time communication
Towards Human-Sounding Speech
Discover pretrained models for deep learning in MATLAB