Multi-modal large language model designed for audio understanding
GUI for a Vocal Remover that uses Deep Neural Networks
Audiocraft is a library for audio processing and generation
Data manipulation and transformation for audio signal processing
Robust Speech Recognition via Large-Scale Weak Supervision
Generate audiobooks from EPUBs, PDFs and text with captions
Open-source multi-speaker long-form text-to-speech model
Comprehensive Gradio WebUI for audio processing
Hub of ready-to-use datasets for ML models
Official repository for LTX-Video
Build AI-powered semantic search applications
A sound cloning tool with a web interface, using your voice
Use Microsoft Edge's online text-to-speech service from Python
Towards Human-Sounding Speech
The Triton Inference Server provides an optimized cloud
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Document Image Parsing via Heterogeneous Anchor Prompting
Large Multimodal Models for Video Understanding and Editing
Private chat with local GPT with documents, images, video, etc.
Build Vision Agents quickly with any model or video provider
Controllable and fast Text-to-Speech for over 7000 languages
Build cross-modal and multimodal applications on the cloud
A Conversational Speech Generation Model
Di♪♪Rhythm: Blazingly Fast & Simple End-to-End Song Generation
Code for the paper Hybrid Spectrogram and Waveform Source Separation