Audiocraft is a library for audio processing and generation
Robust Speech Recognition via Large-Scale Weak Supervision
Official repository for LTX-Video
Open-source multi-speaker long-form text-to-speech model
A suite of advanced multi-modal LLMs
A React-based starter app for using the Live API over WebSockets
Use Microsoft Edge's online text-to-speech service from Python
The Python library for real-time communication
Towards Human-Sounding Speech
Discover pretrained models for deep learning in MATLAB
Document Image Parsing via Heterogeneous Anchor Prompting
Large Multimodal Models for Video Understanding and Editing
Build Vision Agents quickly with any model or video provider
A Conversational Speech Generation Model
Di♪♪Rhythm: Blazingly Fast & Simple End-to-End Song Generation
Chinese text-to-speech engine
Task of transcribing piano recordings into MIDI files
Text-to-Speech for Basque and Spanish
Separate audio recordings into individual sources
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)
Text-to-Speech (TTS) for Basque, Spanish, Catalan, Galician and English
A fast GPU accelerated feature extraction software for speech analysis
An Incremental Spoken Dialogue Processing Toolkit
Simple algorithm for a real-time interactive visual cortex for painting