Audiocraft is a library for audio processing and generation
Robust Speech Recognition via Large-Scale Weak Supervision
Open-source multi-speaker long-form text-to-speech model
Official repository for LTX-Video
A suite of advanced multi-modal LLMs
A React-based starter app for using the Live API over WebSockets
Use Microsoft Edge's online text-to-speech service from Python
The Python library for real-time communication
Towards Human-Sounding Speech
Discover pretrained models for deep learning in MATLAB
Document Image Parsing via Heterogeneous Anchor Prompting
Large Multimodal Models for Video Understanding and Editing
Build Vision Agents quickly with any model or video provider
A Conversational Speech Generation Model
Di♪♪Rhythm: Blazingly Fast & Simple End-to-End Song Generation
Chinese text-to-speech engine
Common Resource Grep
Task of transcribing piano recordings into MIDI files
Text-to-Speech for Basque and Spanish
Separate audio recordings into individual sources
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)
Text-to-Speech (TTS) for Basque, Spanish, Catalan, Galician and English
Fast GPU-accelerated feature extraction software for speech analysis
Speech recognition application builder and library