Audio foundation model excelling in audio understanding
A Family of Open-Source Music Foundation Models
AudioCraft is a library for audio processing and generation
Code for openai.fm, a demo for the OpenAI Speech API
Synchronized Translation for Videos
A nearly-live implementation of OpenAI's Whisper
Toolkit for audio, music, and speech generation
SOTA discrete acoustic codec models with 40/75 tokens per second
Tencent Hunyuan Multimodal diffusion transformer (MM-DiT) model
Make videos programmatically with React
Multimodal Diffusion with Representation Alignment
Implementation of AudioLM audio generation model in PyTorch
Open-source multi-speaker long-form text-to-speech model
A single Gradio + React WebUI with extensions for ACE-Step
Open source text-to-speech tool, supports extra-long text
Robust Speech Recognition via Large-Scale Weak Supervision
A Systematic Framework for Interactive World Modeling
Generate blog articles from video or audio
The Python library for real-time communication
Multimodal-Driven Architecture for Customized Video Generation
Use Microsoft Edge's online text-to-speech service from Python
Interface for OuteTTS models
MARS5 speech model (TTS) from CAMB.AI
A React-based starter app for using the Live API over WebSockets
Towards Human-Sounding Speech