Official repository for LTX-Video
GLM-4-Voice | End-to-End Chinese-English Conversational Model
Foundational Models for State-of-the-Art Speech and Text Translation
Large Multimodal Models for Video Understanding and Editing
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
A Conversational Speech Generation Model
Di♪♪Rhythm: Blazingly Fast & Simple End-to-End Song Generation
Code for the paper "Hybrid Spectrogram and Waveform Source Separation"
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)
Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020)
React app for inspecting, building and debugging with the Realtime API
Dia-1.6B generates lifelike English dialogue and vocal expressions
CTC-based forced aligner for audio-text in 158 languages
Portuguese ASR model fine-tuned on XLSR-53 for 16 kHz audio input
Russian ASR model fine-tuned on Common Voice and CSS10 datasets