Long-form streaming TTS system for multi-speaker dialogue generation
Open Source Speech Language Model
Industrial-level controllable zero-shot text-to-speech system
Open-source framework for intelligent speech interaction
Controllable & emotion-expressive zero-shot TTS
Capable of understanding text, audio, vision, video
Qwen3-TTS is an open-source series of TTS models
GLM-4-Voice | End-to-End Chinese-English Conversational Model
State-of-the-art TTS model under 25MB
Repo of Qwen2-Audio chat & pretrained large audio language model
Multi-modal large language model designed for audio understanding
From Images to High-Fidelity 3D Assets
Wan2.1: Open and Advanced Large-Scale Video Generative Model
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
LLM-based Reinforcement Learning audio edit model
A Conversational Speech Generation Model
High-Resolution Image Synthesis with Latent Diffusion Models
High-Fidelity and Controllable Generation of Textured 3D Assets
Reproduction of Poetiq's record-breaking submission to the ARC-AGI-1
PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)
Official PyTorch Implementation of "Scalable Diffusion Models"
GLIDE: a diffusion-based text-conditional image synthesis model
Code for "Image Generation from Scene Graphs", Johnson et al., CVPR 201