GLM-4-Voice | End-to-End Chinese-English Conversational Model
Industrial-level controllable zero-shot text-to-speech system
Controllable & emotion-expressive zero-shot TTS
Qwen3-TTS is an open-source series of TTS models
State-of-the-art TTS model under 25MB
Open-source framework for intelligent speech interaction
Long-form streaming TTS system for multi-speaker dialogue generation
Open-Source Speech Language Model
From Images to High-Fidelity 3D Assets
The most powerful local music generation model
Multi-modal large language model designed for audio understanding
Wan2.1: Open and Advanced Large-Scale Video Generative Model
High-Resolution 3D Assets Generation with Large Scale Diffusion Models
Repo of Qwen2-Audio chat & pretrained large audio language model
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
A Systematic Framework for Interactive World Modeling
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
Chat & pretrained large audio language model proposed by Alibaba Cloud
High-Resolution Image Synthesis with Latent Diffusion Models
High-Fidelity and Controllable Generation of Textured 3D Assets
Qwen3-ASR is an open-source series of ASR models
Capable of understanding text, audio, vision, and video
HY-Motion model for 3D character animation generation
RGBD video generation model conditioned on camera input