Synchronized Translation for Videos
A fast TTS architecture with conditional flow matching
High-Fidelity and Controllable Generation of Textured 3D Assets
Qwen3-TTS is an open-source series of TTS models
State-of-the-art TTS model under 25MB
A high-quality rapid TTS voice cloning model
Instant voice cloning by MIT and MyShell. Audio foundation model
Open-source framework for intelligent speech interaction
Capable of understanding text, audio, vision, video
Package manager and build abstraction tool for FPGA/ASIC development
HY-Motion model for 3D character animation generation
A sound cloning tool with a web interface, using your voice
100–200× Acceleration for Video Diffusion Models
Transforming Multimodal Content into Captivating Multilingual Audio
Qwen-Image is a powerful image generation foundation model
CodeGeeX: An Open Multilingual Code Generation Model (KDD 2023)
Unified Multimodal Understanding and Generation Models
RGBD video generation model conditioned on camera input
Build voice-based LLM agents. Modular + open source
Spark-TTS Inference Code
High-Resolution Image Synthesis with Latent Diffusion Models
Open source personal AI Assistant for Linux, Windows and Mac
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Real-time behaviour synthesis with MuJoCo, using Predictive Control
Clone a voice in 5 seconds to generate arbitrary speech in real-time