Open-source multi-speaker long-form text-to-speech model
Qwen3-Coder is the code version of Qwen3
High-Resolution Image Synthesis with Latent Diffusion Models
Diffusion Bee is the easiest way to run Stable Diffusion locally
Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, ...) inference
RGBD video generation model conditioned on camera input
A Systematic Framework for Interactive World Modeling
Qwen3 is the large language model series developed by the Qwen team
Qwen3-TTS is an open-source series of TTS models
State-of-the-art TTS model under 25MB
Advancing Open-source World Models
A theoretical reconstruction of the Claude Mythos architecture
Text and image to video generation: CogVideoX and CogVideo
Reference PyTorch implementation and models for DINOv3
From Images to High-Fidelity 3D Assets
An experimental version of the DeepSeek model
A Multi-Modal World Model for Reconstruction, Generation, and Simulation
Generating Immersive, Explorable, and Interactive 3D Worlds
Let's make video diffusion practical
Qwen3.6 is the large language model series developed by the Qwen team
Visual Causal Flow
Open-Source Financial Large Language Models
Code for running inference with the SAM 3D Body Model (3DB)
A latent text-to-image diffusion model
A Powerful Native Multimodal Model for Image Generation