Sharp Monocular Metric Depth in Less Than a Second
Recovering the Visual Space from Any Views
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
RGBD video generation model conditioned on camera input
Diffusion Transformer with Fine-Grained Chinese Understanding
Qwen-Image is a powerful image generation foundation model
Fast and universal 3D reconstruction model for versatile tasks
Tooling for the Common Objects In 3D dataset
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
4M: Massively Multimodal Masked Modeling
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
High-Resolution Image Synthesis with Latent Diffusion Models
Let us control diffusion models
OpenAI’s compact 20B open model for fast, agentic, and local use