Sharp Monocular Metric Depth in Less Than a Second
Recovering the Visual Space from Any Views
A theoretical reconstruction of the Claude Mythos architecture
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
RGBD video generation model conditioned on camera input
Qwen-Image: a powerful image generation foundation model, a Diffusion Transformer with fine-grained Chinese understanding
Tooling for the Common Objects In 3D dataset
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Fast and universal 3D reconstruction model for versatile tasks
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Stable Virtual Camera: Generative View Synthesis with Diffusion Models
4M: Massively Multimodal Masked Modeling
High-Resolution Image Synthesis with Latent Diffusion Models
Let us control diffusion models
Metric monocular depth estimation (vision model)
OpenAI’s open-weight 120B model optimized for reasoning and tool use
OpenAI’s compact 20B open model for fast, agentic, and local use
Stable fine-tuned Gemma model for structured, clear responses
Efficient MoE model for million-token reasoning and coding
Open, non-commercial SDXL model for high-quality image generation