RGBD video generation model conditioned on camera input
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Official Python inference and LoRA trainer package
Generate Any 3D Scene in Seconds
Sharp Monocular Metric Depth in Less Than a Second
Metric monocular depth estimation (vision model)
Vision-language-action model for robot control via images and text