VGGT-Omega is a Facebook Research computer vision project for feed-forward camera and depth reconstruction. It takes images as input and predicts camera parameters, depth maps, confidence values, and related scene tokens. The project targets 3D understanding workflows in which a model infers scene geometry in a single forward pass rather than through a traditional multi-stage reconstruction pipeline. It includes pretrained model variants that differ in resolution and text-alignment capabilities, though checkpoint access may require approval. The repository also provides a Gradio demo that visualizes the predicted cameras and the point clouds obtained by unprojecting the predicted depth, exported as a GLB scene. VGGT-Omega is best suited for researchers and developers working on 3D reconstruction, visual geometry, and image-based scene understanding.
Features
- Feed-forward camera and depth reconstruction
- Image-based scene geometry prediction
- Camera intrinsics and extrinsics estimation
- Depth and confidence output generation
- Gradio demo for visualizing reconstructed scenes
- Pretrained model variants (checkpoint access may require approval)
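
To illustrate what "depth-unprojected point clouds" means in practice, the sketch below lifts a depth map to 3D points using camera intrinsics and extrinsics under a standard pinhole model. It is not VGGT-Omega's own code or API: the function name `unproject_depth`, its parameters, the camera-to-world pose convention, and the confidence threshold are all assumptions made for illustration, and the project's actual output format may differ.

```python
import numpy as np

def unproject_depth(depth, K, cam_to_world, conf=None, conf_threshold=0.0):
    """Lift an (H, W) depth map to world-space points with a pinhole model.

    Hypothetical helper: K is a 3x3 intrinsics matrix, cam_to_world is a 4x4
    pose (assumed convention), conf is an optional (H, W) confidence map.
    """
    h, w = depth.shape
    # Pixel grid: u runs along columns (x), v along rows (y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)
    pixels = pixels.reshape(-1, 3).astype(np.float64)
    # Back-project each pixel: ray = K^-1 [u, v, 1]^T, scaled by its depth.
    rays = pixels @ np.linalg.inv(K).T
    points_cam = rays * depth.reshape(-1, 1)
    # Move points from the camera frame to the world frame.
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    points_world = points_cam @ R.T + t
    # Optionally keep only points whose confidence exceeds the threshold.
    if conf is not None:
        points_world = points_world[conf.reshape(-1) > conf_threshold]
    return points_world

# Toy inputs: a 2x2 depth map at constant depth 2, unit focal length.
depth = np.full((2, 2), 2.0)
K = np.eye(3)
pose = np.eye(4)
points = unproject_depth(depth, K, pose)
```

Points are returned in row-major pixel order, so filtering by a confidence map of the same shape lines up directly. A viewer such as the Gradio demo would then need these points converted to a mesh or point-cloud format like GLB, which this sketch does not cover.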