ML Sharp
Sharp Monocular View Synthesis in Less Than a Second
ML Sharp is a research code release that turns a single 2D photograph into a photorealistic 3D representation that can be rendered from nearby viewpoints. Instead of requiring multi-view input, it predicts the parameters of a 3D Gaussian scene representation directly from one image using a single forward pass through a neural network. The core idea is speed: the 3D representation is produced in under a second on a standard GPU, and then the resulting scene can be rendered in real time to generate new views interactively. The representation is metric, meaning it supports camera movements with an absolute scale rather than only relative depth cues, which is useful for consistent viewpoint changes and downstream spatial tasks. ...
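To make the "metric" property concrete, here is a minimal sketch of why absolute scale matters when the camera moves. It is not ML Sharp's API or data format: the `Gaussian3D` class, its fields, and the helpers `project_center` and `parallax_px` are hypothetical names chosen for illustration, and the pinhole camera model with a focal length in pixels is an assumption. The sketch only projects Gaussian centers; it does not rasterize splats.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Gaussian3D:
    """One splat of a hypothetical metric 3D Gaussian scene (all lengths in meters)."""
    mean: Tuple[float, float, float]    # center in camera coordinates, meters
    scale: Tuple[float, float, float]   # per-axis standard deviation, meters
    opacity: float                      # in [0, 1]
    color: Tuple[float, float, float]   # RGB in [0, 1]


def project_center(g: Gaussian3D, focal_px: float, cx: float, cy: float,
                   cam_shift_x_m: float = 0.0) -> Tuple[float, float]:
    """Project a Gaussian center through a pinhole camera.

    The camera is translated along +x by cam_shift_x_m (meters), so the
    point's x coordinate in the new camera frame decreases by that amount.
    """
    x, y, z = g.mean
    if z <= 0:
        raise ValueError("point is behind the camera")
    u = focal_px * (x - cam_shift_x_m) / z + cx
    v = focal_px * y / z + cy
    return u, v


def parallax_px(g: Gaussian3D, focal_px: float, baseline_m: float) -> float:
    """Horizontal image shift (pixels) of a Gaussian center for a sideways camera move."""
    u0, _ = project_center(g, focal_px, 0.0, 0.0)
    u1, _ = project_center(g, focal_px, 0.0, 0.0, cam_shift_x_m=baseline_m)
    return u0 - u1


# Two splats on the optical axis: one 1 m away, one 5 m away.
near = Gaussian3D(mean=(0.0, 0.0, 1.0), scale=(0.01, 0.01, 0.01),
                  opacity=1.0, color=(1.0, 0.0, 0.0))
far = Gaussian3D(mean=(0.0, 0.0, 5.0), scale=(0.05, 0.05, 0.05),
                 opacity=1.0, color=(0.0, 0.0, 1.0))

# Move the camera 10 cm to the right with a 1000 px focal length.
near_shift = parallax_px(near, focal_px=1000.0, baseline_m=0.1)
far_shift = parallax_px(far, focal_px=1000.0, baseline_m=0.1)
print(f"near splat shifts {near_shift:.1f} px, far splat shifts {far_shift:.1f} px")
```

Under this pinhole model the shift is focal length times baseline divided by depth, so the same 10 cm camera move displaces the near splat five times as far as the distant one. If depths were known only up to an unknown scale factor, a camera translation given in meters would not map to a definite pixel shift, which is the practical benefit of a metric representation.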