HY-World 2.0
A Multi-Modal World Model for Reconstructing, Generating, Simulation
...It accepts text prompts, single-view images, multi-view images, and videos, and produces 3D world representations rather than limiting output to flat video generation. For text and single-image inputs, it generates high-fidelity 3D Gaussian Splatting scenes through a multi-stage pipeline that includes panorama generation, trajectory planning, world expansion, and world composition. The system also improves reconstruction from multi-view images and video by upgrading its feed-forward 3D prediction components and its memory-aware view generation process. Another major part of the project is WorldLens, a rendering platform designed for interactive exploration with an engine-agnostic architecture, automatic image-based lighting, collision detection, and support for character interaction.