Fast and universal 3D reconstruction model for versatile tasks
Diffusion Transformer with Fine-Grained Chinese Understanding
Tooling for the Common Objects In 3D dataset
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Recovering the Visual Space from Any Views
Contexts Optical Compression
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
High-Resolution Image Synthesis with Latent Diffusion Models
A Powerful Native Multimodal Model for Image Generation
Open-source image generative foundation model
Implementation of the Surya Foundation Model for Heliophysics
Multimodal-Driven Architecture for Customized Video Generation
A Systematic Framework for Interactive World Modeling
Generating Immersive, Explorable, and Interactive 3D Worlds
Project Lyra: Open Generative 3D World Models
General-purpose image editing model that delivers high-fidelity
A Customizable Image-to-Video Model based on HunyuanVideo
RGBD video generation model conditioned on camera input
A Unified Framework for Text-to-3D and Image-to-3D Generation
Personalize Any Characters with a Scalable Diffusion Transformer
Generate Any 3D Scene in Seconds
4M: Massively Multimodal Masked Modeling
This repository contains the official implementation of FastVLM