Project Lyra: Open Generative 3D World Models
State-of-the-art (SoTA) text-to-video pre-trained model
Models for object and human mesh reconstruction
Visual Causal Flow
Code for Mesh R-CNN, ICCV 2019
A Systematic Framework for Interactive World Modeling
Qwen2.5-VL is the multimodal large language model series
Video understanding codebase from FAIR for reproducing video models
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Let us control diffusion models
Code release for "Masked-attention Mask Transformer
The official PyTorch implementation of our paper
Code for "Image Generation from Scene Graphs", Johnson et al, CVPR 201