Project Lyra: Open Generative 3D World Models
Models for object and human mesh reconstruction
Visual Causal Flow
A Systematic Framework for Interactive World Modeling
Video understanding codebase from FAIR for reproducing video models
Foundational Models for State-of-the-Art Speech and Text Translation
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Let us control diffusion models
Code release for "Masked-attention Mask Transformer
The official PyTorch implementation of our paper
Code for "Image Generation from Scene Graphs", Johnson et al., CVPR 201
Multimodal Transformer for document image understanding and layout
Qwen2.5-VL-3B-Instruct: Multimodal model for chat, vision & video