Qwen3-VL, the multimodal large language model series by Alibaba Cloud
Project Lyra: Open Generative 3D World Models
Models for object and human mesh reconstruction
Visual Causal Flow
Code for Mesh R-CNN, ICCV 2019
A Systematic Framework for Interactive World Modeling
Qwen2.5-VL is the multimodal large language model series
Video understanding codebase from FAIR for reproducing video models
Foundational Models for State-of-the-Art Speech and Text Translation
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Let us control diffusion models
Code release for "Masked-attention Mask Transformer
The official PyTorch implementation of our paper
Code for "Image Generation from Scene Graphs", Johnson et al, CVPR 201
Multimodal Transformer for document image understanding and layout
Qwen2.5-VL-3B-Instruct: Multimodal model for chat, vision & video