Video Object and Interaction Deletion
Code for Mesh R-CNN, ICCV 2019
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Recovering the Visual Space from Any Views
Qwen2.5-VL is the multimodal large language model series
Qwen-Image is a powerful image generation foundation model
Qwen3-VL, the multimodal large language model series by Alibaba Cloud
Generating Immersive, Explorable, and Interactive 3D Worlds
Official implementation of DreamCraft3D
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
General-purpose image editing model that delivers high-fidelity
A SOTA open-source image editing model
Large Multimodal Models for Video Understanding and Editing
Detect faces in an image
Blazeface is a lightweight model that detects faces in images
A CNN model that predicts human joints from RGB images of a person
Code release for "Masked-attention Mask Transformer
Learning Continuous Signed Distance Functions for Shape Representation
Code for "Image Generation from Scene Graphs", Johnson et al, CVPR 201
An advanced bilingual image editing model with semantic control