Models for object and human mesh reconstruction
Video Object and Interaction Deletion
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Code for Mesh R-CNN, ICCV 2019
Tooling for the Common Objects In 3D dataset
Qwen-Image is a powerful image generation foundation model
Recovering the Visual Space from Any Views
Qwen3-VL, the multimodal large language model series by Alibaba Cloud
Qwen2.5-VL is the multimodal large language model series
Generating Immersive, Explorable, and Interactive 3D Worlds
Official implementation of DreamCraft3D
General-purpose image editing model that delivers high-fidelity
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
Large Multimodal Models for Video Understanding and Editing
Detect faces in an image
A CNN model that predicts human joints from RGB images of a person
Blazeface is a lightweight model that detects faces in images
Code release for "Masked-attention Mask Transformer
PyTorch implementation of YOLOv4
Learning Continuous Signed Distance Functions for Shape Representation
Code for "Image Generation from Scene Graphs", Johnson et al, CVPR 201
Qwen2.5-VL-3B-Instruct: Multimodal model for chat, vision & video
An advanced bilingual image editing model with semantic control