Models for object and human mesh reconstruction
Video Object and Interaction Deletion
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Recovering the Visual Space from Any Views
Official implementation of DreamCraft3D
General-purpose image editing model that delivers high-fidelity
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model
Large Multimodal Models for Video Understanding and Editing
Detect faces in an image
A CNN model that predicts human joints from RGB images of a person
BlazeFace is a lightweight model that detects faces in images
Code release for "Masked-attention Mask Transformer
PyTorch implementation of YOLOv4
Code for "Image Generation from Scene Graphs", Johnson et al., CVPR 201
Qwen2.5-VL-3B-Instruct: Multimodal model for chat, vision & video
An advanced bilingual image editing model with semantic control