Models for object and human mesh reconstruction
Tooling for the Common Objects In 3D dataset
Large Multimodal Models for Video Understanding and Editing
Code for running inference and finetuning with the SAM 3 model
Uncommon Objects in 3D dataset
Code for running inference with the SAM 3D Body Model (3DB)
Official implementation of Watermark Anything with Localized Messages
Chat & pretrained large vision language model
Code for Mesh R-CNN, ICCV 2019
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Qwen2.5-VL is the multimodal large language model series
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Foundational Models for State-of-the-Art Speech and Text Translation
Provides convenient access to the Anthropic REST API from any Python 3
A SOTA open-source image editing model
Official code for Style Aligned Image Generation via Shared Attention
Code for "Image Generation from Scene Graphs", Johnson et al, CVPR 201
Multimodal 7B model for image, video, and text understanding tasks