Models for object and human mesh reconstruction
Large Multimodal Models for Video Understanding and Editing
Code for running inference and finetuning with the SAM 3 model
Code for running inference with the SAM 3D Body Model (3DB)
Provides convenient access to the Anthropic REST API from any Python 3
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
Foundational Models for State-of-the-Art Speech and Text Translation
Official code for Style Aligned Image Generation via Shared Attention
Code for "Image Generation from Scene Graphs", Johnson et al, CVPR 201
Multimodal 7B model for image, video, and text understanding tasks