Code for running inference and finetuning with the SAM 3 model
Reference PyTorch implementation and models for DINOv3
PyTorch code and models for the DINOv2 self-supervised learning method
Video understanding codebase from FAIR for reproducing video models
Tooling for the Common Objects In 3D dataset
Recovering the Visual Space from Any Views
Uncommon Objects in 3D dataset
Qwen-Image is a powerful image generation foundation model
Qwen2.5-VL is a multimodal large language model series
Qwen3-ASR is an open-source series of ASR models
Large Multimodal Models for Video Understanding and Editing
Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
Let us control diffusion models
This repository contains the official implementation of research
Code release for "Masked-attention Mask Transformer"
PyTorch implementation of MAE
Per-Pixel Classification is Not All You Need for Semantic Segmentation