Code for running inference and fine-tuning with the SAM 3 model
Reference PyTorch implementation and models for DINOv3
PyTorch code and models for DINOv2 self-supervised learning
Video understanding codebase from FAIR for reproducing video models
Tooling for the Common Objects In 3D dataset
Recovering the Visual Space from Any Views
Qwen-Image is a powerful image generation foundation model
Qwen2.5-VL is a multimodal large language model series
Qwen3-ASR is an open-source series of ASR models
Large Multimodal Models for Video Understanding and Editing
Towards Ultimate Expert Specialization in Mixture-of-Experts Language
Let us control diffusion models
This repository contains the official implementation of research
Code release for "Masked-attention Mask Transformer
PyTorch implementation of MAE
Per-Pixel Classification is Not All You Need for Semantic Segmentation