Medical imaging toolkit for deep learning
Refer and Ground Anything Anywhere at Any Granularity
Project Lyra: Open Generative 3D World Models
State-of-the-art (SoTA) text-to-video pre-trained model
Unsupervised Learning for Image Registration
A high performance implementation of HDBSCAN clustering
Models for object and human mesh reconstruction
Visual Causal Flow
Code for Mesh R-CNN, ICCV 2019
A Systematic Framework for Interactive World Modeling
Qwen2.5-VL is the multimodal large language model series
Video understanding codebase from FAIR for reproducing video models
Unifying 3D Mesh Generation with Language Models
Gracefully face hCaptcha challenges with multimodal LLMs
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Let us control diffusion models
Implementation of BEVFormer, a camera-only framework
Code release for "Masked-attention Mask Transformer
The official PyTorch implementation of our paper
PyTorch framework for doing deep learning on point clouds
A Neural Net Training Interface on TensorFlow, with focus on speed
A PyTorch implementation of the NIPS 2017 paper
Code for "Image Generation from Scene Graphs", Johnson et al, CVPR 201
Style transfer, deep learning, feature transform