Tiny vision-language model
Tooling for the Common Objects In 3D dataset
Code for Mesh R-CNN, ICCV 2019
Uncommon Objects in 3D dataset
GPT4V-level open-source multi-modal model based on Llama3-8B
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
Official DeiT repository
Implementation of the Surya Foundation Model for Heliophysics
Large Multimodal Models for Video Understanding and Editing
Capable of understanding text, audio, vision, and video
This repository contains the official implementation of research
A method to increase the speed and lower the memory footprint
PyTorch implementation of MAE
The official PyTorch implementation of our paper