DeepMind model for tracking arbitrary points across videos & robotics
Implementation of "MobileCLIP" (CVPR 2024)
4M: Massively Multimodal Masked Modeling
This repository contains the official implementation of FastVLM
VMZ: Model Zoo for Video Modeling
Official implementation of Watermark Anything with Localized Messages
Code for Mesh R-CNN, ICCV 2019
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
ICLR 2024 Spotlight: curation/training code, metadata, distribution
Language modeling in a sentence representation space
A Production-ready Reinforcement Learning AI Agent Library
A PyTorch library for implementing flow matching algorithms
Official DeiT repository
Video understanding codebase from FAIR for reproducing video models
GLM-4 series: Open Multilingual Multimodal Chat LMs
A state-of-the-art open visual language model
Chinese and English multimodal conversational language model
GLM-4-Voice | End-to-End Chinese-English Conversational Model
GLM-4.5V and GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Memory-efficient and performant finetuning of Mistral's models
Towards Ultimate Expert Specialization in Mixture-of-Experts Language
Official implementation of DreamCraft3D
Towards Real-World Vision-Language Understanding
Release for Improved Denoising Diffusion Probabilistic Models