AI tool that converts GitHub repositories into interactive diagrams
Driving with Graph Visual Question Answering
Autoregressive Model Beats Diffusion
This repository contains the official implementation of FastVLM
Refer and Ground Anything Anywhere at Any Granularity
Self-supervised visual learning using momentum contrast in PyTorch
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Contexts Optical Compression
Reference PyTorch implementation and models for DINOv3
Weaving the Digital Agent Galaxy
The most powerful Android RPA agent framework
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Let's make video diffusion practical
The library to build & auto-optimize LLM applications
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Generating Immersive, Explorable, and Interactive 3D Worlds
Foundation model for image generation
Benchmarking Multimodal Agents for Open-Ended Tasks
Generate audiobooks from e-books
Agent S: an open agentic framework that uses computers like a human
Master the fundamentals of machine learning, deep learning
Open-source evaluation toolkit of large multi-modality models (LMMs)
VMZ: Model Zoo for Video Modeling