SAPIEN Manipulation Skill Framework
AI tool that converts GitHub repositories into interactive diagrams
Driving with Graph Visual Question Answering
Autoregressive Model Beats Diffusion
This repository contains the official implementation of FastVLM
Refer and Ground Anything Anywhere at Any Granularity
Contexts Optical Compression
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Reference PyTorch implementation and models for DINOv3
The most powerful Android RPA agent framework
Foundation model for image generation
Benchmarking Multimodal Agents for Open-Ended Tasks
Master the fundamentals of machine learning, deep learning
Open-source evaluation toolkit for large multi-modality models (LMMs)
All-in-one AI productivity platform with agents, workflows, and IM
A Pioneering Open-Source Alternative to GPT-4o
Official Repo For "Sa2VA: Marrying SAM2 with LLaVA
Extension of Google Research’s PaperBanana
Multimodal Agents as Smartphone Users, an LLM-based multimodal agent
A frontier, first-principles handbook
Phi-3.5 for Mac: Locally-run Vision and Language Models
GitLab automatic code review tool based on large models
Handwritten Text Recognition (HTR) system implemented with TensorFlow
General-purpose image editing model that delivers high-fidelity