AI tool that converts GitHub repositories into interactive diagrams
An extensive node suite that enables ComfyUI to process 3D inputs
Driving with Graph Visual Question Answering
Autoregressive Model Beats Diffusion
This repository contains the official implementation of FastVLM
Refer and Ground Anything Anywhere at Any Granularity
Multimodal Diffusion with Representation Alignment
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
All-in-one AI productivity platform with agents, workflows, and IM
Video Object and Interaction Deletion
Master the fundamentals of machine learning, deep learning
Open-source evaluation toolkit of large multi-modality models (LMMs)
The most powerful Android RPA agent framework
Foundation model for image generation
Official Repo For "Sa2VA: Marrying SAM2 with LLaVA
Phi-3.5 for Mac: Locally-run Vision and Language Models
Extension of Google Research’s PaperBanana
Multimodal Agents as Smartphone Users, an LLM-based multimodal agent
A Pioneering Open-Source Alternative to GPT-4o
Benchmarking Multimodal Agents for Open-Ended Tasks
General-purpose image editing model that delivers high-fidelity
Handwritten Text Recognition (HTR) system implemented with TensorFlow
A frontier, first-principles handbook
Motion-controllable Video Generation via Latent Trajectory Guidance