Machine learning image inpainting task that removes watermarks
Edit videos with Claude Code
Visual intelligence for your home
Video Object and Interaction Deletion
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
VMZ: Model Zoo for Video Modeling
VideoRAG: Chat with Your Videos
AI-Powered Wiki Generator for GitHub/Gitlab/Bitbucket Repositories
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Weaving the Digital Agent Galaxy
Recovering the Visual Space from Any Views
A Pioneering Open-Source Alternative to GPT-4o
Agent-ready RPA suite with visual workflow automation tools engine
Effortless data labeling with AI support from Segment Anything
Generating Immersive, Explorable, and Interactive 3D Worlds
Parse files for optimal RAG
Generate audiobooks from e-books
Multimodal Diffusion with Representation Alignment
Driving with Graph Visual Question Answering
Autoregressive Model Beats Diffusion
LISA: Reasoning Segmentation via Large Language Model
This repository contains the official implementation of FastVLM
Refer and Ground Anything Anywhere at Any Granularity
Self-supervised visual learning using momentum contrast in PyTorch