A simple tool for reading in poorly redacted documents
Machine learning image inpainting tool that removes watermarks
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Agent-ready RPA suite with visual workflow automation tools and engine
Detects phishing and lookalike domains using DNS fuzzing techniques
Multimodal Agents as Smartphone Users, an LLM-based multimodal agent
AI-Powered Wiki Generator for GitHub/GitLab/Bitbucket Repositories
VMZ: Model Zoo for Video Modeling
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Taming Stable Diffusion for Lip Sync
Driving with Graph Visual Question Answering
Autoregressive Model Beats Diffusion
LISA: Reasoning Segmentation via Large Language Model
This repository contains the official implementation of FastVLM
Refer and Ground Anything Anywhere at Any Granularity
Self-supervised visual learning using momentum contrast in PyTorch
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Generating Immersive, Explorable, and Interactive 3D Worlds
AI tool that converts GitHub repositories into interactive diagrams
Book 5: Statistics in Simplicity
GPT Image 2 prompt gallery, image prompt library, agentic skill
Master the fundamentals of machine learning, deep learning
The Iris Book: Addition, Subtraction, Multiplication, and Division
From Addition, Subtraction, Multiplication, and Division to ML
VGGSfM: Visual Geometry Grounded Deep Structure From Motion