ASCII art library for Python
An extensive node suite that enables ComfyUI to process 3D inputs
AI-Powered Wiki Generator for GitHub/GitLab/Bitbucket Repositories
Recovering the Visual Space from Any Views
LISA: Reasoning Segmentation via Large Language Model
StarVector is a foundation model for SVG generation
Unified Multimodal Understanding and Generation Models
GPT Image 2 prompt gallery, image prompt library, agentic skill
Python inference and LoRA trainer package for the LTX-2 audio–video
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Machine learning image inpainting task that removes watermarks
From Addition, Subtraction, Multiplication, and Division to ML
A neural network that transforms a design mock-up into a static website
SAPIEN Manipulation Skill Framework
AI tool that converts GitHub repositories into interactive diagrams
Driving with Graph Visual Question Answering
Autoregressive Model Beats Diffusion
This repository contains the official implementation of FastVLM
Refer and Ground Anything Anywhere at Any Granularity
Contexts Optical Compression
PS2 Covers Collection
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
The most powerful Android RPA agent framework
Multimodal Diffusion with Representation Alignment
Reference PyTorch implementation and models for DINOv3