Visual intelligence for your home.
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
PDF to Markdown with vision models
Open-source Python 3 tool for recognizing layouts, tables, and math
Create beautiful slides on the web using Claude's frontend skills
Extension of Google Research’s PaperBanana
Generating Immersive, Explorable, and Interactive 3D Worlds
Agent S: an open agentic framework that uses computers like a human
Windrecorder is a memory search app that records everything
Weaving the Digital Agent Galaxy
Recovering the Visual Space from Any Views
AI-Powered Wiki Generator for GitHub/GitLab/Bitbucket Repositories
Driving with Graph Visual Question Answering
Autoregressive Model Beats Diffusion
LISA: Reasoning Segmentation via Large Language Model
StarVector is a foundation model for SVG generation
This repository contains the official implementation of FastVLM
Refer and Ground Anything Anywhere at Any Granularity
Self-supervised visual learning using momentum contrast in PyTorch
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Python inference and LoRA trainer package for the LTX-2 audio–video
State-of-the-art Image & Video CLIP, Multimodal Large Language Models
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
The open-source C/C++ package manager
Browse the web, directly from Cursor etc.