AI-Powered Wiki Generator for GitHub/GitLab/Bitbucket Repositories
Multimodal Diffusion with Representation Alignment
A neural network that transforms a design mock-up into static websites
SAPIEN Manipulation Skill Framework
All-in-one AI productivity platform with agents, workflows, and IM
Generating Immersive, Explorable, and Interactive 3D Worlds
Reference PyTorch implementation and models for DINOv3
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Multimodal Agents as Smartphone Users, an LLM-based multimodal agent
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Official Repo For Sa2VA: Marrying SAM2 with LLaVA
Let's make video diffusion practical
Machine learning image inpainting task that removes watermarks
Agent S: an open agentic framework that uses computers like a human
Towards Real-World Vision-Language Understanding
Master the fundamentals of machine learning, deep learning
Open-source evaluation toolkit of large multi-modality models (LMMs)
The most powerful Android RPA agent framework
Label Studio is a multi-type data labeling and annotation tool
An open phone agent model & framework
Extension of Google Research’s PaperBanana
Browse the web, directly from Cursor etc.
InvokeAI is a leading creative engine for Stable Diffusion models
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
State-of-the-art Image & Video CLIP, Multimodal Large Language Models