StarVector is a foundation model for SVG generation
Weaving the Digital Agent Galaxy
Effortless data labeling with AI support from Segment Anything
Edit videos with Claude Code
Recovering the Visual Space from Any Views
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Let's make video diffusion practical
The most powerful Android RPA agent framework
Unified Multimodal Understanding and Generation Models
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Machine learning image inpainting task that removes watermarks
Official implementation of Watermark Anything with Localized Messages
A framework to enable multimodal models to operate a computer
Agent-ready RPA suite with visual workflow automation tools engine
Wan2.1: Open and Advanced Large-Scale Video Generative Model
Multimodal Agents as Smartphone Users, an LLM-based multimodal agent
Video Object and Interaction Deletion
Master the fundamentals of machine learning, deep learning
Full-stack AI Red Teaming platform
AI-Powered Wiki Generator for GitHub/GitLab/Bitbucket Repositories
Generating Immersive, Explorable, and Interactive 3D Worlds
Driving with Graph Visual Question Answering
Autoregressive Model Beats Diffusion
LISA: Reasoning Segmentation via Large Language Model
This repository contains the official implementation of FastVLM