A Pioneering Open-Source Alternative to GPT-4o
A framework to enable multimodal models to operate a computer
Agent-ready RPA suite with visual workflow automation tools engine
LTX-Video Support for ComfyUI
Automated translation solution for visual novels
Official SeedVR2 Video Upscaler for ComfyUI
Official Python inference and LoRA trainer package
Turn WiFi signals into real-time human pose estimation and detection
Tiny vision language model
Unified Multimodal Understanding and Generation Models
Qwen-Image-Layered: Layered Decomposition for Inherent Editability
SAPIEN Manipulation Skill Framework
The most powerful Android RPA agent framework
A Grub Theme in the style of Minecraft!
Recovering the Visual Space from Any Views
AI tool that converts GitHub repositories into interactive diagrams
Machine Learning, Criticism and Correction
Edit videos with Claude Code
Mixture-of-Experts Vision-Language Models for Advanced Multimodal
Autoregressive Model Beats Diffusion
StarVector is a foundation model for SVG generation
Book_4_Matrix Power | The Iris Book: From Addition, Subtraction
Machine learning image inpainting task that removes watermarks
VideoRAG: Chat with Your Videos
Official implementation of Watermark Anything with Localized Messages