Multimodal embedding and reranking models built on Qwen3-VL
"Big Model" trains a 26M-parameter multimodal VLM
Claude MCP, MCP Servers, MCP Clients
PaddlePaddle End-to-End Development Toolkit
Modular quant framework
Open-source AI agent command center for Claude Code agent teams
Open multimodal web agent built by Ai2
Allow LLMs to control a browser with Browserbase and Stagehand
AI tool for automatic batch short video creation and editing
Declarative engine for generating AI-powered infographic visuals
Interactive Machine Learning experiments
Learning agent trained in a diffusion world model
ChatWiki: AI knowledge base workflow agent for WeChat official accounts
Fast, powerful, Git-native ticket tracking in a single Bash script
Multimodal model achieving SOTA performance
ICLR2024 Spotlight: curation/training code, metadata, distribution
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences
Pretty diff to HTML JavaScript library (diff2html)
A distributed system for embedding-based vector retrieval
Go package for computer vision using OpenCV 4 and beyond
Vision AI browser agent for automation, testing, and extraction
Unifying 3D Mesh Generation with Language Models
Tutorial on Product Prototype, AI Capability Integration
MapAnything: Universal Feed-Forward Metric 3D Reconstruction
Open source MVVM framework for Web Apps