All-in-one AI productivity platform with agents, workflows, and IM
Automate native Android apps with AI using accessibility APIs
Gemma open-weight LLM library, from Google DeepMind
A Pioneering Open-Source Alternative to GPT-4o
[NeurIPS 2023] ImageReward: Learning and Evaluating Human Preferences
Official Repo For "Sa2VA: Marrying SAM2 with LLaVA
Label Studio is a multi-type data labeling and annotation tool
Browse the web, directly from Cursor etc.
Doom-based AI research platform for reinforcement learning
Extension of Google Research’s PaperBanana
Multimodal Agents as Smartphone Users, an LLM-based multimodal agent
Taming Stable Diffusion for Lip Sync
Chinese and English multimodal conversational language model
A frontier, first-principles handbook
Modular quant framework
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
GitLab automatic code review tool based on large models
Phi-3.5 for Mac: Locally-run Vision and Language Models
Agent Skill for generating 2D sprite sheets and maps, transparent PNG
A computer vision closed-loop learning platform
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Handwritten Text Recognition (HTR) system implemented with TensorFlow
Python package for AutoML on Tabular Data with Feature Engineering
Just a Better Chatbot. Powered by MCP Client & Workflows
General-purpose image editing model that delivers high-fidelity