CogAgent is a 9B-parameter bilingual vision-language GUI agent model built on GLM-4V-9B, trained with staged data curation, optimization, and strategy upgrades to improve perception, action prediction, and generalization across tasks. It operates real user interfaces from a screenshot plus a text instruction, and follows a strict input–output format that returns structured actions, grounded operations, and optional sensitivity annotations. The model is designed for agent-style execution rather than freeform chat: it maintains a continuous execution history across steps but requires a fresh session for each new task. Inference supports BF16 on NVIDIA GPUs, with optional INT8 and INT4 modes, though INT4 carries a noted performance loss; example CLIs and a web demo illustrate bounding-box outputs and operation categories.
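As a rough illustration of consuming the structured output described above, the sketch below parses a reply into its labeled fields and extracts bounding boxes. The exact field labels (`Action`, `Grounded Operation`, `Status`, `Plan`), the `[[x1,y1,x2,y2]]` box notation, and the sample reply are assumptions for illustration, not the model's verbatim format.

```python
import re

def parse_agent_reply(reply: str) -> dict:
    """Extract labeled sections and any bounding boxes from a model reply.

    Field names and the [[x1,y1,x2,y2]] box notation are assumptions
    based on the structured-output description in this README.
    """
    fields = {}
    for key in ("Action", "Grounded Operation", "Status", "Plan"):
        m = re.search(rf"{key}:\s*(.*)", reply)
        if m:
            fields[key] = m.group(1).strip()
    # Boxes are assumed to appear as [[x1,y1,x2,y2]] integer quadruples.
    fields["boxes"] = [
        [int(n) for n in box.split(",")]
        for box in re.findall(r"\[\[(\d+,\d+,\d+,\d+)\]\]", reply)
    ]
    return fields

# Hypothetical reply, for illustration only:
sample = (
    "Action: Click the login button.\n"
    "Grounded Operation: CLICK(box=[[120,540,310,600]])\n"
    "Status: None"
)
parsed = parse_agent_reply(sample)
```

A downstream executor would dispatch on the parsed operation and use the box for grounding.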
Features
- Bilingual GUI agent operation in Chinese and English, with screenshots as input
- Strict, platform-aware prompting for WIN, Mac, and Mobile targets
- Structured outputs with Action, Operation, Status, Plan, and sensitivity modes
- Bounding-box grounded operations for precise UI localization
- CLI and web demos for local inference with saved overlay results
- SFT and LoRA fine-tuning recipes with detailed GPU and token budgets
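To make the bounding-box grounding concrete, here is a minimal sketch of mapping a model-emitted box onto screenshot pixels, e.g. to draw the overlay results the demos save. The 0–999 normalized coordinate convention (`scale=1000`) is an assumption; adjust it to whatever coordinate space the model actually emits.

```python
def box_to_pixels(box, width, height, scale=1000):
    """Scale an [x1, y1, x2, y2] box from an assumed 0-999 normalized
    coordinate space to pixel coordinates on a width x height screenshot."""
    x1, y1, x2, y2 = box
    return (
        round(x1 / scale * width),
        round(y1 / scale * height),
        round(x2 / scale * width),
        round(y2 / scale * height),
    )

# Example on a 1920x1080 screenshot:
px = box_to_pixels([120, 540, 310, 600], 1920, 1080)
# -> (230, 583, 595, 648)
```

The resulting pixel rectangle can be passed directly to any drawing routine (e.g. Pillow's `ImageDraw.rectangle`) to render the overlay.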