CogAgent is a 9B-parameter bilingual vision-language model for GUI agents, built on GLM-4V-9B. Its training went through staged upgrades to data curation, optimization, and strategy to improve perception, action prediction, and generalization across tasks. The model operates real user interfaces from a screenshot plus text and follows a strict input–output format that returns structured actions, grounded operations, and optional sensitivity annotations. It is designed for agent-style execution rather than freeform chat: it maintains a continuous execution history across the steps of a task and requires a fresh session for each new task. Inference runs in BF16 on NVIDIA GPUs; INT8 and INT4 modes are optional, with a noted performance loss at INT4. Example CLIs and a web demo illustrate the bounding-box outputs and operation categories.
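The session behaviour described above (history carried across steps, a fresh session per task) can be sketched as follows. This is a minimal illustration, not the project's code: the `AgentSession` class, its method names, and the prompt layout produced by `build_prompt` are all assumptions, and the exact template the model expects should be taken from the project's own documentation.

```python
from dataclasses import dataclass, field

# Platform labels as named in the feature list; the exact strings the
# model expects are an assumption here.
PLATFORMS = {"WIN", "Mac", "Mobile"}


@dataclass
class AgentSession:
    """Hypothetical per-task session: create one per task, never reuse it."""

    task: str
    platform: str
    # Each entry is (grounded operation, action description) for one step.
    history: list[tuple[str, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.platform not in PLATFORMS:
            raise ValueError(f"unknown platform: {self.platform!r}")

    def record_step(self, operation: str, action: str) -> None:
        """Append the model's last step so the next prompt includes it."""
        self.history.append((operation, action))

    def build_prompt(self) -> str:
        """Assemble the text part of the input (layout is illustrative only)."""
        lines = [f"Task: {self.task}", "History steps:"]
        for i, (operation, action) in enumerate(self.history):
            lines.append(f"{i}. {operation}\t{action}")
        lines.append(f"(Platform: {self.platform})")
        return "\n".join(lines)
```

A new task would start from a new `AgentSession` instance, so no history from an earlier task leaks into its prompts.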

Features

  • Bilingual GUI agent operation in Chinese and English, with screenshots as input
  • Strict, platform-aware prompting for WIN, Mac, and Mobile targets
  • Structured outputs with Action, Operation, Status, Plan, and sensitivity modes
  • Bounding-box grounded operations for precise UI localization
  • CLI and web demos for local inference with saved overlay results
  • SFT and LoRA fine-tuning recipes with detailed GPU and token budgets
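To make the bounding-box grounding concrete, here is a small sketch of how a caller might pull a box out of a model response and map it onto the screenshot. It rests on two assumptions that are not stated on this page: that the response embeds the box as `box=[[x1,y1,x2,y2]]`, and that coordinates are relative values in thousandths of the image size. The helper names `parse_box`, `to_pixels`, and `center` are hypothetical.

```python
import re
from typing import Optional

Box = tuple[int, int, int, int]

# Assumed textual form of a grounded box: box=[[x1,y1,x2,y2]]
BOX_RE = re.compile(r"box=\[\[(\d+),\s*(\d+),\s*(\d+),\s*(\d+)\]\]")


def parse_box(response: str) -> Optional[Box]:
    """Return the first box found in the response, or None if there is none."""
    match = BOX_RE.search(response)
    if match is None:
        return None
    x1, y1, x2, y2 = (int(g) for g in match.groups())
    return (x1, y1, x2, y2)


def to_pixels(box: Box, width: int, height: int, scale: int = 1000) -> Box:
    """Map relative coordinates (assumed thousandths) to pixel coordinates."""
    x1, y1, x2, y2 = box
    return (
        x1 * width // scale,
        y1 * height // scale,
        x2 * width // scale,
        y2 * height // scale,
    )


def center(box: Box) -> tuple[int, int]:
    """Midpoint of a box, e.g. as a click target."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) // 2, (y1 + y2) // 2)
```

For example, under these assumptions a response containing `box=[[100, 200, 300, 400]]` on a 1920×1080 screenshot maps to the pixel box `(192, 216, 576, 432)`, whose center `(384, 324)` could serve as the click point.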

Categories

AI Agents

License

Apache License V2.0

Additional Project Details

Operating Systems

Android, Mac, Windows

Programming Language

Python

Related Categories

Python AI Agents

Registered

2025-10-04