Open-AutoGLM is an open-source framework and model for building autonomous phone assistants: AI agents that understand and interact with phone screens by combining vision and language capabilities to control real devices. It aims to create an “AI phone agent” that perceives on-screen content, reasons about the user's goal, and executes sequences of taps, swipes, and text input through device automation interfaces such as ADB, so that multi-step tasks such as navigating apps and filling in forms can be completed hands-free. Unlike traditional automation scripts, which depend on brittle heuristics, Open-AutoGLM uses pretrained large language and vision-language models to interpret visual context and natural-language instructions, which is intended to let the agent adapt across apps and interfaces.
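
The perceive, reason, and act cycle described above can be pictured as a simple loop. The sketch below is only an illustration under stated assumptions: `Action`, `run_agent`, and the three callbacks (`capture_screen`, `decide`, `execute`) are hypothetical names invented for this example and are not Open-AutoGLM's actual API. In a real agent, `decide` would presumably wrap a call to a vision-language model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Action:
    """Hypothetical action record; not taken from Open-AutoGLM."""
    kind: str              # e.g. "tap", "swipe", "type", or "done"
    args: Tuple = ()       # coordinates or text, depending on kind


def run_agent(
    goal: str,
    capture_screen: Callable[[], bytes],
    decide: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Repeat perceive -> reason -> act until the model says it is done."""
    history: List[Action] = []
    for _ in range(max_steps):
        screen = capture_screen()                # perceive: grab a screenshot
        action = decide(goal, screen, history)   # reason: ask the model for a step
        if action.kind == "done":                # the model reports the goal is met
            break
        execute(action)                          # act: send the step to the device
        history.append(action)
    return history
```

The `max_steps` cap is a design choice of this sketch: it keeps a confused model from looping forever on a screen it cannot interpret.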

Features

  • Multimodal phone screen understanding (vision + language)
  • Autonomous control of smartphone actions (tap, swipe, type)
  • Framework for scripting and deploying mobile AI agents
  • Integration with device automation layers like ADB
  • Example demos for real apps to quickly prototype agents
  • Open framework for research and custom workflows
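
As a rough illustration of how the tap, swipe, and type actions listed above map onto ADB, the sketch below builds the standard `adb shell input` and `adb exec-out screencap` command lines. The helper names (`adb_prefix`, `tap_cmd`, `swipe_cmd`, `text_cmd`, `screencap_cmd`, `run`) are hypothetical and chosen for this example; they are not Open-AutoGLM functions, and the project's own device layer may work differently.

```python
import shlex
import subprocess
from typing import List, Optional


def adb_prefix(serial: Optional[str] = None) -> List[str]:
    """Base adb invocation, optionally targeting one device by serial."""
    return ["adb"] + (["-s", serial] if serial else [])


def tap_cmd(x: int, y: int, serial: Optional[str] = None) -> List[str]:
    """Command for a single tap at pixel coordinates (x, y)."""
    return adb_prefix(serial) + ["shell", "input", "tap", str(x), str(y)]


def swipe_cmd(x1: int, y1: int, x2: int, y2: int,
              duration_ms: int = 300, serial: Optional[str] = None) -> List[str]:
    """Command for a swipe from (x1, y1) to (x2, y2) lasting duration_ms."""
    coords = [str(v) for v in (x1, y1, x2, y2, duration_ms)]
    return adb_prefix(serial) + ["shell", "input", "swipe"] + coords


def text_cmd(text: str, serial: Optional[str] = None) -> List[str]:
    """Command that types text; `input text` expects %s in place of spaces."""
    escaped = text.replace(" ", "%s")
    # Quote for the device-side shell so special characters survive.
    return adb_prefix(serial) + ["shell", "input", "text", shlex.quote(escaped)]


def screencap_cmd(serial: Optional[str] = None) -> List[str]:
    """Command that writes a PNG screenshot to stdout."""
    return adb_prefix(serial) + ["exec-out", "screencap", "-p"]


def run(cmd: List[str]) -> bytes:
    """Execute a command built above; requires adb and a connected device."""
    return subprocess.run(cmd, check=True, capture_output=True).stdout
```

For example, `run(tap_cmd(540, 1200))` would tap near the middle of a 1080-pixel-wide screen, and `run(screencap_cmd())` would return PNG bytes that could be passed to a vision-language model. Building the commands as argument lists keeps them easy to log and to test without a device attached.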

License

Apache License V2.0

Additional Project Details

Operating Systems

Android, Apple iPhone, Linux, Mac, Windows

Programming Language

Python

Related Categories

Python AI Agent Frameworks

Registered

2026-01-20