OmniParser is a method for parsing user interface screenshots into structured elements, which improves the ability of multimodal models such as GPT-4V to generate actions that are accurately grounded in the corresponding regions of the interface. It identifies interactable icons within a user interface and interprets the semantics of the elements in a screenshot, so that an intended action can be associated with the correct screen region. To achieve this, OmniParser uses a curated interactable icon detection dataset of 67,000 unique screenshot images, labeled with bounding boxes of interactable icons derived from DOM trees. In addition, a collection of 7,000 icon-description pairs is used to fine-tune a caption model that extracts the functional semantics of the detected elements. Evaluations on benchmarks such as SeeClick, Mind2Web, and AITW show that OmniParser outperforms GPT-4V baselines while using only screenshot input, with no additional information.
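To make the idea of "structured elements" concrete, here is a minimal sketch of how a parsed screenshot might be represented and consumed. It is not OmniParser's actual API or output schema: the `ParsedElement` class, its field names, and the helpers `click_point` and `to_prompt` are all hypothetical, and the bounding boxes are assumed to be normalized to the range 0 to 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedElement:
    """One detected UI element (hypothetical schema, not OmniParser's own)."""
    element_id: int
    # Assumed layout: (x_min, y_min, x_max, y_max), normalized to [0, 1].
    bbox: tuple[float, float, float, float]
    description: str  # functional caption, e.g. from a caption model
    interactable: bool = True


def click_point(element: ParsedElement, width: int, height: int) -> tuple[int, int]:
    """Return the pixel coordinates of the center of the element's box."""
    x_min, y_min, x_max, y_max = element.bbox
    return (
        round((x_min + x_max) / 2 * width),
        round((y_min + y_max) / 2 * height),
    )


def to_prompt(elements: list[ParsedElement]) -> str:
    """List interactable elements by ID so a model can pick one by number."""
    return "\n".join(
        f"[{e.element_id}] {e.description}" for e in elements if e.interactable
    )


# Made-up example data standing in for the output of a parsing step.
elements = [
    ParsedElement(0, (0.25, 0.25, 0.5, 0.5), "Search button"),
    ParsedElement(1, (0.5, 0.75, 0.75, 1.0), "Settings menu"),
]

print(to_prompt(elements))
print(click_point(elements[0], width=800, height=400))
```

In this sketch the model only has to choose an element ID from the text list; the chosen ID is then mapped back to a screen location through the stored bounding box, which is one way an action can be grounded in a region.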

Features

  • Parses user interface screenshots into structured, easy-to-understand elements
  • Examples available
  • Enhances the ability of GPT-4V to generate actions accurately grounded in the corresponding regions of the interface
  • Requires the V2 weights to be downloaded into the weights folder
  • Model Weights License

License

Creative Commons Attribution License

Additional Project Details

Operating Systems

Windows

Programming Language

Python

Related Categories

Python Agentic AI Tool, Python AI Agent Frameworks, Python AI Agents

Registered

2025-02-18