OmniParser is a method for parsing user interface screenshots into structured elements, which significantly improves the ability of multimodal models such as GPT-4V to generate actions that are accurately grounded in the corresponding regions of the interface. It reliably identifies interactable icons within user interfaces and captures the semantics of the elements in a screenshot, so that an intended action can be associated with the correct screen region. To achieve this, OmniParser relies on a curated interactable icon detection dataset of 67,000 unique screenshot images, each labeled with bounding boxes of interactable icons derived from DOM trees. In addition, a collection of 7,000 icon-description pairs is used to fine-tune a caption model that extracts the functional semantics of the detected elements. Evaluations on benchmarks such as SeeClick, Mind2Web, and AITW show that OmniParser outperforms GPT-4V baselines, even when it uses only screenshot inputs without additional information.
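
To make the idea of "structured elements" concrete, the following is a minimal sketch of how parsed output could be represented and turned into a grounded click. It is an illustration only and not OmniParser's actual API or output schema: the `ParsedElement` class, its field names, the normalized bounding-box convention, and the helper functions are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ParsedElement:
    """One UI element recovered from a screenshot (hypothetical schema)."""
    element_id: int
    # Assumed convention: (x_min, y_min, x_max, y_max), normalized to [0, 1].
    bbox: Tuple[float, float, float, float]
    description: str      # functional caption, e.g. from a caption model
    interactable: bool    # True for icons or buttons that can be acted on


def click_point(element: ParsedElement, width: int, height: int) -> Tuple[int, int]:
    """Return the pixel centre of the element's bounding box."""
    x_min, y_min, x_max, y_max = element.bbox
    return (round((x_min + x_max) / 2 * width),
            round((y_min + y_max) / 2 * height))


def find_element(elements: List[ParsedElement], keyword: str) -> Optional[ParsedElement]:
    """Return the first interactable element whose description contains
    the keyword (case-insensitive), or None if there is no match."""
    needle = keyword.lower()
    for element in elements:
        if element.interactable and needle in element.description.lower():
            return element
    return None


# Hand-written example data standing in for real parser output.
elements = [
    ParsedElement(0, (0.25, 0.5, 0.75, 1.0), "Settings gear icon", True),
    ParsedElement(1, (0.0, 0.0, 1.0, 0.25), "Page title text", False),
]

target = find_element(elements, "settings")
if target is not None:
    print(click_point(target, 800, 600))
```

In a setup like this, a multimodal model would choose an element by its identifier or description, and the bounding box would supply the screen coordinates for the action.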

Features

  • Parse user interface screenshots into structured and easy-to-understand elements
  • Examples available
  • Enhances the ability of GPT-4V to generate actions that can be accurately grounded in the corresponding regions of the interface
  • Ensure you have the V2 weights downloaded in the weights folder
  • Model Weights License
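
The weights requirement above can be checked before running anything. The sketch below only verifies that a folder named `weights` exists and contains at least one file. The helper name `weights_present` is hypothetical, and the sketch does not check for specific file names, since the listing does not give them.

```python
from pathlib import Path


def weights_present(weights_dir: str = "weights") -> bool:
    """Return True if the directory exists and holds at least one file
    at any depth. This does not validate which files are present."""
    root = Path(weights_dir)
    return root.is_dir() and any(p.is_file() for p in root.rglob("*"))


if __name__ == "__main__":
    if weights_present():
        print("weights folder found")
    else:
        print("weights folder missing or empty; download the V2 weights first")
```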

License

Creative Commons Attribution License

Additional Project Details

Operating Systems

Windows

Programming Language

Python

Related Categories

Python Agentic AI Tool, Python AI Agent Frameworks, Python AI Agents

Registered

2025-02-18