OmniParser
Microsoft
|
||||||
About
Jenova is an all-in-one AI agent built for the Model Context Protocol (MCP) ecosystem. It unifies leading models (such as GPT-4o, Claude 3.5, and Gemini 1.5) with real-time web search and a suite of embedded tools, so users can send emails, set calendar events, conduct deep research, analyze documents, generate content, and interact with live web data from a single interface. It dynamically selects the best models and integrates search across sources such as Google, Reddit, YouTube, GitHub, and academic databases. No-code customization lets users build tailored AI applications (e.g., brand-voice automation, content summarization, or client-specific assistants) without engineering overhead. Jenova emphasizes productivity by consolidating information discovery, contextual understanding, and action generation: it surfaces actionable results, summarizes findings, and automates routine tasks, and it is delivered as a mobile-capable agent.
|
About
OmniParser is a method for parsing user interface screenshots into structured elements, which significantly improves the ability of multimodal models like GPT-4V to generate actions that are accurately grounded in the corresponding regions of the interface. It identifies interactable icons within user interfaces and interprets the semantics of the elements in a screenshot, so that an intended action can be associated with the correct screen region. To achieve this, OmniParser curates an interactable icon detection dataset containing 67,000 unique screenshot images labeled with bounding boxes of interactable icons derived from DOM trees. In addition, a collection of 7,000 icon-description pairs is used to fine-tune a caption model that extracts the functional semantics of detected elements. Evaluations on benchmarks such as SeeClick, Mind2Web, and AITW show that OmniParser, using only screenshot input, outperforms GPT-4V baselines that rely on additional information beyond the screenshot.
|
|||||
Platforms Supported
Windows
Mac
Linux
Cloud
On-Premises
iPhone
iPad
Android
Chromebook
|
Platforms Supported
Windows
Mac
Linux
Cloud
On-Premises
iPhone
iPad
Android
Chromebook
|
|||||
Audience
Knowledge workers and teams seeking a tool to consolidate search, research, content creation, and workflow automation into one contextual productivity tool
|
Audience
Researchers in need of a tool to enhance AI agents' interaction with graphical user interfaces through advanced screen parsing techniques
|
|||||
Support
Phone Support
24/7 Live Support
Online
|
Support
Phone Support
24/7 Live Support
Online
|
|||||
API
Offers API
|
API
Offers API
|
|||||
Pricing
Free
Free Version
Free Trial
|
Pricing
No information available.
Free Version
Free Trial
|
|||||
Training
Documentation
Webinars
Live Online
In Person
|
Training
Documentation
Webinars
Live Online
In Person
|
|||||
Company Information
Jenova
United States
www.jenova.ai/
|
Company Information
Microsoft
Founded: 1975
United States
microsoft.github.io/OmniParser/
|
|||||
Integrations
Claude Haiku 3.5
Claude Haiku 4.5
Cua
GPT-4
GPT-4o
Gemini
Gemini Enterprise
GitHub
Google Cloud Platform
Model Context Protocol (MCP)
|
Integrations
Claude Haiku 3.5
Claude Haiku 4.5
Cua
GPT-4
GPT-4o
Gemini
Gemini Enterprise
GitHub
Google Cloud Platform
Model Context Protocol (MCP)
|