Showing 3 open source projects for "dom"

  • 1
    agent-browser

    Browser automation CLI for AI agents

    ...It effectively provides a sandbox where AI agents can read, scroll, click, and interpret pages in context, allowing them to automate workflows, answer questions about page content, or generate structured summaries directly from the user’s current tab. The project emphasizes standards and safety, defining interfaces that let agents access DOM data, interpret events, and generate actionable insights without exposing credential-level access or violating policy boundaries. Users benefit from a tighter feedback loop: agents can observe user tasks in situ and respond with contextually relevant actions or suggested steps, such as form completion, navigation shortcuts, or detailed explanations of UI elements.
    Downloads: 7 This Week
  • 2
    Actionbook

    Browser action engine for AI agents. 10× faster, resilient by design

    Actionbook is an AI-centric automation framework that lets intelligent agents interact with live web pages reliably and at scale, removing the guesswork involved in navigating modern dynamic sites. Instead of having agents blindly scrape HTML or guess at what to click, Actionbook supplies up-to-date action manuals and verified DOM structure, so agents know exactly how to click, type, and navigate complex interfaces such as SPAs or streaming UIs. Because the action manuals codify expected flows and DOM targets, browsing is up to 10× faster and far more resilient than ad-hoc approaches that break on minor page changes. It provides multiple integration paths (a Rust-based CLI, MCP server support for AI IDEs, and a JavaScript SDK), so developers can plug it into a wide range of agent pipelines and toolchains.
    Downloads: 0 This Week
  • 3
    OmniParser

    A simple screen parsing tool towards pure vision based GUI agent

    ...To achieve this, OmniParser curates an interactable icon detection dataset containing 67,000 unique screenshot images labeled with bounding boxes of interactable icons derived from DOM trees. Additionally, a collection of 7,000 icon-description pairs is used to fine-tune a caption model that extracts the functional semantics of detected elements. Evaluations on benchmarks such as SeeClick, Mind2Web, and AITW demonstrate that OmniParser outperforms GPT-4V baselines, even when using only screenshot inputs without additional information.
    Downloads: 0 This Week