  • 1
    agent-browser

    Browser automation CLI for AI agents

    ...It effectively provides a sandbox where AI agents can read, scroll, click, and interpret pages in context, allowing them to automate workflows, answer questions about page content, or generate structured summaries directly from the user’s current tab. The project emphasizes standards and safety, defining interfaces that let agents access DOM data, interpret events, and generate actionable insights without exposing sensitive credential-level access or violating policy boundaries. Users benefit from a tighter feedback loop: agents can observe user tasks in situ and respond with contextually relevant actions or suggested steps, such as form completion, navigation shortcuts, or detailed explanations of UI elements.
    Downloads: 7 This Week
  • 2
    Actionbook

    Browser action engine for AI agents. 10× faster, resilient by design

    Actionbook is an AI-centric automation framework that equips intelligent agents to interact with live web pages in a reliable and scalable way, eliminating the guesswork involved in navigating modern dynamic sites. Instead of having agents blindly scrape HTML or guess at what to click, Actionbook supplies up-to-date action manuals and verified DOM structure, letting agents know exactly how to click, type, and navigate complex interfaces such as SPAs or streaming UIs. Because the action manuals codify expected flows and DOM targets, this design makes browsing up to 10× faster and far more resilient than ad-hoc approaches that break on minor page changes. It provides multiple integration paths (a Rust-based CLI, MCP server support for AI IDEs, and a JavaScript SDK), so developers can plug it into a wide range of agent pipelines and toolchains.
    Downloads: 0 This Week
  • 3
    Onlook

    The Cursor for Designers • An Open-Source AI-First Design tool

    Seamlessly integrate with any website or web app running on React + TailwindCSS, and make live edits directly in the browser DOM. Customize your design, control your codebase, and push your changes without compromise. Link Onlook to your React project with just one command, run from your project's root folder, to get set up in seconds. Onlook writes reliable code you can trust, exactly where it needs to go. Adjust layouts, change colors, modify text, and more.
    Downloads: 22 This Week
  • 4
    Midscene

    Vision-based AI framework for cross-platform UI automation tasks

    Midscene.js is an open source AI-driven UI automation framework designed to control user interfaces across multiple platforms using natural language instructions. Instead of relying on traditional selectors, DOM structures, or accessibility attributes, it uses a vision-first approach where screenshots are analyzed by visual-language models to identify interface elements and perform actions. It allows developers to automate interactions on web applications, desktop software, and mobile devices without needing platform-specific automation logic. ...
    Downloads: 1 This Week
  • 5
    BrowserTools MCP

    Monitor browser logs directly from Cursor

    Browser Tools MCP is an MCP server and Chrome extension that gives AI agents safe, structured access to your live browser for debugging and automation. It can capture console/network logs, DOM snapshots, and screenshots, and expose them as typed resources the agent can query or act on. The design aims to make IDE agents (e.g., Cursor, Claude Desktop) more “web-aware,” enabling workflows like reproducing a bug, collecting evidence, and proposing fixes without copy-pasting. Documentation and community guides outline a quick setup, including the extension, the MCP server process, and common troubleshooting steps. ...
    Downloads: 0 This Week
  • 6
    Browserbase MCP Server

    Allow LLMs to control a browser with Browserbase and Stagehand

    ...The system supports multiple AI models and integrates seamlessly into agent workflows, making it suitable for applications such as web scraping, testing, and intelligent automation. It also includes advanced capabilities such as screenshot capture, DOM analysis, and session persistence, enabling complex interactions across multiple browsing sessions.
    Downloads: 0 This Week
  • 7
    Magnitude

    Vision AI browser agent for automation, testing, and extraction

    Browser Agent by Magnitude is an open source, vision-first browser automation framework that enables users to control web interfaces using natural language instructions. It leverages visually grounded AI models to interpret and interact with web pages based on what is seen on the screen rather than relying solely on the DOM structure. This approach allows the agent to generalize better across complex and modern websites, making it more robust than traditional selector-based automation tools. Browser Agent by Magnitude supports a wide range of capabilities including navigation, interaction, data extraction, and automated verification through built-in testing features. ...
    Downloads: 0 This Week
  • 8
    Browser MCP

    Browser MCP is a Model Context Protocol (MCP) server

    ...By adapting a Playwright-style approach to control the running browser profile, it reuses logged-in sessions and cookies, which reduces re-authentication friction and helps avoid some bot-detection heuristics. The server exposes structured tools for navigation, element interaction, and artifact capture (DOM, screenshots, logs), all discoverable via MCP schemas. Because it runs against the user’s primary browser, it’s well-suited to repetitive web tasks, authenticated dashboards, and debugging workflows inside MCP-capable IDEs. A public website and extension streamline installation and connect the local server to clients like Claude, Cursor, VS Code, and Windsurf. ...
    Downloads: 0 This Week
  • 9
    canvas-editor

    Canvas-based WYSIWYG rich text editor with advanced layout tools

    canvas-editor is a browser-based rich text editor that renders content using HTML5 Canvas and SVG instead of traditional DOM-based approaches. It is designed to provide a WYSIWYG editing experience similar to word processors, enabling precise control over layout, rendering, and document structure. canvas-editor supports a wide range of formatting and document features, including text styling, tables, images, and embedded elements, all managed through a structured data model.
    Downloads: 8 This Week
  • 10
    FlowLens MCP

    Open-source MCP server that gives your coding agent

    ...It works together with a companion browser extension: when a user reproduces a bug or a complicated UI interaction, the extension captures a rich session log, including screen/video recording, network traffic, console logs, DOM events, storage changes, and more, and exports it. The MCP server then loads this captured “flow” and exposes it to the AI agent via the Model Context Protocol (MCP), letting the agent examine, search, filter, and reason about the session just as a human developer would, without needing the agent to re-run the flow or rely on minimal reproduction data (logs, screenshots).
    Downloads: 0 This Week
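    The "examine, search, filter" idea described above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory event record (`Event`, with `timestamp_ms`, `kind`, and `detail` fields) and a hypothetical helper `filter_events`; none of these names come from FlowLens, and its actual capture format and MCP tools may look quite different.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical captured-session event; not the FlowLens schema."""
    timestamp_ms: int
    kind: str                      # e.g. "console", "network", "dom"
    detail: dict = field(default_factory=dict)


def filter_events(events, kind=None, start_ms=None, end_ms=None, text=None):
    """Return events matching every criterion that was supplied."""
    result = []
    for event in events:
        if kind is not None and event.kind != kind:
            continue
        if start_ms is not None and event.timestamp_ms < start_ms:
            continue
        if end_ms is not None and event.timestamp_ms > end_ms:
            continue
        # Case-insensitive substring match over the stringified detail dict.
        if text is not None and text.lower() not in str(event.detail).lower():
            continue
        result.append(event)
    return result


session = [
    Event(100, "network", {"url": "/api/cart", "status": 200}),
    Event(250, "console", {"level": "error", "message": "TypeError: x is undefined"}),
    Event(300, "dom", {"target": "#checkout", "type": "click"}),
    Event(420, "network", {"url": "/api/checkout", "status": 500}),
]

errors = filter_events(session, kind="console", text="typeerror")
late_network = filter_events(session, kind="network", start_ms=400)
```

    An agent-facing tool could wrap a query like this so that the agent narrows a long recording to the few events around a failure instead of reading the whole log.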
  • 11
    markstream-vue

    A Vue 3 renderer specifically built for AI-powered streaming Markdown

    ...It integrates advanced rendering support for complex content types such as code editors via Monaco, diagrams via Mermaid, and mathematical expressions via KaTeX, all optimized for incremental updates. The architecture is tailored for reactive front-end environments, leveraging Vue’s reactivity system to efficiently update only the necessary parts of the DOM.
    Downloads: 2 This Week
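    The idea of updating only the necessary parts of the output as Markdown streams in can be sketched independently of Vue. The following is a simplified illustration, not markstream-vue's implementation: `split_blocks` and `IncrementalRenderer` are hypothetical names, blocks are assumed to be separated by blank lines (so fenced code containing blank lines would be split incorrectly), and the renderer is any function you pass in.

```python
def split_blocks(markdown):
    """Split Markdown text into blocks separated by blank lines."""
    blocks, current = [], []
    for line in markdown.split("\n"):
        if line.strip():
            current.append(line)
        elif current:
            blocks.append("\n".join(current))
            current = []
    if current:
        blocks.append("\n".join(current))
    return blocks


class IncrementalRenderer:
    """Caches rendered blocks and re-renders only those whose source changed."""

    def __init__(self, render_block):
        self.render_block = render_block
        self.blocks = []     # source text of each block from the last update
        self.rendered = []   # cached render output, parallel to self.blocks

    def update(self, markdown):
        """Apply the latest full text; return indices of re-rendered blocks."""
        new_blocks = split_blocks(markdown)
        new_rendered, changed = [], []
        for i, block in enumerate(new_blocks):
            if i < len(self.blocks) and self.blocks[i] == block:
                new_rendered.append(self.rendered[i])   # reuse cached output
            else:
                new_rendered.append(self.render_block(block))
                changed.append(i)
        self.blocks, self.rendered = new_blocks, new_rendered
        return changed


renderer = IncrementalRenderer(lambda block: "<p>" + block + "</p>")
first = renderer.update("# Title\n\nHello")
second = renderer.update("# Title\n\nHello wor")
third = renderer.update("# Title\n\nHello world\n\nNext")
```

    During streaming, typically only the last block changes between updates, so earlier blocks keep their cached output and the work per chunk stays small.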
  • 12
    HyperAgent

    AI Browser Automation

    ...Instead of manually writing logic for clicking elements, extracting data, or navigating web pages, developers can instruct the agent in plain language and allow the AI layer to interpret and execute the task. This approach reduces the brittleness commonly associated with traditional automation scripts that break when the DOM structure changes. HyperAgent includes APIs such as page.ai() and page.extract() that allow structured data extraction and dynamic task execution through AI reasoning.
    Downloads: 1 This Week
  • 13
    Chrome DevTools MCP

    Chrome DevTools for coding agents

    chrome-devtools-mcp is an MCP server that connects AI agents to the Chrome DevTools Protocol so they can inspect pages, record traces, read console/network data, and modify the live browser state under user control. It makes a running Chrome instance visible to MCP clients, enabling agents to debug websites end-to-end—launching Chrome, navigating, profiling, and collecting artifacts in a structured way. The repository spells out environment requirements and cautions that exposing a live...
    Downloads: 3 This Week
  • 14
    OmniParser

    A simple screen parsing tool towards pure vision based GUI agent

    ...To achieve this, OmniParser curates an interactable icon detection dataset containing 67,000 unique screenshot images labeled with bounding boxes of interactable icons derived from DOM trees. Additionally, a collection of 7,000 icon-description pairs is used to fine-tune a caption model that extracts the functional semantics of detected elements. Evaluations on benchmarks such as SeeClick, Mind2Web, and AITW demonstrate that OmniParser outperforms GPT-4V baselines, even when using only screenshot inputs without additional information.
    Downloads: 0 This Week
  • 15
    MCP UI

    SDK for building interactive UI components over MCP for AI tools

    mcp-ui is a software development kit designed to bring interactive user interface capabilities to applications built on the Model Context Protocol (MCP). It enables developers to create rich, dynamic UI components that can be delivered from an MCP server and rendered seamlessly by a compatible client. Instead of returning only text responses, tools can provide structured UI resources such as HTML or remote-rendered components, allowing more engaging and functional interactions. mcp-ui...
    Downloads: 0 This Week
  • 16
    web-eval-agent MCP Server

    An MCP server that autonomously evaluates web applications

    web-eval-agent is a Model Context Protocol (MCP) server that spins up a browser-use–capable debugging agent to autonomously run and evaluate web apps straight from your editor. It’s positioned as a “let the coding agent debug itself” companion: the agent launches the app, navigates flows, captures evidence, and iterates on failures without manual copy-pasting of logs. The repository focuses on developer ergonomics, exposing typed MCP tools so clients like Claude Desktop can start sessions,...
    Downloads: 0 This Week
  • 17
    OculiX

    Visual Automation IDE — automate anything you see on screen

    OculiX is the evolution of SikuliX, actively maintained with the full agreement of its original creator RaiMan. Automate any desktop application using image recognition (OpenCV) and OCR (Tesseract + PaddleOCR). No access to source code or DOM required: if you can see it, you can automate it.

    Key features:
    - Guided step-by-step recorder with live code preview
    - Image recognition via OpenCV 4.10
    - Dual OCR: Tesseract (built-in) + PaddleOCR (neural, high precision)
    - Local and remote automation via integrated VNC
    - SSH tunnels via embedded JSch
    - Cross-platform: Windows, macOS (Apple Silicon M1-M4), Linux
    - Scripting: Jython, JRuby, Java, PowerShell, AppleScript
    - Java 17 recommended (Java 8+ supported)
    - Full CI/CD with automated builds for all platforms

    Used worldwide for test automation, RPA, and visual regression testing. ...
    Downloads: 137 This Week
  • 18
    AI Employe

    Create browser automation as if you were teaching a human using GPT-4

    ...To prevent GPT from derailing from tasks, we use a technique akin to retrieval-augmented generation, which we call Actions Augmented Generation. Essentially, when a user creates a workflow, we don't record the screen, microphone, or camera, but we do record the DOM element changes for every action (clicking, typing, etc.) the user takes.
    Downloads: 0 This Week
  • 19
    pdf-extractor

    Node.js module for rendering pdf pages to images, svgs and HTML files

    ...This library is, in its most basic form, a Node.js wrapper for pdf.js. It has default renderers to generate a default output, but is easily extended to incorporate custom logic or to generate different output. It uses a Node.js DOM and the node domstub from pdf.js to make PDF parsing available on Node.js without a browser.
    Downloads: 0 This Week
  • 20
    html2canvas

    A JavaScript HTML screenshot renderer

    html2canvas is a JavaScript HTML renderer that lets you take screenshots of web pages directly in the browser. The screenshot is built from the DOM, so it may not be 100% accurate to the real representation: it is not an actual screenshot but a reconstruction based on the information available on the page. The script reads the DOM and the styles applied to its elements and renders the page as a canvas image. No server-side rendering is required, because the image is created in the user's browser. ...
    Downloads: 9 This Week
  • 21
    Adaptative Backgrounds

    A jQuery plugin for extracting the dominant color from images

    ...Ideally, this selector would start with img, to ensure we only grab and try to process actual images. parent (default: null, falsy): a CSS selector that denotes which parent element to apply the background color to. By default, the color is applied to the parent one level up the DOM tree.
    Downloads: 0 This Week
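    As a rough illustration of what "extracting the dominant color" can mean, here is a minimal sketch that buckets pixels by the high-order bits of each channel and averages the most populated bucket. This is one common approach and an assumption on our part, not necessarily the algorithm the plugin uses; `dominant_color` and its `bits` parameter are hypothetical names.

```python
from collections import Counter


def dominant_color(pixels, bits=4):
    """Return the average color of the most populated quantization bucket.

    pixels: iterable of (r, g, b) tuples with channels in 0-255.
    bits:   number of high-order bits per channel used for bucketing.
    """
    shift = 8 - bits
    counts = Counter()
    sums = {}
    for r, g, b in pixels:
        key = (r >> shift, g >> shift, b >> shift)
        counts[key] += 1
        sr, sg, sb = sums.get(key, (0, 0, 0))
        sums[key] = (sr + r, sg + g, sb + b)
    if not counts:
        raise ValueError("no pixels given")
    key, count = counts.most_common(1)[0]
    sr, sg, sb = sums[key]
    # Average the real pixel values in the winning bucket.
    return (sr // count, sg // count, sb // count)


# Four reddish pixels outnumber two bluish ones.
sample = [(250, 10, 10)] * 3 + [(10, 10, 250)] * 2 + [(254, 12, 8)]
color = dominant_color(sample)
```

    Coarser buckets (a smaller `bits` value) group near-identical shades together, which tends to give a more stable result on photographs than counting exact RGB values.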