Showing 56 open source projects for "mobile web browser"

View related business solutions
  • MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • 1
    Browser Use

    Browser Use

    Make websites accessible for AI agents

    Browser-Use is a framework that makes websites accessible for AI agents, enabling automated interactions and data extraction from web pages.
    Downloads: 5 This Week
    Last Update:
    See Project
  • 2
    Browser Use MCP Server

    Browser Use MCP Server

    Browse the web, directly from Cursor etc.

    A browser automation server implementing the Model Context Protocol, designed to allow AI assistants to browse the web directly from applications like Cursor. It supports natural language commands for web navigation and interaction. ​
    Downloads: 0 This Week
    Last Update:
    See Project
  • 3
    Stable Diffusion web UI for AMDGPUs

    Stable Diffusion web UI for AMDGPUs

    Stable Diffusion WebUI optimized for AMD GPUs with editing tools

    Stable Diffusion WebUI AMDGPU is a browser-based interface for generating images using Stable Diffusion, built with Gradio and adapted for AMD graphics hardware. It provides both text-to-image and image-to-image workflows, allowing users to create, refine, and upscale visuals within a single interface. It includes tools such as inpainting and outpainting for editing specific areas of an image, along with features like prompt matrix generation and attention controls to fine-tune outputs....
    Downloads: 8 This Week
    Last Update:
    See Project
  • 4
    web-eval-agent MCP Server

    web-eval-agent MCP Server

    An MCP server that autonomously evaluates web applications

    web-eval-agent is a Model Context Protocol (MCP) server that spins up a browser-use–capable debugging agent to autonomously run and evaluate web apps straight from your editor. It’s positioned as a “let the coding agent debug itself” companion: the agent launches the app, navigates flows, captures evidence, and iterates on failures without manual copy-pasting of logs.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 5
    LaVague

    LaVague

    Framework for building AI agents that automate complex web tasks

    LaVague is an open source framework designed to help developers build AI-powered web agents capable of automating tasks across websites and web applications. It implements the concept of a Large Action Model framework, allowing agents to interpret a user-provided objective and translate it into a sequence of actions performed in a browser. These agents can navigate web pages, retrieve information, fill out forms, and execute multi-step workflows automatically. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 6
    Index

    Index

    The SOTA Open-Source Browser Agent

    Index is an open-source browser automation agent designed to autonomously perform complex tasks across websites by transforming web interfaces into programmable APIs. The system enables developers to instruct an AI agent to interact with web pages using natural language rather than traditional automation scripts. Instead of writing detailed browser automation code, users can describe the desired task and allow the agent to interpret the page structure, interact with elements, and complete multi-step workflows automatically. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 7
    Notte

    Notte

    Opensource browser using agents

    Notte is an open-source browser framework that enables the development and deployment of web-based AI agents. It introduces a perception layer that transforms web pages into structured, navigable maps described in natural language, allowing agents to interact with the internet more effectively. Notte is designed for building scalable and efficient browser-based AI applications.
    Downloads: 1 This Week
    Last Update:
    See Project
  • 8
    Android Use

    Android Use

    Automate native Android apps with AI using accessibility APIs

    android-action-kernel is an open source Python library designed to let AI agents control and automate native Android applications running on real devices or emulators. It fills a gap in automation tooling by focusing on mobile-first workflows where traditional browser or desktop-based automation doesn’t work; such as logistics, gig work, field operations, and other industries reliant on phones or tablets. The project works by using Android’s accessibility API to extract structured UI state (as XML) from the device, which is then fed to a large language model (LLM) like OpenAI’s models for decision-making, and actions are executed via the Android Debug Bridge (ADB). ...
    Downloads: 8 This Week
    Last Update:
    See Project
  • 9
    UI-TARS

    UI-TARS

    UI-TARS-desktop version that can operate on your local personal device

    UI-TARS is an open-source multimodal “GUI agent” created by ByteDance: a model designed to perceive raw screenshots (or rendered UI frames), reason about what needs to be done, and then perform real interactions with graphical user interfaces (GUIs) — like clicking, typing, navigating menus — across desktop, browser, mobile, or game environments. Rather than relying on rigid, manually scripted UI automation, UI-TARS uses a unified vision-language model (VLM) that integrates perception, reasoning, grounding, and action into one end-to-end framework: it “thinks before acting,” enabling flexible, general-purpose automation. This allows it to perform complex, multi-step tasks such as filling forms, downloading files, navigating applications, and even controlling in-game actions — all by understanding the UI as a human would. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • Add Two Lines of Code. Get Full APM. Icon
    Add Two Lines of Code. Get Full APM.

    AppSignal installs in minutes and auto-configures dashboards, alerts, and error tracking.

    Works out of the box for Rails, Django, Express, Phoenix, and more. Monitoring exceptions and performance in no time.
    Start Free
  • 10
    AskUI Vision Agent

    AskUI Vision Agent

    Enable AI to control your desktop, mobile and HMI devices

    AskUI’s Vision Agent is an automation framework that allows you—and AI agents—to control real desktops, mobile devices, and HMI systems by perceiving the UI and performing actions like clicking, typing, scrolling, and drag-and-drop. It is designed for multi-platform compatibility and supports multiple AI models so you can tailor perception and decision-making to your workload. The repository presents a feature overview, sample media, and frequent release notes, which show ongoing...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 11
    Omnara

    Omnara

    Talk to Your AI Agents from Anywhere

    Omnara is an open-source agent control platform that empowers developers to turn autonomous AI tools (e.g., Claude Code, Cursor, GitHub Copilot) into collaborative teammates by offering real-time dashboards, push notifications, and remote guidance across terminals, web, and mobile. Omnara transforms your AI agents (Claude Code, Codex CLI, n8n, and more) from silent workers into communicative teammates. Get real-time visibility into what your agents are doing, and respond to their questions instantly from a single dashboard on web and mobile. The primary way to use CLI coding agents (Claude Code, Codex CLI) with Omnara.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 12
    BrowserGym

    BrowserGym

    A Gym environment for web task automation

    BrowserGym is an open framework for web task automation research that exposes browser interaction as a Gym-style environment for training and evaluating agents. It is intended for researchers building web agents rather than for end users looking for a consumer automation product. The project provides a common environment where agents can interact with websites, execute tasks, and be evaluated against standardized benchmarks.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    CogAgent

    CogAgent

    An open sourced end-to-end VLM-based GUI Agent

    CogAgent is a 9B-parameter bilingual vision-language GUI agent model based on GLM-4V-9B, trained with staged data curation, optimization, and strategy upgrades to improve perception, action prediction, and generalization across tasks. It focuses on operating real user interfaces from screenshots plus text, and follows a strict input–output format that returns structured actions, grounded operations, and optional sensitivity annotations. The model is designed for agent-style execution rather...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 14
    MolmoWeb

    MolmoWeb

    Open multimodal web agent built by Ai2

    MolmoWeb is an open-source multimodal web agent designed to autonomously navigate and interact with web browsers using vision-language models, representing a significant step toward fully agentic AI systems that can operate in real-world digital environments. The system takes natural language instructions and translates them into sequences of browser actions such as clicking, typing, scrolling, and navigating, effectively performing tasks on behalf of the user. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    Khoj

    Khoj

    An AI personal assistant for your digital brain

    ...Khoj is a desktop application to search and chat with your notes, documents, and images. It is an offline-first, open-source AI personal assistant that is accessible from Emacs, Obsidian or your Web browser. Khoj is a thinking tool that is transparent, fun, and easy to engage with. You can build faster and better by using Khoj to search and reason across all your data sources. Khoj learns from your notes and documents to function as an extension of your brain. So that you can stay focused on doing what matters. Khoj started with the founding principle that a personal assistant be understandable, accessible and hackable. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 16
    Whisper-WebUI

    Whisper-WebUI

    A Web UI for easy subtitle using whisper model

    Whisper WebUI is an open-source browser-based interface that simplifies the use of Whisper speech recognition models by providing an intuitive graphical environment for transcription, translation, and subtitle generation. Built with Gradio, it allows users to upload audio or video files, process them locally, and generate accurate text outputs without relying on command-line tools. The platform integrates optimized implementations such as faster-whisper, significantly improving transcription...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 17
    ChatTTS webUI & API

    ChatTTS webUI & API

    A simple native web interface that uses ChatTTS to synthesize text

    ChatTTS-ui is a local web interface and API wrapper around the ChatTTS speech synthesis system, designed to make advanced TTS models easy to use from a browser. It runs a small backend server (Python + Torch + ffmpeg) and exposes a simple webpage where you can type text, adjust parameters, and generate audio. The project supports Chinese, English, and mixed text with digits and control symbols, making it suitable for bilingual content and numerically heavy text like announcements or prompts. ...
    Downloads: 12 This Week
    Last Update:
    See Project
  • 18
    MLC LLM

    MLC LLM

    Universal LLM Deployment Engine with ML Compilation

    MLC LLM is a machine learning compiler and deployment framework designed to enable efficient execution of large language models across a wide range of hardware platforms. The project focuses on compiling models into optimized runtimes that can run natively on devices such as GPUs, mobile processors, browsers, and edge hardware. By leveraging machine learning compilation techniques, mlc-llm produces high-performance inference engines that maintain consistent APIs across platforms. The system supports deployment on environments including Linux, macOS, Windows, iOS, Android, and web browsers while utilizing different acceleration technologies such as CUDA, Vulkan, Metal, and WebGPU. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 19
    InvokeAI

    InvokeAI

    InvokeAI is a leading creative engine for Stable Diffusion models

    InvokeAI is an implementation of Stable Diffusion, the open source text-to-image and image-to-image generator. It provides a streamlined process with various new features and options to aid the image generation process. It runs on Windows, Mac and Linux machines, and runs on GPU cards with as little as 4 GB or RAM. InvokeAI is a leading creative engine built to empower professionals and enthusiasts alike. Generate and create stunning visual media using the latest AI-driven technologies....
    Downloads: 19 This Week
    Last Update:
    See Project
  • 20
    Agent Development Kit (ADK)

    Agent Development Kit (ADK)

    Open-source, code-first Python toolkit for building, evaluating, etc.

    ADK (Android Device Key) Python is a reference implementation by Google for working with Android attestation keys in Python. It facilitates the integration of Android attestation features into backends or systems that require verification of device identity and integrity. This is especially important in high-security applications where verifying that a device is genuine and uncompromised is critical. ADK Python helps developers verify hardware-backed keys, work with JSON Web Tokens (JWT),...
    Downloads: 7 This Week
    Last Update:
    See Project
  • 21
    clone-voice

    clone-voice

    A sound cloning tool with a web interface, using your voice

    ...It is built around Coqui’s XTTS-v2 model, so it inherits multilingual support and modern neural TTS quality while wrapping it in a user-friendly desktop workflow. The app is designed to be very easy to use: you download a precompiled package, double-click app.exe, and it launches a browser-based web interface where you control cloning and synthesis. It does not require an NVIDIA GPU to run basic tasks, although GPU acceleration can be used when available, making it accessible on modest machines. The tool supports around sixteen languages, including Chinese, English, Japanese, Korean, French, German, Italian, and others, and can capture reference voices directly from a microphone or from uploaded audio.
    Downloads: 7 This Week
    Last Update:
    See Project
  • 22
    Every Code

    Every Code

    Local AI coding agent CLI with multi-agent orchestration tools

    Every Code (often referred to simply as Code) is a fast, local AI-powered coding agent designed to run directly in the terminal environment. It is a community-driven fork of the Codex CLI, with a strong emphasis on improving real-world developer ergonomics and workflows. Every Code enhances the traditional coding assistant model by introducing multi-agent orchestration, allowing multiple AI agents to collaborate, compare solutions, and refine outputs in parallel. It supports integration with...
    Downloads: 3 This Week
    Last Update:
    See Project
  • 23
    Rhino

    Rhino

    On-device Speech-to-Intent engine powered by deep learning

    ...Create use-case-specific voice AI models in seconds. Develop voice features with a few lines of code using intuitive and cross-platform SDKs. Deliver voice AI everywhere: on-device, mobile, web browsers, on-premise, or cloud. Measure adoption, learn, and iterate. Continuously re-design and re-train to optimize engagement. Building accurate, responsive, and private voice technology is difficult. We learned the hard way, so you don’t have to. Picovoice heavily invests in R&D to offer superior voice AI that surpasses even Big Tech in accuracy and efficiency. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 24
    edge-tts

    edge-tts

    Use Microsoft Edge's online text-to-speech service from Python

    ...From the CLI you can adjust parameters such as speaking rate, volume, and pitch, giving you some control over prosody without diving into SSML. The library is asynchronous under the hood, which makes it efficient for batch jobs or web services that need to synthesize many utterances concurrently.
    Downloads: 17 This Week
    Last Update:
    See Project
  • 25
    StableSwarmUI

    StableSwarmUI

    Multi-user UI for managing and running Stable Diffusion workflows tool

    StableSwarmUI is a web-based interface designed to manage and coordinate Stable Diffusion image generation workflows in a multi-user environment. It focuses on enabling multiple users to interact with shared resources, making it suitable for collaborative or server-based deployments. It provides a centralized system where users can submit, monitor, and manage generation tasks through a browser interface.
    Downloads: 1 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • 2
  • 3
  • Next
MongoDB Logo MongoDB