Showing 135 open source projects for "graphical user interfaces"

  • 1
    TypeChat

    Library for building type-safe natural language interfaces with LLMs

    TypeChat is an open source library developed by Microsoft that simplifies the creation of natural language interfaces by using type definitions to structure interactions with large language models. Traditional natural language interfaces often relied on complex decision trees to interpret user intent and gather required inputs. With the rise of large language models, developers can interpret user requests more easily, but they still face challenges related to output reliability, safety, and structured responses. ...
    Downloads: 0 This Week
  • 2
    Umi-OCR

    OCR software, free and offline

    ...The software supports flexible usage patterns including screenshot capture OCR, batch processing of large sets of images or documents, PDF parsing, QR code detection, and layout-aware paragraph output. Users can interact with Umi-OCR through a graphical interface, command-line options, or HTTP interfaces, making it adaptable to both casual desktop usage and programmatic automation. Because the project is open source, developers can inspect, modify, and extend its capabilities, and plugins allow for different recognition engines or enhanced features.
    Downloads: 46 This Week
  • 3
    MemMachine

    Universal memory layer for AI Agents

    ...Unlike ephemeral LLM prompt state, MemMachine supports distinct memory types—short-term conversational context, long-term persistent knowledge, and profile memory for personalized facts—persisted in optimized stores (e.g., graph databases for episodic lines of reasoning and SQL for user facts) to support robust, context-aware intelligence in agents. It offers flexible APIs, a Python SDK, REST interfaces, and MCP (Model Context Protocol) connectivity to integrate seamlessly with agent frameworks receiving and storing memories over time, effectively boosting relevance, continuity, and tailored behavior.
    Downloads: 2 This Week
  • 4
    autoMate

    AI tool for automating desktop tasks via natural language input

    autoMate is an AI-powered local automation tool designed to enable users to control and automate their computers using natural language instructions instead of traditional scripting or rule-based systems. It combines large language models with computer vision techniques to interpret user intent and understand on-screen content, allowing it to interact with graphical interfaces similarly to a human user. autoMate follows an observe-decide-act workflow, where it analyzes the screen, plans actions, and executes them through simulated input such as mouse clicks and keyboard events. Unlike conventional RPA tools that require predefined workflows, autoMate dynamically adapts to tasks by making autonomous decisions based on the current interface state. autoMate emphasizes local execution, meaning all processing happens on the user’s machine to maintain privacy and data security.
    Downloads: 4 This Week
  • 5
    npcpy

    The AI toolkit for the AI developer

    npcpy is a Python-based agent framework and command-line toolkit (the NPC Shell) for developers to build, test, and integrate AI agents into their workflows, including both command-line and GUI interfaces via NPC Studio. Welcome to npcpy, the core library of the NPC Toolkit that supercharges natural language processing pipelines and agent tooling. npcpy is a flexible framework for building state-of-the-art applications and conducting novel research with LLMs. The structure of npcpy also...
    Downloads: 4 This Week
  • 6
    Windows-MCP

    MCP server enabling AI agents to control and automate Windows OS

    ...It includes a set of tools that simulate user inputs like keyboard and mouse actions while also capturing the current state of windows and interfaces. It is designed to be extensible and adaptable, allowing developers to customize or expand its functionality for different automation or AI use cases.
    Downloads: 3 This Week
  • 7
    Agent S

    Agent S: an open agentic framework that uses computers like a human

    Agent S is an open-source agentic framework designed to enable autonomous computer use through an Agent-Computer Interface (ACI). Built to operate graphical user interfaces like a human, it allows AI agents to perceive screens, reason about tasks, and execute actions across macOS, Windows, and Linux systems. The latest version, Agent S3, surpasses human-level performance on the OSWorld benchmark, demonstrating state-of-the-art results in complex multi-step computer tasks. Agent S combines powerful foundation models (such as GPT-5) with grounding models like UI-TARS to translate visual inputs into precise executable actions. ...
    Downloads: 3 This Week
  • 8
    Open-AutoGLM

    An open phone agent model & framework

    Open-AutoGLM is an open-source framework and model designed to empower autonomous mobile intelligent assistants by enabling AI agents to understand and interact with phone screens in a multimodal manner, blending vision and language capability to control real devices. It aims to create an “AI phone agent” that can perceive on-screen content, reason about user goals, and execute sequences of taps, swipes, and text input via automated device control interfaces like ADB, enabling hands-off completion of multi-step tasks such as navigating apps, filling forms, and more. Unlike traditional automation scripts that depend on brittle heuristics, Open-AutoGLM uses pretrained large language and vision-language models to interpret visual context and natural language instructions, giving the agent robust adaptability across apps and interfaces.
    Downloads: 4 This Week
  • 9
    AppAgent

    Multimodal Agents as Smartphone Users, an LLM-based multimodal agent

    AppAgent is an open-source multimodal agent framework designed to enable large language models to operate smartphone applications through natural interactions with graphical user interfaces. The system allows an AI agent to interpret visual information from the screen and translate natural language instructions into actions such as tapping, swiping, and navigating between application screens. Instead of requiring backend access to application APIs, the framework interacts with apps the same way a human user would, making it compatible with a wide variety of mobile applications. ...
    Downloads: 2 This Week
  • 10
    Hermes Agent

    The agent that grows with you

    ...Rather than functioning as a stateless chatbot, it maintains long-term memory across sessions and can generate searchable “Skill Documents” that capture how it solved complex tasks so it doesn’t start from scratch each time. The agent interfaces with messaging platforms like Telegram, Discord, Slack, and WhatsApp through a single gateway process, and also offers an interactive terminal user interface with history, autocomplete, and streamable tool output. It supports scheduled automation in natural language, allowing users to set up recurring tasks such as daily briefings or system audits that it runs unattended.
    Downloads: 10 This Week
  • 11
    Operit AI

    Powerful Android AI agent with tools, automation, and Linux shell

    ...Operit supports both local and remote AI models, including offline execution through frameworks like llama.cpp and MNN, helping preserve user privacy while maintaining flexibility. Operit also includes an intelligent memory system that stores, organizes, and retrieves user interactions to provide more personalized and context-aware responses. In addition, it offers workflow automation, plugin extensibility, and a rich tool ecosystem, making it suitable for advanced productivity.
    Downloads: 28 This Week
  • 12
    Devon

    Open source AI pair programmer for coding, debugging, automation

    ...It operates as an agent-based system that can explore codebases, edit files, and execute development workflows with minimal manual intervention. Devon uses a client-server architecture with a Python backend and multiple user interfaces, including a terminal interface and an Electron-based desktop application. Devon integrates with multiple large language models, allowing users to choose between different providers for performance, cost, and latency considerations. It is capable of performing tasks such as debugging, writing tests, analyzing code structure, and navigating complex repositories. ...
    Downloads: 2 This Week
  • 13
    MAI-UI

    Real-World Centric Foundation GUI Agents

    MAI-UI is a cutting-edge open-source project that implements a family of foundation GUI (Graphical User Interface) agent models capable of interpreting natural language and performing real-world GUI navigation and control tasks across mobile and desktop environments. Developed by Tongyi-MAI (Alibaba’s research initiative), the MAI-UI models are multimodal agents trained to understand user instructions and corresponding screenshots, grounding those instructions to on-screen elements and generating sequences of GUI actions such as taps, swipes, text input, and system commands. ...
    Downloads: 1 This Week
  • 14
    OpenAdapt

    Open Source Generative Process Automation

    OpenAdapt is the open source software adapter between Large Multimodal Models (LMMs) and traditional desktop and web Graphical User Interfaces (GUIs). OpenAdapt learns to automate your desktop and web workflows by observing your demonstrations. Spend less time on repetitive tasks and more on work that truly matters. Boost team productivity in HR operations. Automate candidate sourcing using LinkedIn Recruiter, LinkedIn Talent Solutions, GetProspect, Reply.io, outreach.io, Gmail/Outlook, and more. ...
    Downloads: 1 This Week
  • 15
    Sunfish

    Sunfish: a Python Chess Engine in 111 lines of code

    ...The project is often used as an educational tool for understanding game AI, search algorithms, and evaluation functions without the complexity of larger engines. It includes a simple UCI-compatible interface, allowing it to be integrated with graphical chess interfaces or used in terminal-based gameplay. The codebase is intentionally minimal, making it ideal for experimentation, modification, and learning purposes.
    Downloads: 4 This Week
  • 16
    UFO³

    Weaving the Digital Agent Galaxy

    UFO is an open-source framework developed by Microsoft for building intelligent agents that automate interactions with graphical user interfaces on the Windows operating system. The system allows users to issue natural language instructions that are translated into automated actions across multiple desktop applications. Using a dual-agent architecture, the framework analyzes both visual interface elements and system control structures in order to understand how applications should be manipulated. ...
    Downloads: 0 This Week
  • 17
    UI-TARS

    UI-TARS-desktop version that can operate on your local personal device

    UI-TARS is an open-source multimodal “GUI agent” created by ByteDance: a model designed to perceive raw screenshots (or rendered UI frames), reason about what needs to be done, and then perform real interactions with graphical user interfaces (GUIs) — like clicking, typing, navigating menus — across desktop, browser, mobile, or game environments. Rather than relying on rigid, manually scripted UI automation, UI-TARS uses a unified vision-language model (VLM) that integrates perception, reasoning, grounding, and action into one end-to-end framework: it “thinks before acting,” enabling flexible, general-purpose automation. ...
    Downloads: 5 This Week
  • 18
    qxresearch-event-1

    Python hands-on tutorial with 50+ Python applications

    qxresearch-event-1 is an open-source educational repository that provides a collection of lightweight Python applications designed to demonstrate programming concepts and artificial intelligence techniques in simple and accessible examples. The repository contains dozens of small programs, many implemented with minimal lines of code, covering topics such as machine learning, graphical user interfaces, computer vision, and API integration. Each example is designed to illustrate a single concept or application in a clear and concise manner so that learners can quickly understand the underlying logic. The project emphasizes practical experimentation, allowing beginners to modify and extend the example programs to explore new ideas. ...
    Downloads: 0 This Week
  • 19
    Vanna 2.0

    Chat with your SQL database

    ...The system streams query results, visualizations, and summaries directly to user interfaces, allowing non-technical users to interact with complex data systems through conversational queries. It also includes enterprise-grade features such as user-aware security, permission enforcement, and query auditing for production deployments.
    Downloads: 1 This Week
  • 20
    GELab-Zero

    GUI Exploration Lab. One of the best GUI agent solutions

    GELab-Zero is an open-source “GUI Agent” framework aiming to automate interactions with graphical user interfaces (GUIs), combining both the agent model and all supporting infrastructure — including inference, input orchestration, and GUI automation logic — in a plug-and-play package that runs locally, without cloud dependencies. The idea is to let developers or users harness an AI agent that can simulate clicking, typing, reading UI elements, and interacting with apps in a human-like way via the GUI, which can enable tasks like automated testing, scriptable workflows, or even autonomous usage of GUI-based applications. ...
    Downloads: 2 This Week
  • 21
    KIS Open API

    Korea Investment & Securities Open API Github

    ...It includes example scripts that demonstrate how to authenticate with the service, retrieve financial data, and execute trading operations through REST and WebSocket interfaces. The repository organizes its examples into two main groups: code designed for direct user implementation and simplified examples intended for large language model agents or automation workflows.
    Downloads: 4 This Week
  • 22
    Cheshire Cat AI

    AI agent microservice

    ...Cheshire Cat also supports multi-user environments with granular permissions and identity provider integration, making it suitable for enterprise use cases.
    Downloads: 2 This Week
  • 23
    MCP UI

    SDK for building interactive UI components over MCP for AI tools

    mcp-ui is a software development kit designed to bring interactive user interface capabilities to applications built on the Model Context Protocol (MCP). It enables developers to create rich, dynamic UI components that can be delivered from an MCP server and rendered seamlessly by a compatible client. Instead of returning only text responses, tools can provide structured UI resources such as HTML or remote-rendered components, allowing more engaging and functional interactions. mcp-ui introduces a standardized approach where tools and their associated interfaces are linked through metadata, enabling clients to automatically discover and display the correct UI. ...
    Downloads: 2 This Week
  • 24
    OmniParser

    A simple screen parsing tool towards pure vision based GUI agent

    OmniParser is a comprehensive method for parsing user interface screenshots into structured elements, significantly enhancing the ability of multimodal models like GPT-4 to generate actions accurately grounded in corresponding regions of the interface. It reliably identifies interactable icons within user interfaces and understands the semantics of various elements in a screenshot, associating intended actions with the correct screen regions.
    Downloads: 1 This Week
  • 25
    MONAI

    AI Toolkit for Healthcare Imaging

    ...It is built on top of PyTorch and is released under the Apache 2.0 license. It aims to capture best practices of AI development for healthcare researchers, with an immediate focus on medical imaging. It provides user-comprehensible error messages and easy-to-program APIs, and it supports reproducible research experiments for comparison against state-of-the-art implementations.
    Downloads: 5 This Week