Showing 3 open source projects for "screen"

  • 1
    Self-Operating Computer

    A framework to enable multimodal models to operate a computer

    The Self-Operating Computer Framework enables multimodal models to operate a computer autonomously: the model interprets the screen, then executes mouse and keyboard actions to achieve a specified objective. The framework is compatible with various multimodal models and currently integrates with GPT-4o, o1, Gemini Pro Vision, Claude 3, and LLaVA. According to the project, it was the first known project to implement a multimodal model capable of viewing and controlling a computer screen.
    Downloads: 6 This Week
  • 2
    MAI-UI

    Real-World Centric Foundation GUI Agents

    ...Developed by Tongyi-MAI (Alibaba’s research initiative), the MAI-UI models are multimodal agents trained to understand user instructions and corresponding screenshots, grounding those instructions to on-screen elements and generating sequences of GUI actions such as taps, swipes, text input, and system commands. Unlike traditional UI frameworks, MAI-UI emphasizes realistic deployment by supporting agent–user interaction (clarifying ambiguous instructions), integration with external tool APIs using MCP calls, and a device–cloud collaboration mechanism that dynamically routes computation to on-device or cloud models based on task state and privacy constraints.
    Downloads: 13 This Week
  • 3
    OAGI Python SDK

    Python SDK for the Computer Use model Lux, developed by OpenAGI

    ...It exposes the OAGI API in an ergonomic way, letting you trigger Lux in three main modes: Tasker for precise scripted sequences, Actor for fast one-shot tasks, and Thinker for open-ended, multi-step objectives. The SDK is designed around “computer use” as a paradigm, where the AI actually navigates interfaces, clicks, types, scrolls, and reads the screen through screenshots instead of only calling APIs. It provides high-level asynchronous agents (like AsyncDefaultAgent and AsyncActor) that encapsulate the loop of capturing screenshots, sending them to Lux, interpreting responses, and executing UI actions with PyAutoGUI. Multiple installation flavors let you choose between a minimal oagi-core package or variants that bundle desktop automation and FastAPI/Socket.IO server capabilities.
    Downloads: 4 This Week
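
The Self-Operating Computer and OAGI Python SDK entries above both describe the same basic control loop: capture a screenshot, send it to a multimodal model, interpret the response, and execute the resulting mouse or keyboard action. A minimal Python sketch of that loop follows. It is not taken from either project: `Action`, `run_agent_loop`, and the three callables are hypothetical names, and the model and input device are replaced by stubs so the snippet runs without any dependencies.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """One UI action proposed by the model (hypothetical schema)."""
    kind: str                                  # e.g. "click", "type", "done"
    args: dict = field(default_factory=dict)   # e.g. {"x": 100, "y": 200}


def run_agent_loop(
    objective: str,
    capture: Callable[[], bytes],
    decide: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Repeat capture -> decide -> execute until the model reports "done"
    or the step budget runs out. Returns the actions that were executed."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()                          # look at the screen
        action = decide(objective, screenshot, history)  # ask the model
        if action.kind == "done":
            break
        execute(action)                                 # move mouse / type keys
        history.append(action)
    return history


# --- Stubs standing in for a real screen grabber, model, and input driver ---

def fake_capture() -> bytes:
    return b"fake-png-bytes"


def scripted_decide(objective: str, screenshot: bytes, history: List[Action]) -> Action:
    # A real implementation would call a multimodal model here.
    script = [
        Action("click", {"x": 100, "y": 200}),
        Action("type", {"text": "hello"}),
    ]
    return script[len(history)] if len(history) < len(script) else Action("done")


executed: List[Action] = []
history = run_agent_loop("say hello", fake_capture, scripted_decide, executed.append)
print([a.kind for a in history])
```

In a real setup the stubs would be swapped for a screenshot function, a model client, and an input-automation library; the `max_steps` cap is a design choice to keep a confused model from acting indefinitely.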
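
The MAI-UI entry mentions a device-cloud collaboration mechanism that routes computation based on task state and privacy constraints. The listing does not say how that decision is made, so the sketch below is purely illustrative: `TaskState`, `choose_backend`, the two fields, and the failure threshold are all assumptions, not MAI-UI's actual policy.

```python
from dataclasses import dataclass


@dataclass
class TaskState:
    """Hypothetical summary of where a GUI task currently stands."""
    screen_has_private_data: bool   # assumed privacy signal
    consecutive_failures: int       # assumed "on-device model is stuck" signal


def choose_backend(state: TaskState, failure_threshold: int = 2) -> str:
    """Return "device" or "cloud" for the next model call (illustrative rules)."""
    # Privacy constraint wins: sensitive screens never leave the device.
    if state.screen_has_private_data:
        return "device"
    # Task state: escalate to the larger cloud model if the local one keeps failing.
    if state.consecutive_failures >= failure_threshold:
        return "cloud"
    # Default to the cheaper, lower-latency on-device model.
    return "device"


print(choose_backend(TaskState(screen_has_private_data=False, consecutive_failures=3)))
```

Checking the privacy condition first means a struggling task on a sensitive screen still stays local, which is one plausible reading of "privacy constraints" taking priority over task progress.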