screen free download - SourceForge

Self-Operating Computer

A framework to enable multimodal models to operate a computer

The Self-Operating Computer Framework enables multimodal models to operate a computer autonomously: the model interprets the screen and executes mouse and keyboard actions to achieve a specified objective. The framework is compatible with various multimodal models and currently integrates with GPT-4o, o1, Gemini Pro Vision, Claude 3, and LLaVa. Notably, it was the first known project to implement a multimodal model capable of viewing and controlling a computer screen.
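The screen-in, actions-out cycle described above can be pictured as a simple observe, decide, act loop. The following is a minimal sketch under stated assumptions, not the project's actual code: the `Action` schema and the `capture_screen`, `choose_action`, and `perform` callables are hypothetical placeholders for a screenshot grabber, a multimodal model call, and an input driver.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    """One step chosen by the model (hypothetical schema, not the project's)."""
    kind: str                    # "click", "type", or "done"
    x: Optional[int] = None      # pixel coordinates for "click"
    y: Optional[int] = None
    text: Optional[str] = None   # text payload for "type"


def run_agent(
    objective: str,
    capture_screen: Callable[[], bytes],
    choose_action: Callable[[str, bytes, List[Action]], Action],
    perform: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Repeat observe -> decide -> act until the model reports "done"
    or the step budget runs out. Returns every action the model chose."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture_screen()                            # observe
        action = choose_action(objective, screenshot, history)   # decide
        history.append(action)
        if action.kind == "done":
            break
        perform(action)                                          # act
    return history
```

Passing the three callables in as parameters keeps the loop independent of any particular model or input library, and the `max_steps` cap prevents a model that never reports "done" from running indefinitely.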

1 Review

Downloads: 7 This Week

Last Update: 2025-02-28

OmniParser

A simple screen parsing tool towards pure vision based GUI agent

...It reliably identifies interactable icons within user interfaces and understands the semantics of various elements in a screenshot, associating intended actions with the correct screen regions. To achieve this, OmniParser curates an interactable icon detection dataset containing 67,000 unique screenshot images labeled with bounding boxes of interactable icons derived from DOM trees. Additionally, a collection of 7,000 icon-description pairs is used to fine-tune a caption model that extracts the functional semantics of detected elements. ...
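To illustrate how a detected bounding box might be turned into a screen region an agent can act on, here is a small sketch. It assumes boxes arrive as `(x_min, y_min, x_max, y_max)` normalized to the range 0 to 1; that layout is an assumption made for this example, not something the description above specifies, and `box_center_pixels` is a hypothetical helper name.

```python
from typing import Tuple

# Assumed layout: (x_min, y_min, x_max, y_max), each normalized to [0, 1].
Box = Tuple[float, float, float, float]


def box_center_pixels(box: Box, width: int, height: int) -> Tuple[int, int]:
    """Return the pixel coordinates of the center of a normalized box
    on a screenshot of the given width and height."""
    x_min, y_min, x_max, y_max = box
    if not (0.0 <= x_min <= x_max <= 1.0 and 0.0 <= y_min <= y_max <= 1.0):
        raise ValueError(f"box is not a valid normalized rectangle: {box}")
    cx = (x_min + x_max) / 2 * width
    cy = (y_min + y_max) / 2 * height
    # Clamp so a box touching the right or bottom edge stays on screen.
    return min(int(cx), width - 1), min(int(cy), height - 1)
```

For example, `box_center_pixels((0.25, 0.5, 0.75, 1.0), 1920, 1080)` gives the point at the middle of a box covering the lower-center part of a 1920x1080 screenshot.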

Downloads: 2 This Week

Last Update: 2025-09-09

Open-AutoGLM

An open phone agent model & framework

Open-AutoGLM is an open-source framework and model for autonomous mobile assistants. It lets AI agents understand and interact with phone screens multimodally, combining vision and language capabilities to control real devices. The goal is an “AI phone agent” that perceives on-screen content, reasons about user goals, and executes sequences of taps, swipes, and text input through automated device control interfaces such as ADB, so that multi-step tasks like navigating apps and filling in forms can be completed hands-free. Unlike traditional automation scripts that depend on brittle heuristics, Open-AutoGLM uses pretrained large language and vision-language models to interpret visual context and natural language instructions, which helps the agent adapt across apps and interfaces.
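As a rough illustration of the ADB side of such an agent, the sketch below builds the standard `adb shell input` commands for taps, swipes, and text entry. It is not taken from Open-AutoGLM; the helper names (`adb_prefix`, `tap_cmd`, `swipe_cmd`, `text_cmd`) are hypothetical, and the functions only construct argument lists so they can be inspected without a device attached.

```python
from typing import List, Optional


def adb_prefix(serial: Optional[str] = None) -> List[str]:
    """Base adb invocation; `-s <serial>` selects one device when several are attached."""
    return ["adb"] + (["-s", serial] if serial else [])


def tap_cmd(x: int, y: int, serial: Optional[str] = None) -> List[str]:
    """Tap at pixel (x, y)."""
    return adb_prefix(serial) + ["shell", "input", "tap", str(x), str(y)]


def swipe_cmd(x1: int, y1: int, x2: int, y2: int,
              duration_ms: int = 300, serial: Optional[str] = None) -> List[str]:
    """Swipe from (x1, y1) to (x2, y2) over duration_ms milliseconds."""
    return adb_prefix(serial) + [
        "shell", "input", "swipe",
        str(x1), str(y1), str(x2), str(y2), str(duration_ms),
    ]


def text_cmd(text: str, serial: Optional[str] = None) -> List[str]:
    """Type text into the focused field. `input text` reads "%s" as a space;
    other shell metacharacters are NOT escaped in this sketch."""
    return adb_prefix(serial) + ["shell", "input", "text", text.replace(" ", "%s")]
```

With a device connected and `adb` on the PATH, a command can be executed with `subprocess.run(tap_cmd(540, 1200), check=True)`. Keeping command construction separate from execution makes the action layer easy to log and test.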

Downloads: 0 This Week

Last Update: 15 hours ago

Search Results for "screen"

Showing 3 open source projects for "screen"
