vision free download - SourceForge

Self-Operating Computer

A framework to enable multimodal models to operate a computer

The Self-Operating Computer Framework enables multimodal models to operate a computer autonomously: the model interprets the screen and issues mouse and keyboard actions to achieve a specified objective. The framework is designed to work with a variety of multimodal models and currently integrates with GPT-4o, o1, Gemini Pro Vision, Claude 3, and LLaVA. Notably, it was the first known project to implement a multimodal model capable of viewing and controlling a computer screen. To improve visual grounding, it supports Optical Character Recognition (OCR) and Set-of-Mark (SoM) prompting. It runs on macOS, Windows, and Linux (with an X server installed) and is released under the MIT license.
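
Set-of-Mark prompting grounds the model's choice in a specific screen region. Each detected UI element is tagged with a numbered mark, and the model answers with a mark number instead of raw coordinates. The sketch below shows only the final step, resolving a reply to a click point. It assumes the marks have already been detected and drawn on the screenshot. The `Mark` type, the `CLICK <n>` reply format, and the function names are hypothetical illustrations, not the framework's actual API, and no real mouse action is performed.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Mark:
    """A numbered Set-of-Mark label (hypothetical type, not the project's API)."""
    mark_id: int
    label: str
    box: tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def build_legend(marks: list[Mark]) -> str:
    """Text legend sent alongside the marked screenshot so the model can cite marks by ID."""
    return "\n".join(f"[{m.mark_id}] {m.label}" for m in marks)


def resolve_click(reply: str, marks: list[Mark]) -> tuple[int, int] | None:
    """Map an assumed reply such as 'CLICK 2' to the pixel center of that mark's box."""
    match = re.search(r"CLICK\s+(\d+)", reply)
    if match is None:
        return None  # reply did not follow the assumed format
    target = int(match.group(1))
    for m in marks:
        if m.mark_id == target:
            left, top, right, bottom = m.box
            return ((left + right) // 2, (top + bottom) // 2)
    return None  # model referenced a mark that does not exist


# Hypothetical marks, as a detector might produce them for a search page.
marks = [
    Mark(1, "Search box", (100, 40, 500, 80)),
    Mark(2, "Submit button", (520, 40, 600, 80)),
]
print(build_legend(marks))
print(resolve_click("CLICK 2", marks))  # (560, 60)
```

Answering with a mark ID means the model never has to estimate pixel coordinates. A real agent would then pass the resolved point to a mouse-control library.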

1 Review

Downloads: 13 This Week

Last Update: 2025-02-28

Open-AutoGLM

An open phone agent model & framework

...Unlike traditional automation scripts that depend on brittle heuristics, Open-AutoGLM uses pretrained large language and vision-language models to interpret visual context and natural language instructions, giving the agent robust adaptability across apps and interfaces.

Downloads: 4 This Week

Last Update: 2026-01-20

OmniParser

A simple screen parsing tool towards pure vision based GUI agent

OmniParser is a comprehensive method for parsing user interface screenshots into structured elements, significantly enhancing the ability of multimodal models like GPT-4 to generate actions accurately grounded in corresponding regions of the interface. It reliably identifies interactable icons within user interfaces and understands the semantics of various elements in a screenshot, associating intended actions with the correct screen regions. To achieve this, OmniParser curates an...

Downloads: 2 This Week

Last Update: 2025-09-09

Search Results for "vision"

Showing 3 open source projects for "vision"

Self-Operating Computer

Open-AutoGLM

OmniParser