vision free download - SourceForge

Self-Operating Computer

A framework to enable multimodal models to operate a computer

The Self-Operating Computer Framework lets multimodal models operate a computer autonomously: the model interprets the screen and executes mouse and keyboard actions to achieve a specified objective. The framework is compatible with various multimodal models and currently integrates with GPT-4o, o1, Gemini Pro Vision, Claude 3, and LLaVa. It was the first known project to implement a multimodal model capable of viewing and controlling a computer screen. It supports Optical Character Recognition (OCR) and Set-of-Mark (SoM) prompting to improve visual grounding. It runs on macOS, Windows, and Linux (with an X server installed) and is released under the MIT license.

1 Review

Downloads: 13 This Week

Last Update: 2025-02-28

Open-AutoGLM

An open phone agent model & framework

...Unlike traditional automation scripts that depend on brittle heuristics, Open-AutoGLM uses pretrained large language and vision-language models to interpret visual context and natural language instructions, giving the agent robust adaptability across apps and interfaces.

Downloads: 4 This Week

Last Update: 2026-01-20

UI-TARS Desktop

A GUI Agent app based on UI-TARS to control your computer using AI

UI-TARS Desktop is a graphical user interface (GUI) agent application that uses the UI-TARS vision-language model to let users control a computer in natural language. This cross-platform tool supports both Windows and macOS. Key features include screenshot-based visual recognition, precise mouse and keyboard control, and real-time visual feedback on the actions performed. ...

1 Review

Downloads: 66 This Week

Last Update: 2025-11-04

Search Results for "vision"

Showing 3 open source projects for "vision"
