visual free download - SourceForge

Self-Operating Computer

A framework to enable multimodal models to operate a computer

...Notably, it was the first known project to implement a multimodal model capable of viewing and controlling a computer screen. The framework supports features like Optical Character Recognition (OCR) and Set-of-Mark (SoM) prompting to enhance visual grounding capabilities. It is designed to be compatible with macOS, Windows, and Linux (with X server installed), and is released under the MIT license.

1 Review

Downloads: 4 This Week

Last Update: 2025-02-28

Open-AutoGLM

An open phone agent model & framework

...It aims to create an “AI phone agent” that can perceive on-screen content, reason about user goals, and execute sequences of taps, swipes, and text input via automated device control interfaces like ADB, enabling hands-off completion of multi-step tasks such as navigating apps, filling forms, and more. Unlike traditional automation scripts that depend on brittle heuristics, Open-AutoGLM uses pretrained large language and vision-language models to interpret visual context and natural language instructions, giving the agent robust adaptability across apps and interfaces.

Downloads: 5 This Week

Last Update: 4 days ago

airda

airda (Air Data Agent)

airda (Air Data Agent) is a multi-agent system for data analysis. It can understand data development and data analysis requirements, interpret the data itself, and generate SQL and Python code for tasks such as data queries, data visualization, and machine learning.

Downloads: 0 This Week

Last Update: 2024-09-03

Search Results for "visual"

Showing 3 open source projects for "visual"
