visual-mingw free download

Self-Operating Computer

A framework to enable multimodal models to operate a computer

...Notably, it was the first known project to implement a multimodal model capable of viewing and controlling a computer screen. The framework supports features like Optical Character Recognition (OCR) and Set-of-Mark (SoM) prompting to enhance visual grounding capabilities. It is designed to be compatible with macOS, Windows, and Linux (with X server installed), and is released under the MIT license.

1 Review

Downloads: 2 This Week

Last Update: 2025-02-28

Agent S

Agent S: an open agentic framework that uses computers like a human

...The latest version, Agent S3, surpasses human-level performance on the OSWorld benchmark, demonstrating state-of-the-art results in complex multi-step computer tasks. Agent S combines powerful foundation models (such as GPT-5) with grounding models like UI-TARS to translate visual inputs into precise executable actions. It supports flexible deployment via CLI, SDK, or cloud, and integrates with multiple model providers including OpenAI, Anthropic, Gemini, Azure, and Hugging Face endpoints. With optional local code execution, reflection mechanisms, and compositional planning, Agent S provides a scalable and research-driven framework for building advanced computer-use agents.

Downloads: 10 This Week

Last Update: 2025-12-16

NVIDIA AI Blueprint

Suite of reference architectures for building GPU-accelerated vision

...The project is organized around real-time video intelligence, downstream analytics, and agentic offline processing. It supports workflows such as natural-language video search, visual question answering, long-video summarization, clip retrieval, verified alerts, and incident analysis. It is designed for technical users who need deployable reference architectures for smart spaces, warehouse automation, SOP validation, monitoring, and operational video analytics. The repository includes Python agent code, Docker Compose deployment configurations, skills, scripts, and a Next.js-based UI.

Downloads: 1 This Week

Last Update: 2026-05-14

Android Use

Automate native Android apps with AI using accessibility APIs

android-action-kernel is an open source Python library designed to let AI agents control and automate native Android applications running on real devices or emulators. It fills a gap in automation tooling by focusing on mobile-first workflows where traditional browser or desktop-based automation doesn’t work, such as logistics, gig work, field operations, and other industries reliant on phones or tablets. The project works by using Android’s accessibility API to extract structured UI state...

Downloads: 3 This Week

Last Update: 6 days ago

ComfyUI-HunyuanVideoWrapper

ComfyUI wrapper nodes for HunyuanVideo

...The system introduces specialized nodes such as text-image encoders that allow multiple image inputs to be referenced directly within prompts. This makes it possible to guide generation using both visual and textual context simultaneously. The wrapper is designed to fit seamlessly into ComfyUI pipelines, enabling chaining with other nodes for advanced workflows. It supports prompt-based referencing of images, where placeholders in text correspond to connected inputs, allowing fine control over generation behavior. The project is particularly useful for creators experimenting with multimodal AI video synthesis.

Downloads: 0 This Week

Last Update: 2026-04-16

Agent Sprite Forge

Agent Skill for generating 2D sprite sheets and maps as transparent PNGs

...The system supports multi-frame sprite generation, animation sequencing, and transparent background rendering for easier integration into game engines. Its architecture is designed around automation and repeatability, enabling developers to generate large batches of visual assets through structured prompt workflows. Overall, agent-sprite-forge acts as an AI-assisted creative tool for accelerating 2D game art production and experimentation.

Downloads: 0 This Week

Last Update: 2026-05-08

MolmoWeb

Open multimodal web agent built by Ai2

...Unlike traditional automation tools that rely on structured HTML parsing or predefined APIs, MolmoWeb operates directly from screenshots of web pages, interpreting visual content in the same way a human user would. This approach allows it to generalize across different websites without requiring site-specific integrations, making it highly adaptable to diverse web environments.

Downloads: 0 This Week

Last Update: 6 days ago

ticket

Fast, powerful, git-native ticket tracking in a single bash script

...It stores each ticket as a Markdown file with YAML frontmatter, making them human-readable and easy to version control alongside your code, while also allowing IDEs to jump straight to ticket definitions. The CLI provides common subcommands to create, list, edit, close, and manage dependencies between tickets, enabling clear hierarchical task structures and visual dependency trees. Its design is rooted in the Unix philosophy of simplicity, composability, and transparency, meaning it integrates well with other standard tools like grep, jq, and ripgrep when installed. Teams can use ticket to track bugs, features, chores, and epics with priority levels and tags, all by staying within the terminal and Git ecosystem.

Downloads: 0 This Week

Last Update: 2026-02-03

airda

airda (Air Data Agent)

airda (Air Data Agent) is a multi-agent system for data analysis. It can understand data development and data analysis requirements, interpret the underlying data, and generate SQL and Python code for tasks such as data querying, data visualization, and machine learning.

Downloads: 2 This Week

Last Update: 2024-09-03

Universe Starter Agent

A starter agent that can solve a number of universe environments

The universe-starter-agent repository is an archived OpenAI codebase designed as a starter reinforcement-learning agent that can interact with and solve tasks in OpenAI’s Universe environment platform. Its purpose is to serve as a baseline or reference implementation so researchers or developers can see how to build agents that operate in real-time, visual environments (e.g., games, browser apps) via pixel observations and keyboard/mouse actions. Under the hood, this starter agent implements a version of the A3C (Asynchronous Advantage Actor-Critic) algorithm, adapted for the specific challenges of Universe environments (e.g., network latency, VNC streaming, asynchronous observations). ...

Downloads: 0 This Week

Last Update: 2025-10-03

Search Results for "visual-mingw"

Showing 10 open source projects for "visual-mingw"
