Gemini 2.5 Computer Use
Gemini 2.5 Computer Use is a specialized agent model built on Gemini 2.5 Pro’s visual reasoning capabilities and designed to interact directly with user interfaces (UIs). It is exposed through a new computer-use tool in the Gemini API. Its inputs are the user’s request, a screenshot of the UI environment, and a history of recent actions. The model responds with function calls for UI actions such as clicking, typing, or selecting, and may ask the user for confirmation before higher-risk actions. After each action is executed, a new screenshot and the current URL are fed back into the model, and the loop continues until the task is complete or is halted. The model is optimized primarily for web browser control and shows promise for mobile UI interaction, but it is not yet suited to desktop OS-level control. On web and mobile control benchmarks, Gemini 2.5 Computer Use outperforms leading alternatives while running at lower latency.
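The screenshot, action, and screenshot loop described above can be sketched as follows. This is a minimal, framework-agnostic sketch under stated assumptions: `propose_action` stands in for the call to the model, and every name here (`Action`, `Observation`, `run_agent_loop`, the `"done"` action) is hypothetical, not the Gemini API’s actual computer-use interface.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    # Hypothetical action format; the real API returns its own function-call schema.
    name: str                                  # e.g. "click", "type", or "done"
    args: dict = field(default_factory=dict)
    requires_confirmation: bool = False        # set by the model for higher-risk steps


@dataclass
class Observation:
    screenshot: bytes
    url: str


def run_agent_loop(
    request: str,
    observe: Callable[[], Observation],
    propose_action: Callable[[str, Observation, list], Action],
    execute: Callable[[Action], None],
    confirm: Callable[[Action], bool],
    max_steps: int = 20,
) -> list:
    """Drive the loop until the model says "done", the user declines, or steps run out."""
    history: list = []
    obs = observe()
    for _ in range(max_steps):
        action = propose_action(request, obs, history)
        if action.name == "done":
            break
        # Higher-risk actions run only after the user approves them.
        if action.requires_confirmation and not confirm(action):
            history.append(Action("halted", {"reason": "user declined"}))
            break
        execute(action)
        history.append(action)
        obs = observe()  # a fresh screenshot and URL go back to the model
    return history


# Toy stand-ins: a scripted "model" that clicks, types, then finishes.
script = [Action("click", {"x": 100, "y": 200}),
          Action("type", {"text": "hello"}),
          Action("done")]
state = {"url": "https://example.com", "steps": 0}


def observe() -> Observation:
    return Observation(screenshot=b"", url=state["url"])


def propose(request: str, obs: Observation, history: list) -> Action:
    return script[len(history)]


def execute(action: Action) -> None:
    state["steps"] += 1


history = run_agent_loop("say hello", observe, propose, execute, confirm=lambda a: True)
print([a.name for a in history])  # ['click', 'type']
```

Keeping the model, the executor, and the confirmation prompt as injected callables keeps the loop itself independent of any particular browser driver or API client.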
Surfer H
Surfer H from H Company is an autonomous web-agent platform built to understand and navigate user interfaces the way a human does. It combines three modular models: a policy model that plans tasks, a localizer model that visually identifies UI elements, and a validator model that checks outcomes. The agent works purely through the browser interface with no special API hooks, so it can scroll, click, type, and complete real-web tasks such as booking hotels, comparing product deals, or extracting structured information. When paired with H Company’s open-weight vision-language models, Surfer H achieved state-of-the-art performance on the WebVoyager benchmark (92.2% accuracy at around $0.13 per task), and it can be deployed locally, via Docker, or on cloud infrastructure. Use cases include web automation, QA testing without brittle scripts, data harvesting, and intelligent workflow agents that interact with the web directly, as a human would.
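The division of labor between the three models can be sketched as a simple control loop. This is a hypothetical illustration, not H Company’s code or API: `policy`, `localize`, `act`, and `validate` are placeholder callables, and the assumption that the validator checks the final answer (rather than every step) is ours.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Step:
    # Hypothetical step format produced by the policy model.
    instruction: str            # natural-language step, e.g. "click the 'Deals' tab"
    kind: str                   # "click", "type", or "answer"
    text: Optional[str] = None  # text to type, or the proposed answer


def run_task(
    task: str,
    screenshot: Callable[[], bytes],
    policy: Callable[[str, bytes, list], Step],
    localize: Callable[[str, bytes], Tuple[int, int]],
    act: Callable[[Step, Tuple[int, int]], None],
    validate: Callable[[str, str, bytes], bool],
    max_steps: int = 30,
) -> Optional[str]:
    """Policy plans each step, localizer grounds it to pixels, validator checks the answer."""
    trace: list = []
    for _ in range(max_steps):
        image = screenshot()
        step = policy(task, image, trace)
        if step.kind == "answer":
            # Assumption: the validator judges whether the proposed answer satisfies the task.
            if validate(task, step.text or "", image):
                return step.text
            trace.append(("rejected", step.text))  # let the policy try again
            continue
        xy = localize(step.instruction, image)  # visual grounding: description -> (x, y)
        act(step, xy)
        trace.append((step.instruction, xy))
    return None  # gave up after max_steps


# Toy run: the policy clicks once, then proposes an answer the validator accepts.
def policy(task: str, image: bytes, trace: list) -> Step:
    if not trace:
        return Step("click the 'Deals' tab", "click")
    return Step("report", "answer", text="$42")


clicks = []
result = run_task(
    "find the cheapest deal",
    screenshot=lambda: b"",
    policy=policy,
    localize=lambda instr, img: (120, 45),
    act=lambda step, xy: clicks.append(xy),
    validate=lambda task, ans, img: ans.startswith("$"),
)
print(result, clicks)  # $42 [(120, 45)]
```

Separating planning (what to do) from localization (where on screen to do it) is what lets such an agent work from pixels alone, without DOM or API hooks.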
OpenClaw
OpenClaw is an open-source autonomous personal AI assistant that you run on your own computer, server, or VPS. Rather than only generating text, it performs real tasks that you describe in natural language through familiar chat platforms such as WhatsApp, Telegram, Discord, and Slack. It connects to external large language models and services while keeping execution and data control local-first, on your own infrastructure. The agent can clear your inbox, send emails, manage your calendar, check you in for flights, work with files, run scripts, and automate everyday workflows without predefined triggers or a cloud-hosted assistant. It maintains persistent memory across sessions and can run continuously to coordinate tasks and reminders proactively. It also supports community-built “skills” that extend its capabilities, and it can route different agents or tools through isolated workspaces.
BLACKBOX AI
BLACKBOX AI is an AI-powered platform designed to accelerate coding, app development, and deep research tasks. Its AI Coding Agent supports real-time voice interaction, GPU acceleration, and remote parallel task execution. Users can convert Figma designs into functional code and turn images into web applications with minimal coding effort. The platform enables screen sharing within IDEs such as VS Code and offers mobile access to its coding agents. BLACKBOX AI also integrates with GitHub repositories for streamlined remote workflows. Its capabilities extend to website design, building apps with PDF documents as context, and image generation and editing.