UI-TARS is an open-source multimodal “GUI agent” created by ByteDance: a model designed to perceive raw screenshots (or rendered UI frames), reason about what needs to be done, and then perform real interactions with graphical user interfaces (GUIs), such as clicking, typing, and navigating menus, across desktop, browser, mobile, and game environments. Rather than relying on rigid, manually scripted UI automation, UI-TARS uses a unified vision-language model (VLM) that integrates perception, reasoning, grounding, and action into one end-to-end framework: it “thinks before acting,” enabling flexible, general-purpose automation. This allows it to perform complex, multi-step tasks such as filling forms, downloading files, navigating applications, and even controlling in-game actions, all by understanding the UI as a human would. The project is open source, supports local or remote deployment, and offers a foundation for building GUI automation agents that are more robust and adaptable than per-UI scripts.
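
To make the perceive-then-act idea concrete, here is a minimal sketch of how one might query a locally served model with a screenshot and a task instruction. It assumes the model is exposed behind an OpenAI-compatible chat endpoint (for example via a local inference server); the URL, model name, and prompt wording are illustrative assumptions, not the project's documented interface.

```python
# Hypothetical sketch: send a raw screenshot plus an instruction to a locally
# served UI-TARS-style model and get back its "think-then-act" text response.
# The endpoint URL and model name below are assumptions for illustration.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="empty")  # assumed local endpoint

def propose_action(screenshot_path: str, instruction: str) -> str:
    """Return the model's reasoning + proposed action for one screenshot."""
    with open(screenshot_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="ui-tars",  # placeholder model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                {"type": "text", "text": instruction},
            ],
        }],
    )
    return response.choices[0].message.content

print(propose_action("screen.png", "Open the downloads folder."))
```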
Features
- Vision-language model-based GUI agent: perceives raw screenshots and reasons about UI context
- Unified action space: supports clicks, typing, gestures, hotkeys across desktop, browser, mobile, and games
- “Think-then-act” decision-making: performs internal reasoning (task decomposition, planning, reflection) before executing actions (see the sketch after this list)
- Cross-platform GUI control: works across different operating systems, browsers, and application contexts
- End-to-end automation: capable of carrying out full workflows (forms, downloads, navigation, game controls) without custom scripts per UI
- Open-source with published inference scripts and models — enabling reproducibility and customization
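
The think-then-act loop can be imagined as: capture a screenshot, ask the model for the next step, parse the proposed action, and dispatch it to the GUI. The sketch below illustrates that loop under stated assumptions: the response format (a reasoning part followed by a single action on the last line), the action names, and the `propose_action()` helper from the previous sketch are hypothetical, and `pyautogui` is used here only as one possible way to execute clicks and keystrokes locally.

```python
# Hypothetical perceive -> reason -> act loop. The action grammar below
# ("click(x=..., y=...)", "type(text='...')", "finished") is assumed for
# illustration and is not the project's actual output protocol.
import re
import pyautogui

def execute(action: str) -> None:
    """Dispatch one parsed action string as a real GUI event."""
    if m := re.match(r"click\(x=(\d+), y=(\d+)\)", action):
        pyautogui.click(int(m.group(1)), int(m.group(2)))
    elif m := re.match(r"type\(text='(.*)'\)", action):
        pyautogui.typewrite(m.group(1))
    else:
        raise ValueError(f"Unrecognized action: {action}")

def run_task(instruction: str, max_steps: int = 10) -> None:
    """Repeat: screenshot the UI, ask the model for the next step, act on it."""
    for _ in range(max_steps):
        pyautogui.screenshot("screen.png")                  # perceive current UI state
        reply = propose_action("screen.png", instruction)   # reason about next step
        action = reply.splitlines()[-1].strip()             # assume last line is the action
        if action.startswith("finished"):
            break
        execute(action)                                      # act on the GUI
```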