UI-TARS is an open-source multimodal “GUI agent” created by ByteDance: a model designed to perceive raw screenshots (or rendered UI frames), reason about what needs to be done, and then perform real interactions with graphical user interfaces (GUIs), such as clicking, typing, and navigating menus, across desktop, browser, mobile, and game environments. Rather than relying on rigid, manually scripted UI automation, UI-TARS uses a unified vision-language model (VLM) that integrates perception, reasoning, grounding, and action into one end-to-end framework: it “thinks before acting,” which enables flexible, general-purpose automation. This allows it to perform complex, multi-step tasks such as filling forms, downloading files, navigating applications, and even controlling in-game actions, all by interpreting the UI from its visual appearance, as a human user would. The project supports local or remote deployment and offers a foundation for building GUI automation agents that are more robust and adaptable.

Features

  • Vision-language model-based GUI agent: perceives raw screenshots and reasons about UI context
  • Unified action space: supports clicks, typing, gestures, hotkeys across desktop, browser, mobile, and games
  • “Think-then-act” decision-making: performs internal reasoning (task decomposition, planning, reflection) before executing actions
  • Cross-platform GUI control: works across different operating systems, browsers, and application contexts
  • End-to-end automation: capable of carrying out full workflows (forms, downloads, navigation, game controls) without custom scripts per UI
  • Open source, with published inference scripts and models that enable reproducibility and customization
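
The perceive, reason, act cycle described in the features above can be sketched in a few lines of Python. This is an illustrative sketch only, not UI-TARS's actual API: the names `Action`, `Step`, `run_agent`, and `scripted_policy` are hypothetical, the policy is a hard-coded stand-in for the vision-language model, and the screenshot is an empty placeholder. It shows how a single action type can cover clicks, typing, and hotkeys, and how each step pairs a thought with the action that follows from it.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Action:
    """One step in a hypothetical unified action space."""
    kind: str                                 # "click", "type", "hotkey", or "done"
    point: Optional[Tuple[int, int]] = None   # screen coordinates for a click
    text: Optional[str] = None                # text to type, or a key combination


@dataclass
class Step:
    """A thought paired with the action chosen after that thought."""
    thought: str
    action: Action


def run_agent(
    task: str,
    observe: Callable[[], bytes],
    policy: Callable[[str, bytes, List[Step]], Step],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Step]:
    """Loop: capture the screen, let the policy think and pick an action, execute it."""
    history: List[Step] = []
    for _ in range(max_steps):
        screenshot = observe()                     # perceive
        step = policy(task, screenshot, history)   # reason, then choose an action
        history.append(step)
        if step.action.kind == "done":
            break
        execute(step.action)                       # act
    return history


def scripted_policy(task: str, screenshot: bytes, history: List[Step]) -> Step:
    """Stand-in for the model: replays a fixed plan instead of reading the screenshot."""
    plan = [
        Step("The name field is empty, so click it first.", Action("click", point=(120, 80))),
        Step("The field is focused, so type the name.", Action("type", text="Ada")),
        Step("The form is filled in.", Action("done")),
    ]
    return plan[min(len(history), len(plan) - 1)]


executed: List[Action] = []
history = run_agent(
    "Fill in the name field",
    observe=lambda: b"",          # placeholder for a real screenshot capture
    policy=scripted_policy,
    execute=executed.append,      # record actions instead of driving a real GUI
)
for step in history:
    print(step.thought, "->", step.action.kind)
```

In a real agent, `observe` would capture the screen, `policy` would call the model, and `execute` would drive the mouse and keyboard; keeping those three pieces as plain callables is a design choice made here so the loop itself stays testable without a GUI.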

License

Apache License V2.0

Additional Project Details

Operating Systems

Linux, Mac, Windows

Programming Language

Python

Related Categories

Python Artificial Intelligence Software

Registered

14 hours ago