OCR model for complex documents with layout-aware structured outputs
The SOTA Open-Source Browser Agent
Automate browser-based workflows with LLMs and Computer Vision
UI-TARS-desktop version that can operate on your local personal device
Python tool for browser-based interactive data apps in one file
Distill your ex into an AI Skill
Advanced LLM-powered brute-force tool combining AI intelligence
An open phone agent model & framework
Control Gmail, Google Calendar, Docs, Sheets, Slides, Chat, Forms
The open source post-building layer for agents
An on-premises, OCR-free unstructured data extraction
Stanford NLP Python library for many human languages
Structured data extraction and instruction calling with ML, LLM
Physical Symbolic Optimization
A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Framework for building AI agents that automate complex web tasks
Python library for model interpretation/explanations
Tencent’s 36-language state-of-the-art translation model