Automate browser-based workflows with LLMs and Computer Vision
UI-TARS-desktop version that can operate on your local personal device
An open phone agent model & framework
A GPT-4o-level MLLM for Vision, Speech and Multimodal Live Streaming
Physical Symbolic Optimization
Qwen3-Omni is a natively end-to-end, omni-modal LLM
Stanford NLP Python library for many human languages
Python library for model interpretation/explanations
Tencent’s 36-language state-of-the-art translation model