Turn WiFi signals into real-time human sensing and spatial awareness.
Human detection using YOLOv8
High-resolution models for human tasks
Driving with Graph Visual Question Answering
Multilingual Document Layout Parsing in a Single Vision-Language Model
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
AI tool for automating desktop tasks via natural language input
Free Motion Capture for Everyone
ElectronBot is a mini desktop robot
Chinese and English multimodal conversational language model
Open multimodal web agent built by Ai2
PyTorch code and models for the DINOv2 self-supervised learning method
IPFS implementation in Go
UI-TARS-desktop version that can operate on your local personal device
Workshop-Level Automated Scientific Discovery via Agentic Tree Search
Harmonized and Coherent Human Image Animation
JavaScript in-page GUI agent. Control web interfaces
Multimodal Agents as Smartphone Users, an LLM-based multimodal agent
Foundational Models for State-of-the-Art Speech and Text Translation
Python SDK for the Computer Use model Lux, developed by OpenAGI
Create browser automation as if you were teaching a human using GPT-4
autonomous system + observable + understandable + controllable + AI
CS2, Valorant, Fortnite, APEX, every game
Thinking notebook and Markdown editor
High-Resolution 3D Human Digitization from A Single Image