Benchmark LLMs by fighting in Street Fighter 3
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
Agentic, Reasoning, and Coding (ARC) foundation models
An orchestration framework for agentic AI and LLM applications
Driving with Graph Visual Question Answering
The Cradle framework is a first attempt at General Computer Control
Power CLI and workflow manager for LLMs (core package)
Chat language model that can use tools and interpret the results
Synergizing Reasoning and Acting in Language Models
Implementation of "Tree of Thoughts"