A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
Tools like a web browser, computer access, and a code runner for LLMs
The open-source post-building layer for agents
A high-performance ML model serving framework that offers dynamic batching
Leaderboard Comparing LLM Performance at Producing Hallucinations
8.5K high-quality grade-school math problems