A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
The platform for LLM evaluations and AI agent testing
The open-source post-building layer for agents
Open-source LLM load balancer and serving platform for hosting LLMs
Leaderboard Comparing LLM Performance at Producing Hallucinations
8.5K high-quality grade-school math problems