Agentex
Open-source codebase for Scale Agentex
...It treats an “agent” as a composition of a policy (the LLM), tools, memory, and an execution runtime, so you can test the whole loop rather than the prompt alone. The repo focuses on structured experiments: standardized tasks, canonical tool interfaces, and logs that let you compare models, prompts, and tool sets under the same conditions. It also includes evaluation harnesses that capture success criteria and partial credit, plus traces you can inspect to see where reasoning or tool use failed. The design keeps experiment configuration separate from code, which makes it straightforward to share results or re-run baselines. ...
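
To make the composition described above concrete, here is a minimal Python sketch of an agent built from a policy, tools, and memory, driven by a small runtime loop that records a trace and is then scored with partial credit. It is an illustration only, not the Agentex API: every name in it (`Agent`, `TraceStep`, `run_episode`, `score`, `scripted_policy`, `calculator`) is hypothetical, the policy is a scripted stand-in for an LLM, and the 50/50 credit split is an arbitrary assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical types for illustration only; these are not Agentex's API.
# A policy maps (task, memory) to an action: (tool name, argument).
Policy = Callable[[str, List[str]], Tuple[str, str]]
Tool = Callable[[str], str]


@dataclass
class Agent:
    policy: Policy
    tools: Dict[str, Tool]
    memory: List[str] = field(default_factory=list)


@dataclass
class TraceStep:
    tool: str
    argument: str
    observation: str


def run_episode(agent: Agent, task: str, max_steps: int = 5) -> List[TraceStep]:
    """Minimal runtime loop: ask the policy for an action, run the tool, record the step."""
    trace: List[TraceStep] = []
    for _ in range(max_steps):
        tool_name, argument = agent.policy(task, agent.memory)
        if tool_name == "finish":
            # By convention in this sketch, the argument of "finish" is the final answer.
            trace.append(TraceStep("finish", argument, ""))
            break
        tool = agent.tools.get(tool_name)
        observation = tool(argument) if tool else f"error: unknown tool {tool_name!r}"
        agent.memory.append(observation)
        trace.append(TraceStep(tool_name, argument, observation))
    return trace


def score(trace: List[TraceStep], expected_answer: str, required_tools: List[str]) -> float:
    """Partial credit (assumed split): 0.5 for the final answer, 0.5 shared across required tools."""
    final = trace[-1].argument if trace and trace[-1].tool == "finish" else None
    answer_credit = 0.5 if final == expected_answer else 0.0
    used = {step.tool for step in trace}
    if required_tools:
        tool_credit = 0.5 * sum(t in used for t in required_tools) / len(required_tools)
    else:
        tool_credit = 0.5
    return answer_credit + tool_credit


def scripted_policy(task: str, memory: List[str]) -> Tuple[str, str]:
    """Stand-in for an LLM: call the calculator once, then finish with its result."""
    if not memory:
        return ("calculator", "2+3")
    return ("finish", memory[-1])


def calculator(expression: str) -> str:
    """Toy tool that only handles 'a+b' with integer operands."""
    left, right = expression.split("+")
    return str(int(left) + int(right))


agent = Agent(policy=scripted_policy, tools={"calculator": calculator})
trace = run_episode(agent, task="What is 2 + 3?")
result = score(trace, expected_answer="5", required_tools=["calculator"])
```

Because the trace is plain data, the same `score` function can be applied to a truncated or failed run to see how much credit it earns, and swapping `scripted_policy` for a different policy leaves the tools, runtime, and scoring untouched.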