...It operates through a loop of testing, analyzing failures, and refining the agent’s configuration to maximize a scoring metric. The framework uses a single-file agent harness combined with structured tasks and evaluation suites to guide optimization. It runs inside Docker for safe execution and reproducibility. This approach shifts agent development from manual design to automated optimization. The system is particularly useful for building domain-specific agents that need continuous performance improvement.