Codex Autoresearch is an autonomous software-improvement framework that lets AI coding agents iteratively enhance a codebase without continuous human input. The agent operates in a loop: it modifies code, evaluates the result against measurable metrics, and keeps or discards the change based on performance.

The framework generalizes the idea of autoresearch beyond machine learning: the same loop can optimize test coverage, latency, lint errors, or overall code quality. Developers define a goal and a verification command, and the agent runs experiments continuously until it reaches the desired outcome.

Multiple operational modes are supported, including debugging, planning, security auditing, and release validation. The system can run unattended for extended periods, producing logs of its experiments and improvements. This turns software development into an iterative, evidence-driven optimization process rather than manual trial and error.
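The improve, verify, retain-or-discard loop described above can be sketched as follows. This is a toy model under stated assumptions, not the framework's actual API: `autoresearch_loop`, `apply_change`, `verify`, and the numeric "codebase" are all illustrative stand-ins for real code edits and a real verification command.

```python
import random

def autoresearch_loop(apply_change, verify, iterations=50, seed=0):
    """Minimal sketch of the improve -> verify -> retain/discard loop.

    All names here are illustrative; `verify` returns a metric where
    higher is better, standing in for a real verification command.
    """
    rng = random.Random(seed)
    state = 0.0                  # stand-in for "the current codebase"
    best = verify(state)
    history = []
    for i in range(iterations):
        candidate = apply_change(state, rng)  # the agent's proposed change
        score = verify(candidate)             # evaluate against the metric
        kept = score > best                   # retain only measurable wins
        if kept:
            state, best = candidate, score
        history.append({"iteration": i, "score": score, "kept": kept})
    return state, best, history

# Toy "goal": drive a number toward 10; the metric is negative distance.
def apply_change(state, rng):
    return state + rng.uniform(-1.0, 2.0)

def verify(state):
    return -abs(state - 10.0)

final_state, best_score, log = autoresearch_loop(apply_change, verify)
```

Discarded experiments still land in `history`, mirroring the framework's logging of both kept and rejected changes.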
Features
- Autonomous improve → verify → retain/discard loop
- Metric-driven optimization of codebases
- Support for debugging, planning, and security-audit modes
- Unattended execution for continuous improvement
- Integration with tests, linters, and benchmarks
- Experiment logging and reproducible iteration history
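Since several of the features above hinge on a user-supplied verification command (tests, linters, benchmarks), here is a minimal sketch of how such a command might be invoked. `run_verification` is a hypothetical helper, not part of any documented Codex CLI; the convention that exit code 0 means "passed" is an assumption.

```python
import subprocess
import sys

def run_verification(argv):
    """Run a verification command (e.g. a test suite or linter) and report
    whether it passed.  Exit code 0 counts as passing; output is captured
    so it can be logged alongside the experiment history."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

# Example: a trivial "test suite" that always passes.
ok, output = run_verification([sys.executable, "-c", "print('1 check passed')"])
```

The captured output is what would feed the experiment log, so that a failed verification run explains why a change was discarded.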