SIA is a self-improving AI framework for raising the performance of models or agents on benchmark tasks. It runs an iterative loop in which a meta-agent creates or updates a task-specific target agent, and a feedback agent analyzes the target agent's results and proposes improvements for the next generation. The framework can refine both the harness around the task and the agent implementation itself.

It is aimed at research and experimentation on tasks such as machine learning benchmarks, legal classification, code optimization, and scientific workflows. It ships with built-in tasks, a command-line runner, and a visual dashboard for following generations as they evolve. Users can also define custom providers, profiles, seed agents, and task directories without changing the core code.
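The meta/target/feedback loop described above can be illustrated with a minimal, hypothetical Python sketch. Nothing here is SIA's actual API. The toy threshold classifier, `make_target_agent`, `evaluate`, `feedback_agent`, `meta_agent`, and `run_loop` are assumptions made for illustration. The target agent's "code" is reduced to a single threshold, and the feedback agent's proposals are plain-text notes. The sketch shows how each generation's score and notes feed into the next generation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical toy benchmark: (input, expected label) pairs.
Example = Tuple[float, bool]


@dataclass
class Generation:
    index: int
    threshold: float  # stands in for the target agent's code
    score: float
    notes: List[str] = field(default_factory=list)  # improvement notes


def make_target_agent(threshold: float) -> Callable[[float], bool]:
    """Hypothetical target agent: labels x positive when x >= threshold."""
    return lambda x: x >= threshold


def evaluate(agent: Callable[[float], bool], benchmark: List[Example]) -> float:
    """Fraction of benchmark examples the agent labels correctly."""
    correct = sum(agent(x) == y for x, y in benchmark)
    return correct / len(benchmark)


def feedback_agent(agent: Callable[[float], bool], benchmark: List[Example]) -> List[str]:
    """Studies failures and proposes improvements as plain-text notes."""
    notes = []
    missed = [x for x, y in benchmark if y and not agent(x)]
    false_pos = [x for x, y in benchmark if not y and agent(x)]
    if missed:
        notes.append(f"lower threshold: missed positives {missed}")
    if false_pos:
        notes.append(f"raise threshold: false positives {false_pos}")
    return notes


def meta_agent(threshold: float, notes: List[str], step: float) -> float:
    """Builds the next target agent by applying the feedback notes."""
    for note in notes:
        if note.startswith("lower"):
            threshold -= step
        elif note.startswith("raise"):
            threshold += step
    return threshold


def run_loop(benchmark: List[Example], generations: int = 5,
             threshold: float = 0.0, step: float = 0.25) -> List[Generation]:
    """Runs up to `generations` rounds, stopping early when feedback is empty."""
    history: List[Generation] = []
    for i in range(generations):
        agent = make_target_agent(threshold)
        score = evaluate(agent, benchmark)
        notes = feedback_agent(agent, benchmark)
        history.append(Generation(i, threshold, score, notes))
        if not notes:  # no failures left to fix
            break
        threshold = meta_agent(threshold, notes, step)
    return history


if __name__ == "__main__":
    bench = [(0.2, False), (0.4, False), (0.6, True), (0.9, True)]
    for gen in run_loop(bench):
        print(gen.index, gen.threshold, gen.score, gen.notes)
```

Recording every generation, rather than only the final agent, mirrors the per-generation artifacts listed below. It keeps the scores and improvement notes available for inspection after the loop ends.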
Features
- Self-improvement loop using meta, target, and feedback agents
- Built-in benchmark tasks such as GPQA, LawBench, LongCoT Chess, and Spaceship Titanic
- CLI commands for running improvement cycles and serving the visualizer
- Per-generation artifacts including target agent code, logs, and improvement notes
- Live web dashboard with scores, prompts, execution trajectories, and code views
- Custom provider, profile, seed agent, and task directory support