ARC-AGI (the Abstraction and Reasoning Corpus) is a benchmark dataset and experimental framework designed to evaluate and advance artificial general intelligence by testing systems on abstract reasoning tasks that demand human-like problem solving. Each task presents a handful of input-output examples; a solver must infer the underlying transformation and apply it to new, unseen inputs, without relying on memorization or prior training data. Tasks take the form of grid-based puzzles involving transformations such as symmetry, counting, or spatial manipulation. Unlike traditional machine learning benchmarks, ARC emphasizes generalization and reasoning over statistical pattern recognition, which makes it particularly challenging for current AI systems. The repository also includes a browser-based interface that lets humans attempt the tasks manually, providing a baseline for comparison.
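To make the task format concrete, here is a minimal sketch: ARC tasks are JSON objects with `"train"` and `"test"` lists of input/output grid pairs, where each grid is a list of lists of integers 0-9. The toy "solver" below is purely illustrative (not part of the repository): it tries a few hypothetical candidate transforms against the train pairs and applies the first one that fits.

```python
# A single ARC-style task, inlined for illustration. Real tasks
# live in individual JSON files with the same shape.
task = {
    "train": [
        {"input": [[1, 2], [3, 4]], "output": [[2, 1], [4, 3]]},
        {"input": [[5, 0], [0, 5]], "output": [[0, 5], [5, 0]]},
    ],
    "test": [{"input": [[7, 8], [9, 0]]}],
}

# A tiny, hypothetical hypothesis space: three candidate grid
# transforms. Real solvers search a far richer program space.
CANDIDATES = {
    "identity": lambda g: g,
    "flip_horizontal": lambda g: [row[::-1] for row in g],
    "flip_vertical": lambda g: g[::-1],
}

def solve(task):
    """Return the first candidate transform consistent with every
    train pair, along with its predictions for the test inputs."""
    for name, fn in CANDIDATES.items():
        if all(fn(p["input"]) == p["output"] for p in task["train"]):
            return name, [fn(t["input"]) for t in task["test"]]
    return None, None

rule, predictions = solve(task)
print(rule, predictions)  # flip_horizontal [[[8, 7], [0, 9]]]
```

The point of the sketch is the workflow the benchmark enforces: hypothesize a rule from a few examples, verify it against all of them, then generalize to unseen inputs.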
Features
- Dataset of abstract reasoning tasks for AI evaluation
- Grid-based problem format requiring pattern inference
- Separate training and evaluation task sets
- Browser interface for human problem solving
- Focus on generalization rather than memorization
- Benchmark for artificial general intelligence research
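A minimal evaluation loop over the training and evaluation splits might look like the following sketch. The one-JSON-file-per-task layout and the helper names are assumptions for illustration; only the all-or-nothing grid scoring reflects how ARC results are typically reported.

```python
import json
from pathlib import Path

def load_tasks(directory):
    """Load every ARC task JSON file in a directory into a dict
    keyed by task id (the filename stem). Directory layout is an
    assumption based on one-file-per-task packaging."""
    return {p.stem: json.loads(p.read_text())
            for p in Path(directory).glob("*.json")}

def exact_match(predicted, expected):
    # Scoring is all-or-nothing per test grid: a prediction counts
    # only if every cell matches the expected output exactly.
    return predicted == expected
```

Keeping the evaluation split untouched during development is what preserves the benchmark's focus on generalization: a solver tuned against evaluation tasks is back to measuring memorization.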