Inspect Petri
An alignment auditing agent capable of exploring alignment hypothesis
Inspect Petri is an open-source alignment auditing agent that lets researchers rapidly test concrete safety hypotheses against target models using realistic, multi-turn scenarios. Instead of building bespoke evals, Inspect Petri automatically generates audit environments from seed “special instructions,” orchestrates an auditor model to probe a target model, and simulates tool use and rollbacks to surface risky behaviors. Each interaction transcript is then scored by a judge model using a...