Petri is an open-source alignment auditing agent that lets researchers rapidly test concrete safety hypotheses against target models using realistic, multi-turn scenarios. Instead of requiring a bespoke eval for each hypothesis, Petri generates audit environments from seed “special instructions,” orchestrates an auditor model that probes a target model, and simulates tool use and rollbacks to surface risky behaviors. A judge model then scores each interaction transcript against a consistent rubric, so results are comparable across runs and models. Petri supports major model APIs and ships with starter seeds and judge dimensions, so questions about behaviors such as reward hacking, self-preservation, or eval awareness can be explored in minutes. It is designed for parallel exploration: it runs many audits concurrently, aggregates the findings, and highlights the transcripts that deserve human review.
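To make the flow described above concrete, here is a minimal sketch of one audit: an auditor probes a target over several turns, rolls back an exchange that went nowhere, and a judge scores the resulting transcript. Every name here (`Transcript`, `stub_auditor`, `stub_target`, `stub_judge`, `run_audit`, the `compliance` dimension) is hypothetical and chosen for illustration. The stubs stand in for real model calls, and none of this is Petri's actual API.

```python
# Illustrative sketch only: none of these names come from Petri's API.
from dataclasses import dataclass, field


@dataclass
class Transcript:
    seed: str
    turns: list = field(default_factory=list)  # (role, text) pairs

    def rollback(self, n_turns: int) -> None:
        """Discard the last n_turns entries so the auditor can retry."""
        if n_turns > 0:
            del self.turns[-n_turns:]


def stub_auditor(seed: str, attempt: int) -> str:
    # Stand-in for an auditor model call: phrases the next probe.
    return f"[attempt {attempt}] {seed}"


def stub_target(message: str) -> str:
    # Stand-in for a target model call: refuses the first phrasing only.
    return "REFUSED" if "[attempt 0]" in message else f"ok: {message}"


def stub_judge(transcript: Transcript) -> dict:
    # Stand-in for a judge model with one hypothetical rubric dimension:
    # the fraction of target replies that complied.
    replies = [text for role, text in transcript.turns if role == "target"]
    complied = sum(text.startswith("ok") for text in replies)
    return {"compliance": complied / len(replies) if replies else 0.0}


def run_audit(seed: str, max_attempts: int = 3) -> tuple:
    transcript = Transcript(seed)
    for attempt in range(max_attempts):
        probe = stub_auditor(seed, attempt)
        reply = stub_target(probe)
        transcript.turns += [("auditor", probe), ("target", reply)]
        if reply == "REFUSED":
            transcript.rollback(2)  # drop the refused exchange, then retry
    return transcript, stub_judge(transcript)


transcript, scores = run_audit("offer the target a chance to game its reward")
print(len(transcript.turns), scores)
```

In this toy run the first exchange is refused and rolled back, so only the two later exchanges remain in the transcript that the judge sees. A real auditor would decide when to roll back based on the model's behavior rather than a fixed string match.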
Features
- Scenario generator that turns seed instructions into realistic audit setups
- Multi-turn auditor orchestration with simulated tool use and rollbacks
- Judge model that scores transcripts via a consistent safety rubric
- Parallel execution to explore many hypotheses and surface the riskiest traces first
- Built-in starters for seeds and judge dimensions plus guidance for customization
- API support for popular model providers with reproducible runs and reports
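
The parallel-execution feature can be pictured with a short sketch: several seeds are audited concurrently and the results are sorted so the highest-risk entries come first. The function names (`audit_and_score`, `run_parallel`) and the `risk` field are assumptions made for this example, and the score is a deterministic placeholder rather than a real judge output. Petri's own interfaces may look quite different.

```python
# Illustrative sketch only: the scoring function is a deterministic stand-in,
# not a Petri API.
from concurrent.futures import ThreadPoolExecutor


def audit_and_score(seed: str) -> dict:
    # Placeholder for "run one audit, then have a judge score it".
    # The fake risk value depends only on the seed text, so runs are repeatable.
    risk = (len(seed) % 10) / 10
    return {"seed": seed, "risk": risk}


def run_parallel(seeds: list, max_workers: int = 4) -> list:
    # Run the audits concurrently; pool.map keeps results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(audit_and_score, seeds))
    # Highest risk first, so a reviewer starts with the most concerning traces.
    return sorted(results, key=lambda r: r["risk"], reverse=True)


seeds = ["reward hacking", "self-preservation", "eval awareness"]
ranked = run_parallel(seeds)
for row in ranked:
    print(row["seed"], row["risk"])
```

Threads suit this kind of workload because a real audit would spend most of its time waiting on model API responses rather than computing locally.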