| File | Date | Author | Commit |
|---|---|---|---|
| .claude-plugin | 2026-03-18 | | [77ac96] Update marketplace.json |
| assets | 2026-03-19 | | [b72716] Add files via upload |
| gouvernai | 2026-03-18 | | [b4e5c9] Update guardrails.md |
| LICENSE | 2026-03-17 | | [d9365c] Initial commit |
| README.md | 2026-03-19 | | [d07ac2] Update README.md |
| guardrails_log.md | 2026-03-18 | | [db9c45] Update guardrails_log.md |

Runtime guardrails for AI agents. Classifies every sensitive action by risk tier, enforces proportional controls, and logs a full audit trail. For teams using Claude Code with higher-risk workflows, CI pipelines, or approval requirements.
GouvernAI is an operational safety and governance layer — designed to catch mistakes, enforce consistent approval workflows, and create accountability through audit logging. It is one layer in a defense-in-depth approach to AI agent risk management, as recommended by the 2026 International AI Safety Report: multiple layers of safeguards compensating for weaknesses in any single control.
Dual enforcement: Linguistic skill (probabilistic risk classification by Claude) + deterministic hooks (pattern-based blocking via PreToolUse scripts). The hooks block common obfuscated commands, credential transmission patterns, and catastrophic system commands, even if Claude skips the skill. Some bypass patterns exist (see Threat Model).

> Note: The hook layer covers common patterns, not all possible bypass techniques. See the Threat Model section for documented gaps.

# Add the marketplace first
claude plugin marketplace add Myr-Aya/GouvernAI-claude-code-plugin
# Then install the plugin
claude plugin install gouvernai@mindxo
After install, guardrails activate automatically on the next session. No configuration required.
Claude Code Terminal: Guardrails activate automatically. No action needed.
Claude Code Desktop: Run /gouvernai:guardrails at the start of your session to activate the gate. The skill may not auto-trigger reliably in these environments.
Try these after installing to see the guardrails in action:
- `git status`: Tier 1, excluded from gate, no overhead
- `echo aGVsbG8= | base64 -d | bash`: hook blocks with exit code 2
- `/guardrails`: session status

File write in the workspace. GouvernAI notifies and proceeds unless you object.

Package installation requires explicit approval before executing.

Bulk file deletion: base tier T3 escalated to T4 for 9 targets. Lists every file and asks for confirmation.

Outbound email to unfamiliar recipient: base tier T3 escalated to T4. Shows the escalation chain.

Base64-to-bash pipe detected and blocked. No override possible.

Attempt to edit SKILL.md to remove the gate. Blocked with explanation and alternatives.

API key detected in file write. Shows the key, explains the risk, suggests alternatives.

In relaxed mode, T2 actions proceed with no gate. T3 and T4 still require approval.


Full session audit trail showing every gated action with tier, outcome, and escalation reason.

| Command | What it does |
|---|---|
| `/guardrails` | Show current mode, tier distribution, approvals/denials |
| `/guardrails log` | Display recent audit log entries |
| `/guardrails strict` | All tiers +1; persisted to guardrails-mode.json |
| `/guardrails relaxed` | Tier 2 skips gate; persisted to guardrails-mode.json |
| `/guardrails audit` | Audit-only mode: T2/T3 auto-proceed, T4 halts (for CI/unattended) |
| `/guardrails reset` | Return to default full-gate mode |
| `/guardrails policy` | Display hard constraints |

Mode changes are written to guardrails-mode.json in the project root and persist across sessions and context resets. Previously, mode was held only in the model's context window and was silently lost on reset.
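
As a rough illustration of what persisting the mode to disk can look like, here is a minimal sketch. The JSON layout (`{"mode": ...}`), the mode names, and the helper functions are assumptions made for this example; the plugin's actual file format is not documented in this README.

```python
import json
import os
from pathlib import Path

# Hypothetical mode names; the real file may use different values.
VALID_MODES = {"full", "strict", "relaxed", "audit"}


def mode_file(project_dir=None):
    """Locate guardrails-mode.json in the project root."""
    base = project_dir or os.environ.get("CLAUDE_PROJECT_DIR") or os.getcwd()
    return Path(base) / "guardrails-mode.json"


def save_mode(mode, project_dir=None):
    """Write the mode to disk so it survives context resets."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    # Assumed schema: a single "mode" key.
    mode_file(project_dir).write_text(json.dumps({"mode": mode}), encoding="utf-8")


def load_mode(project_dir=None):
    """Read the persisted mode, falling back to full-gate if absent or invalid."""
    try:
        data = json.loads(mode_file(project_dir).read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return "full"
    mode = data.get("mode") if isinstance(data, dict) else None
    return mode if mode in VALID_MODES else "full"
```

Falling back to the default full-gate mode on a missing or unreadable file is a design choice in this sketch: a corrupted config should never silently loosen the gate.
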
In full-gate mode, Tier 2 actions use "proceed unless objected" — which is silent auto-approval when no human is watching. For scheduled tasks, CI pipelines, or any unattended run, set audit-only mode first:
/guardrails audit
In audit-only mode: T2 and T3 auto-proceed with full logging, T4 halts without executing. Hard constraints still block regardless of mode.
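
The mode rules described above can be summarized as a small lookup. This is only a sketch of the documented behavior, not the plugin's code: the `Control` names, the mode strings, and the cap at Tier 4 in strict mode are assumptions for illustration. In the plugin itself this logic is applied by Claude following SKILL.md, and hard constraints block regardless of what this function would return.

```python
from enum import Enum


class Control(Enum):
    SKIP = "excluded from gate"
    NOTIFY = "notify, proceed unless objected"
    APPROVE = "require explicit approval"
    AUTO_LOG = "auto-proceed with full logging"
    HALT = "halt without executing"


def control_for(tier, mode="full"):
    """Map a risk tier (1-4) and a mode to the control applied."""
    if tier not in (1, 2, 3, 4):
        raise ValueError(f"tier must be 1-4, got {tier!r}")
    if mode == "strict":
        tier = min(tier + 1, 4)  # "all tiers +1"; capping at 4 is an assumption
    if tier == 1:
        return Control.SKIP
    if mode == "audit":
        # Audit-only: T2 and T3 auto-proceed with logging, T4 halts.
        return Control.HALT if tier == 4 else Control.AUTO_LOG
    if tier == 2:
        # Relaxed mode skips the gate for Tier 2 only.
        return Control.SKIP if mode == "relaxed" else Control.NOTIFY
    return Control.APPROVE  # T3 and T4 require approval in the other modes
```

For example, `control_for(2, "relaxed")` yields `Control.SKIP`, while `control_for(4, "audit")` yields `Control.HALT`.
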
The SKILL.md file teaches Claude the 8-step gate process: identify, determine mode, classify (using ACTIONS.md), escalate (using TIERS.md), check pre-approval, check hard constraints (using POLICY.md), apply controls, log and execute. Claude reads and follows these instructions with judgment.
The PreToolUse hook (scripts/guardrails-enforce.py) runs on every Bash, Write, and Edit tool call. It checks for common obfuscated commands, credential transmission patterns, and catastrophic system commands.
If a violation is detected, the hook exits with code 2 (hard block). Claude cannot override this.
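
To make the mechanism concrete, here is a minimal sketch of a PreToolUse-style check. It is not the contents of guardrails-enforce.py: the two regular expressions and the `check` and `main` helpers are illustrative assumptions. It relies only on the Claude Code hook contract, in which the hook receives a JSON event on stdin (with `tool_name` and `tool_input` fields) and blocks the call by exiting with code 2 and writing a reason to stderr.

```python
import json
import re
import sys

# Illustrative patterns only; the real script covers more cases.
BLOCK_PATTERNS = [
    (re.compile(r"base64\s+(-d|--decode)\b.*\|\s*(ba|z)?sh\b"),
     "decoded payload piped into a shell"),
    (re.compile(r"\brm\s+-rf\s+/(\s|$)"),
     "recursive delete of the filesystem root"),
]


def check(event):
    """Return a block reason for a Bash tool call, or None to allow it."""
    if event.get("tool_name") != "Bash":
        return None  # this sketch inspects Bash commands only
    command = (event.get("tool_input") or {}).get("command", "")
    for pattern, reason in BLOCK_PATTERNS:
        if pattern.search(command):
            return reason
    return None


def main():
    """Read the hook event from stdin and return the process exit code."""
    reason = check(json.load(sys.stdin))
    if reason:
        print(f"Blocked: {reason}", file=sys.stderr)
        return 2  # exit code 2 is a hard block
    return 0

# A real hook script would end with: sys.exit(main())
```

Because the check is a plain function over the event dictionary, it can be unit-tested without running Claude Code, which is presumably the role of tests/test_guardrails_enforce.py.
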
Skills are probabilistic — Claude uses judgment about when to apply them. On complex tasks, it might skip classification. Hooks are deterministic — they run every time, no exceptions. The skill handles the nuanced risk classification (is this a Tier 2 or Tier 3?). The hooks enforce the non-negotiable rules (never transmit credentials, never run obfuscated commands).
gouvernai/
├── .claude-plugin/
│   └── plugin.json                    # Plugin metadata
├── skills/
│   └── gouvernai/
│       ├── SKILL.md                   # Gate orchestrator (always loaded)
│       ├── ACTIONS.md                 # Action → tier classification lookup
│       ├── TIERS.md                   # Universal controls + escalation rules
│       ├── POLICY.md                  # Hard constraints (NEVER rules)
│       └── GUIDE.md                   # Output format templates
├── commands/
│   └── guardrails.md                  # /guardrails slash command
├── hooks/
│   └── hooks.json                     # PreToolUse hook configuration
├── scripts/
│   └── guardrails-enforce.py          # Deterministic enforcement script
├── tests/
│   └── test_guardrails_enforce.py     # Hook unit tests
└── README.md                          # This file
Runtime files written to the project root during use:
- guardrails_log.md: append-only audit log
- guardrails-mode.json: persisted mode config (created on first /guardrails mode command)

| Variable | Set by | Purpose |
|---|---|---|
| CLAUDE_PLUGIN_ROOT | Claude Code | Absolute path to the installed plugin directory. Used by hooks.json to locate guardrails-enforce.py. |
| CLAUDE_PROJECT_DIR | Claude Code | Absolute path to the current project. Used by the hook and skill to locate guardrails_log.md and guardrails-mode.json. |

If CLAUDE_PLUGIN_ROOT is not set (e.g. when running the hook script manually or in tests), the script falls back to its own parent directory (scripts/../ = plugin root). No action required — the fallback is automatic.
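
The fallback described here can be sketched as follows. The function name and signature are hypothetical, and the real script may resolve paths differently; the sketch only shows the idea of preferring the environment variable and otherwise walking up from scripts/ to the plugin root.

```python
import os
from pathlib import Path


def plugin_root(script_path, env=None):
    """Return the plugin root: CLAUDE_PLUGIN_ROOT if set, else scripts/../."""
    env = os.environ if env is None else env
    root = env.get("CLAUDE_PLUGIN_ROOT")
    if root:
        return Path(root)
    # <root>/scripts/guardrails-enforce.py -> <root>
    return Path(os.path.abspath(script_path)).parent.parent
```

Inside the script itself, the call would typically be `plugin_root(__file__)`. Passing `env` explicitly keeps the function easy to test without touching the real environment.
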
Important: This plugin installs hooks that run on every tool call. Review the source code before installing. The enforcement script (scripts/guardrails-enforce.py) is transparent and auditable.
In February 2026, Check Point Research disclosed CVEs allowing RCE through Claude Code hooks in untrusted repos. This plugin should be installed at user scope (default), not project scope, unless you trust all contributors to the project.
# Add the marketplace first
claude plugin marketplace add Myr-Aya/GouvernAI-claude-code-plugin
# User scope (default, recommended)
claude plugin install gouvernai@mindxo
# Project scope (only if you trust all contributors)
claude plugin install gouvernai@mindxo --scope project
PowerShell cmdlets (Get-Content, Invoke-WebRequest, Remove-Item) are not covered. Claude Code uses Bash on all platforms, so this is low risk for typical usage.

GouvernAI is an operational safety and governance layer, not a security boundary. In the defense-in-depth model described by the 2026 International AI Safety Report, it sits at the runtime layer, gating agent actions before execution through a combination of linguistic classification and pattern-based blocking.
Its real value is:
It does not protect against sophisticated, determined adversaries. Regex-plus-prompt guardrails are effective at stopping mistakes, not targeted attacks.
What it catches:
What it does NOT catch:
Defense in depth: GouvernAI is one layer in a multi-layer safety stack. For production or high-security environments, complement it with:
No single layer is sufficient. The 2026 International AI Safety Report's Swiss cheese model applies: each layer has holes, but layered together they provide meaningful protection.
MIT — see LICENSE