feat: add Compare Mode — blind A/B validation layer for Mneme profiles
Engineering guardrails for AI coding agents
Brought to you by: mnemehq
Originally created by: TheoV823
- comparison_results table: new schema plus migration (migrate_add_comparison_results.sql), with a cross-column CHECK constraint enforcing preferred_mode IS NULL on ties/skips
- app/models/comparison.py: insert_comparison, get_comparisons_for_user, compute_win_rate (win rate = Mneme wins / decisive comparisons; ties and skips are excluded from the denominator)
- app/runner/compare.py: run_comparison calls the Claude API twice (default and Mneme system prompts) and randomizes which output is shown as Option A and which as Option B, so the comparison is blind
- flask compare: CLI command that runs both modes, displays the blind A/B comparison, captures an a/b/tie/skip preference, and persists the result
- flask compare-stats: CLI command that shows the cumulative win rate for a user

flask compare --user-id <id> --prompt "..."
  │
  ├─ call_claude(default system prompt) ─┐
  ├─ call_claude(mneme system prompt) ───┤
  │                                      │ randomized A/B
  └─ display blind comparison ───────────┘
                 │
        user picks A/B/tie/skip
                 │
        insert into comparison_results
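The blind A/B step and the win-rate rule in the flow above can be sketched in Python as follows. This is a minimal sketch that assumes the two model outputs are already available as strings; assign_blind, resolve_preference and win_rate_from_modes are hypothetical helpers, not the actual functions in app/runner/compare.py or app/models/comparison.py, and taking a plain list of preferred_mode values as input is an assumed shape.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical helpers sketching the blind A/B step; the real
# run_comparison and compute_win_rate may be structured differently.
MODES = ("default", "mneme")


def assign_blind(default_out: str, mneme_out: str,
                 rng: random.Random) -> Dict[str, Tuple[str, str]]:
    """Randomly place the two outputs behind the labels Option A and Option B."""
    a_mode = rng.choice(MODES)  # which mode the user sees as Option A
    b_mode = "mneme" if a_mode == "default" else "default"
    outputs = {"default": default_out, "mneme": mneme_out}
    return {"A": (a_mode, outputs[a_mode]), "B": (b_mode, outputs[b_mode])}


def resolve_preference(choice: str,
                       blind: Dict[str, Tuple[str, str]]) -> Optional[str]:
    """Map an a/b/tie/skip answer to a preferred_mode value (None = no winner)."""
    choice = choice.strip().lower()
    if choice in ("a", "b"):
        return blind[choice.upper()][0]  # un-blind: look up the hidden mode
    if choice in ("tie", "skip"):
        return None
    raise ValueError(f"expected a, b, tie or skip, got {choice!r}")


def win_rate_from_modes(preferred_modes: List[Optional[str]]) -> Optional[float]:
    """Mneme wins / decisive comparisons; ties and skips (None) are excluded."""
    decisive = [m for m in preferred_modes if m is not None]
    if not decisive:
        return None  # no decisive comparisons yet, so no rate to report
    return sum(m == "mneme" for m in decisive) / len(decisive)


if __name__ == "__main__":
    blind = assign_blind("default answer", "mneme answer", random.Random())
    print("Option A:", blind["A"][1])
    print("Option B:", blind["B"][1])
    print("preferred_mode for 'a':", resolve_preference("a", blind))
    # 2 Mneme wins out of 3 decisive comparisons; the None (tie/skip) is ignored
    print(win_rate_from_modes(["mneme", "default", None, "mneme"]))
```

Passing an explicit random.Random instance lets a test seed or stub the A/B order directly instead of patching the module-level random.choice; either approach works for keeping the blind assignment testable.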
Test plan:
- Run python -m pytest tests/ -q to verify
- Tests that exercise random.choice (A/B order randomization)

For any database created before this feature, apply the migration:
sqlite3 mneme.db < migrate_add_comparison_results.sql
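One way the migration's cross-column CHECK could be expressed is shown below, exercised with Python's sqlite3 module against an in-memory database. Only the table name comparison_results and the preferred_mode column come from this PR; the outcome column, its values, and the other columns are assumptions for illustration, and the real migrate_add_comparison_results.sql may differ.

```python
import sqlite3

# Illustrative schema only: comparison_results and preferred_mode come from
# the PR; outcome, user_id, prompt and created_at are assumed column names.
SCHEMA = """
CREATE TABLE IF NOT EXISTS comparison_results (
    id             INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id        INTEGER NOT NULL,
    prompt         TEXT    NOT NULL,
    outcome        TEXT    NOT NULL
                   CHECK (outcome IN ('decisive', 'tie', 'skip')),
    preferred_mode TEXT
                   CHECK (preferred_mode IN ('default', 'mneme')),
    created_at     TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    -- Cross-column rule: a tie or skip must not record a winner.
    CHECK (outcome NOT IN ('tie', 'skip') OR preferred_mode IS NULL)
);
"""


def apply_migration(conn: sqlite3.Connection) -> None:
    """Create the table; IF NOT EXISTS makes re-running this sketch harmless."""
    conn.executescript(SCHEMA)


def try_insert(conn: sqlite3.Connection, outcome: str, preferred_mode) -> bool:
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO comparison_results "
                "(user_id, prompt, outcome, preferred_mode) VALUES (?, ?, ?, ?)",
                (1, "example prompt", outcome, preferred_mode),
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    apply_migration(conn)
    print(try_insert(conn, "decisive", "mneme"))  # True
    print(try_insert(conn, "tie", None))          # True
    print(try_insert(conn, "tie", "mneme"))       # False: rejected by the CHECK
```

Putting the rule in a table-level CHECK means any writer, not only insert_comparison, is stopped from storing a winner on a tie or skip, which keeps the win-rate denominator consistent.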
This branch is based on feature/signal-profile-refactor (PR #1), not main. It should be merged after PR #1 lands.
🤖 Generated with Claude Code
Ticket changed by: TheoV823