#4 Add web UI for compare flow

closed
nobody
None
2026-05-08
2026-04-07
Anonymous
No

Originally created by: TheoV823

Summary

  • New Flask blueprint (compare) with routes for running comparisons, voting on A/B outputs, viewing win-rate stats, and adding users via a web form
  • Four Jinja2 templates under app/templates/compare/ with a clean, minimal design
  • Compare-specific CSS appended to style.css (no existing styles touched)
  • Uses Flask sessions to hold LLM outputs between /compare/run and /compare/vote — no schema changes needed
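
The session hand-off in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the blueprint object, the `generate_pair` stand-in, the `pending` session key, and the JSON responses are all hypothetical, and the real views render templates and call the LLM instead.

```python
from flask import Blueprint, Flask, redirect, request, session, url_for

# Hypothetical names throughout; the PR's actual module is app/web/compare_views.py.
compare_bp = Blueprint("compare", __name__, url_prefix="/compare")

VALID_CHOICES = {"a", "b", "tie", "skip"}


def generate_pair(prompt):
    """Stand-in for the two LLM calls (assumption: the real code calls an LLM API)."""
    return {"a": f"first answer to: {prompt}", "b": f"second answer to: {prompt}"}


@compare_bp.route("/run", methods=["POST"])
def run():
    prompt = request.form.get("prompt", "").strip()
    if not prompt:
        return "Prompt must not be empty.", 400
    # Keep both outputs in the session so no new database table is needed.
    session["pending"] = {"prompt": prompt, **generate_pair(prompt)}
    return redirect(url_for("compare.result"))


@compare_bp.route("/result")
def result():
    pending = session.get("pending")
    if pending is None:
        return "No comparison in progress.", 400
    return pending  # Flask serialises a dict return value as JSON


@compare_bp.route("/vote", methods=["POST"])
def vote():
    choice = request.form.get("choice")
    if choice not in VALID_CHOICES:
        return "Unknown choice.", 400
    # Remove the pending pair so the same comparison cannot be voted on twice.
    pending = session.pop("pending", None)
    if pending is None:
        return "Session expired; run the comparison again.", 400
    # The real view would persist the vote here.
    return {"recorded": choice, "prompt": pending["prompt"]}


def create_app():
    app = Flask(__name__)
    app.secret_key = "dev-only"  # sessions require a secret key; placeholder value
    app.register_blueprint(compare_bp)
    return app
```

One design point worth checking in review: Flask's default session lives in a signed client-side cookie, and browsers cap cookies at roughly 4 KB, so long LLM outputs may need a server-side session store.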

How to test

  1. flask run (with ANTHROPIC_API_KEY set)
  2. Browse to http://localhost:5000/compare
  3. Click Add User → fill in name + tweak the sample profile JSON → Create
  4. Select the user, type a prompt, click Run Comparison
  5. Pick A / B / Tie / Skip on the result page
  6. Check View Stats to see the win-rate update
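
For step 6, the win rate shown on the stats page could be computed along these lines. This is a sketch under stated assumptions: the function name is hypothetical, and the PR does not say how ties and skips are counted, so here they are simply left out of the denominator.

```python
from collections import Counter


def win_rate(votes, side="b"):
    """Return the share of decisive votes won by `side`, or None if there are none.

    Assumption: "tie" and "skip" votes are excluded from the denominator.
    """
    counts = Counter(votes)
    decisive = counts["a"] + counts["b"]
    if decisive == 0:
        return None  # nothing to report yet
    return counts[side] / decisive


# Example: two wins for B out of three decisive votes.
rate = win_rate(["a", "b", "b", "tie", "skip"])
print(f"{rate:.0%}")
```

Returning `None` instead of 0 when there are no decisive votes lets the stats page distinguish "no data yet" from "never wins".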

Test plan

  • [ ] pytest passes (107 tests, no regressions)
  • [ ] /compare renders with user dropdown and example prompts
  • [ ] /compare/add-user creates a user and redirects to index with that user pre-selected
  • [ ] /compare/run calls the LLM, stores both outputs in the session, and redirects to the result page
  • [ ] /compare/result shows prompt + two outputs side by side
  • [ ] Each vote button saves correctly and flashes a confirmation
  • [ ] /compare/stats shows win rate with correct colour coding
  • [ ] Flashes appear for missing user, empty prompt, invalid JSON, expired session
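
The input checks behind the last item could look something like the sketch below. The helper names and messages are hypothetical, not taken from the PR; each returned message is what a view would pass to `flash()` before redirecting.

```python
import json


def validate_run_form(user_id, prompt, known_user_ids):
    """Collect error messages for the run form (hypothetical helper)."""
    errors = []
    if user_id not in known_user_ids:
        errors.append("Select an existing user.")
    if not prompt or not prompt.strip():
        errors.append("Prompt must not be empty.")
    return errors


def parse_profile_json(raw):
    """Parse the profile text from the add-user form.

    Returns (profile, None) on success or (None, message) on failure.
    Assumption: a profile must be a JSON object, not a list or scalar.
    """
    try:
        profile = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"Invalid JSON: {exc.msg}"
    if not isinstance(profile, dict):
        return None, "Profile must be a JSON object."
    return profile, None
```

Keeping these checks in plain functions, separate from the views, would let pytest cover each flash case without going through the Flask test client.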

🤖 Generated with Claude Code

Related

Tickets: #113
Tickets: #178
Tickets: #186

Discussion

  • Anonymous

    Anonymous - 2026-05-08

    Originally posted by: TheoV823

    Closing without merge after triage on 2026-05-08.

    Why: This PR adds a web UI for the compare flow (app/web/compare_views.py plus four compare/*.html templates). The underlying CLI feature (compare, compare-stats) is on main, so the dependencies exist — but the public-facing positioning has moved from "personalization / compare default vs Mneme-tuned AI" to "architectural governance for AI-assisted development."

    Shipping a public web UI for blind comparison of AI outputs would actively conflict with the locked positioning ("Rules files document standards. Mneme enforces them."). It would suggest Mneme is a personalization product, which is the prior frame that has been deliberately retired.

    The implementation itself is fine — 636 lines of working Flask views + templates — it just targets a product surface this repo no longer has.

    If the intent was internal benchmarking UX: the four templates and the compare_views.py module belong in the private internal-tooling repo per ADR-002, not on the public Mneme repo's app/.

    If the intent was a public-facing eval tool: hold for a positioning decision that accommodates it. The current locked taglines do not.

    Branch feature/compare-ui will be deleted as part of this triage. The work is recoverable from this PR's diff if it's needed later.

    Triage rationale captured in docs/plans/2026-05-08-feature-branch-triage.md (local-only per repo gitignore policy).

     
  • Anonymous

    Anonymous - 2026-05-08

    Ticket changed by: TheoV823

    • status: open --> closed
     
