
#105 ci(publish): validate llms.txt coverage for new public pages

open
nobody
automation (2)
2026-05-17
2026-05-17
Anonymous
No

Originally created by: TheoV823

Problem

Three pages added or touched in the wave 2 launch batch are missing from llms.txt:

  • /insights/ai-native-engineering-intent-debt/ (new article, wave 2)
  • /demo/governed-python-agent/
  • /insights/harness-engineering-still-needs-governance/

llms.txt is a GEO/LLM citation surface. Missing entries mean these pages are not advertised to LLM crawlers that respect the file.

Finding

Discovered during Task 2b of the post-launch cleanup runbook (2026-05-17) via scripts/seo_check.py: rule geo.llms fired as WARN on all three pages.

Root cause

The publishing script (scripts/create_mneme_article.py or equivalent) does not validate llms.txt coverage after adding a new page. There is no automated check in the deploy path for this gap.

Proposed fix

Add a rule_llms_coverage check to scripts/seo_check.py (or a dedicated step in the publishing script) that:

  1. Reads the set of public-facing URLs from site/sitemap.xml (already updated per deploy).
  2. Reads the set of URLs declared in llms.txt.
  3. Reports a finding (WARN or FAIL, depending on mode) for every sitemap URL absent from llms.txt.

This turns llms.txt lag into a first-class deploy signal instead of something caught manually.
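A minimal sketch of the proposed check, in Python like the existing scripts. It assumes llms.txt lists pages as Markdown links of the form [title](url), as in the llms.txt proposal, and that the file lives at site/llms.txt. The function names, the --mode flag and the default paths are hypothetical and would need adapting to however scripts/seo_check.py registers its rules. URLs are compared by path with the trailing slash stripped, so absolute sitemap URLs and relative llms.txt links match.

```python
from __future__ import annotations

import argparse
import re
import sys
import xml.etree.ElementTree as ET
from pathlib import Path
from urllib.parse import urlparse

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Assumption: llms.txt entries are Markdown links, e.g. "- [Title](https://host/path/): notes".
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def normalize(url: str) -> str:
    """Reduce an absolute or relative URL to its path, without a trailing slash."""
    path = urlparse(url).path.rstrip("/")
    return path or "/"


def sitemap_paths(xml_data: str | bytes) -> set[str]:
    """Collect the <loc> of every <url> entry in a sitemaps.org urlset."""
    root = ET.fromstring(xml_data)
    return {
        normalize(loc.text.strip())
        for loc in root.iterfind("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def llms_paths(llms_text: str) -> set[str]:
    """Collect every link target declared in llms.txt."""
    return {normalize(m.group(1)) for m in LINK_RE.finditer(llms_text)}


def missing_from_llms(xml_data: str | bytes, llms_text: str) -> list[str]:
    """Sitemap paths with no matching llms.txt entry, sorted for stable output."""
    return sorted(sitemap_paths(xml_data) - llms_paths(llms_text))


def main(argv: list[str] | None = None) -> int:
    parser = argparse.ArgumentParser(description="Check llms.txt coverage of sitemap URLs.")
    parser.add_argument("--sitemap", default="site/sitemap.xml")
    parser.add_argument("--llms", default="site/llms.txt")  # hypothetical location
    parser.add_argument("--mode", choices=("warn", "fail"), default="warn")
    args = parser.parse_args(argv)

    missing = missing_from_llms(
        Path(args.sitemap).read_bytes(),
        Path(args.llms).read_text(encoding="utf-8"),
    )
    level = args.mode.upper()
    for path in missing:
        print(f"{level} rule_llms_coverage: {path} is in sitemap.xml but not in llms.txt")
    # Only "fail" mode turns findings into a non-zero exit that can block a deploy.
    return 1 if missing and args.mode == "fail" else 0


if __name__ == "__main__":
    sys.exit(main())
```

In the deploy path, running with --mode fail would block a deploy that adds a page without an llms.txt entry, while --mode warn mirrors how geo.llms reported the three pages above.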

Scope

  • Modify scripts/seo_check.py or scripts/create_mneme_article.py
  • No site content changes
  • No llms.txt edits in this issue (fix the automation first, then run it)

Discovered during the post-launch cleanup runbook (2026-05-17).
