Once again today's release was delayed partly by some broken links that any simple link-checker would have found. We should implement a link-checking system that can be run against Jenkins build products prior to beginning the release process. This could be a Jenkins job that is manually triggered at some point, or it could run automatically after each successful P5 build. The latter would alert us as soon as a link goes bad or a change to the processing screws up link creation.
I agree, but what tool do you suggest we use?
We'll have to do some research on this. We could actually write our own:
XSLT to pull out all clickable links and expand their URLs into a file, each with a pointer back to the file it came from;
Shell script to wget or curl the URLs and write errors to output.
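A rough sketch of those two steps, written in Python instead of XSLT plus a shell script, so treat it as an illustration rather than a proposal for the actual tooling. It assumes the build products are HTML files, that only `<a href>` links matter, and that a HEAD request is enough to tell whether a URL is alive (some servers reject HEAD, so a real job might need to fall back to GET). All function and class names here are made up for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
import urllib.error
import urllib.request


class LinkCollector(HTMLParser):
    """Collect absolute http(s) URLs from <a href>, each with its source file."""

    def __init__(self, base_url, source):
        super().__init__()
        self.base_url = base_url
        self.source = source
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop the #fragment part.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                # Skip mailto:, javascript: and other non-HTTP schemes.
                if absolute.startswith(("http://", "https://")):
                    self.links.append((absolute, self.source))


def extract_links(html_text, base_url, source):
    """Return a list of (url, source) pairs found in one HTML document."""
    collector = LinkCollector(base_url, source)
    collector.feed(html_text)
    return collector.links


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if it cannot be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def find_broken(links, fetch=http_status):
    """Return (url, source, status) for every link that fails the check."""
    broken = []
    for url, source in links:
        status = fetch(url)
        if status is None or status >= 400:
            broken.append((url, source, status))
    return broken
```

The `fetch` argument is there so that the checking logic can be exercised without touching the network; a Jenkins job would simply call `find_broken(links)` and fail the build if the result is non-empty.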
I should point out that most links are checked as part of building the epub version of the Guidelines; epubcheck does this. However, the web Guidelines have the tricksy extra footers and multilingual stuff, so we miss those.
Just to update the ticket: I have been testing the use of the Linux linkchecker package, and results are promising; I'm developing a Makefile and some config for a Jenkins job which will be manually run against output of the TEI-P5 job to check for broken links. I've also fixed about 70 existing broken links revealed during testing of linkchecker.
Now up and running on my Jinks server. More testing and deployment on the Oxford one to be done before closing the ticket.
Sebastian suggests that we might also implement a link-checker job to run periodically against the main tei-c server. Whether that should happen on external Jenkins machines or on a Jinks running on tei-c.org (if that's eventually implemented) is up for discussion.
This is in place, so closing the ticket.