According to Andrew Daviel:
> On Sun, 30 Sep 2001, Geoff Hutchison wrote:
> re. logging broken links
> > I don't really see this as a needed feature to ht://Dig itself since
> > you can do this as a script running on top of the htdig output using
> > the -s flag. See
> > <http://www.htdig.org/files/contrib/scripts/showdead.pl>
> > <http://www.htdig.org/files/contrib/scripts/report_missing_pages.pl>
> I played with this briefly.
> Htdig listed a broken link (404) on the same server, and a 404 link to
> another server.
> It didn't list a 502 (connection refused) or 500 (unknown host).
> It also doesn't grab any mail information or owner metadata, so
> you'd either have to have a database of sites/URL fragments against
> authors, or re-spider the list of referring documents to gather author
> information, if you wanted to mail authors or sort the broken list
> by author.
> My robot produces e.g. http://www.triumf.ca/trsearch/errors2.html
> (but I'm too lazy to fix my links - advax@...)
Neither htdig 3.1.x nor 3.2.x deals with 500 and 502 error codes right now.
Are you indexing through a proxy server? Normally, htdig will detect these
two error conditions itself, and set the appropriate internal error status
codes to generate the error messages for missing pages, but I don't know
what happens when a proxy server detects these error conditions and reports
them back to htdig.
Gilles R. Detillieux E-mail: <grdetil@...>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930