From: Gilles D. <gr...@sc...> - 2001-10-17 19:45:11
According to Andrew Daviel:
> On Sun, 30 Sep 2001, Geoff Hutchison wrote:
> re. logging broken links
> >
> > I don't really see this as a needed feature to ht://Dig itself since
> > you can do this as a script running on top of the htdig output using
> > the -s flag. See
> > <http://www.htdig.org/files/contrib/scripts/showdead.pl>
> > <http://www.htdig.org/files/contrib/scripts/report_missing_pages.pl>
>
> I played with this briefly.
> Htdig listed a broken link (404) on the same server, and a 404 link to
> another server.
>
> It didn't list a 502 (connection refused) or 500 (unknown host).
> It also doesn't grab any mail information or owner metadata, so
> you'd either have to have a database of sites/URL fragments against
> authors, or re-spider the list of referring documents to gather author
> information, if you wanted to mail authors or sort the broken list
> by author.
> My robot produces e.g. http://www.triumf.ca/trsearch/errors2.html
> (but I'm too lazy to fix my links - ad...@tr...)

Neither htdig 3.1.x nor 3.2.x deals with 500 and 502 error codes right now.
Are you indexing through a proxy server? Normally, htdig will detect
these two error conditions itself, and set the appropriate internal error
status codes to generate the error messages for missing pages, but I don't
know what happens when a proxy server detects these error conditions and
reports them back to htdig.

--
Gilles R. Detillieux              E-mail: <gr...@sc...>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930