From: Patrick R. <pg...@vt...> - 2003-06-27 20:00:20
Hi folks,

I just installed htdig-3.2.0b4-20030622 and discovered that it's not correctly handling Disallow: patterns from my robots.txt file. (I'm hoping this is the correct list to post this to!)

I have these lines in my robots.txt:

    User-agent: *
    Disallow: /WebObjects/

In my config file, I do NOT exclude /cgi-bin/ via exclude_urls. However, when I run rundig -vvv, it reports that URLs like the following are rejected as "forbidden by server robots.txt":

    href: http://www.mysite.edu/cgi-bin/WebObjects/blah/blah/blah

This shouldn't happen. It should only reject URLs whose paths *start* with "/WebObjects/" (at least, that's my interpretation of what I read at http://www.robotstxt.org/wc/norobots.html). If I remove the "Disallow: /WebObjects/" line from robots.txt and rerun rundig, it indexes those URLs.

I never had this problem in 3.1.6. Has something changed?

--
Patrick Robinson
AHNR Info Technology, Virginia Tech
pg...@vt...
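For anyone following this thread, here is a minimal Python sketch of the difference between the prefix rule described on the robots.txt page cited above and an unanchored match. The function names are hypothetical, and nothing here is taken from htdig's source; the unanchored variant only illustrates one behavior that would be consistent with the reported symptom. For brevity, the sketch matches against the URL path only and ignores any query string.

```python
from urllib.parse import urlsplit


def is_disallowed(url: str, disallow_values: list[str]) -> bool:
    """Prefix rule: a Disallow value blocks any URL path that BEGINS with it.

    An empty Disallow value blocks nothing. Only the path is checked here
    (query string ignored), which is a simplification for illustration.
    """
    path = urlsplit(url).path or "/"
    return any(value and path.startswith(value) for value in disallow_values)


def is_disallowed_unanchored(url: str, disallow_values: list[str]) -> bool:
    """Hypothetical buggy variant: blocks if the value appears ANYWHERE in the path.

    Not htdig code; it just reproduces the kind of mismatch reported above.
    """
    path = urlsplit(url).path or "/"
    return any(value and value in path for value in disallow_values)


rules = ["/WebObjects/"]
url = "http://www.mysite.edu/cgi-bin/WebObjects/blah/blah/blah"

print(is_disallowed(url, rules))             # False: path starts with /cgi-bin/
print(is_disallowed_unanchored(url, rules))  # True: "/WebObjects/" found mid-path
print(is_disallowed("http://www.mysite.edu/WebObjects/x", rules))  # True
```

With the prefix rule, the /cgi-bin/WebObjects/... URL should be indexed; only paths beginning with /WebObjects/ are excluded.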