From: Patrick Robinson <pgr@vt...> - 2003-06-28 17:09:08
I sent this to htdig-dev, but I'm not entirely sure if it's a -dev or a
-general issue. Sorry for the crosspost, but I thought there might be
some people on this list who had seen this.
I just installed htdig-3.2.0b4-20030622, and discovered that it's not
correctly handling Disallow: patterns from my robots.txt file. (I'm
hoping this is the correct list to post this!)
I have these lines in my robots.txt:
In my config file, I do NOT exclude /cgi-bin/ via exclude_urls.
However, when I run rundig -vvv, it tells me that URLs like the following
are rejected due to being "forbidden by server robots.txt":
This shouldn't happen. It should only be rejecting URLs *starting*
with "/WebObjects/" (at least, that's my interpretation of what I read).
If I remove the "Disallow: /WebObjects/" line from robots.txt and rerun
rundig, it now indexes those URLs.
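For illustration, the prefix rule described above (a Disallow value should only match paths that *start* with it) can be sketched in Python. This is only a minimal sketch of the expected behaviour, not ht://Dig's actual code; the function name is_disallowed and both example paths are hypothetical, since the real rejected URLs are not shown in this message.

```python
def is_disallowed(path, disallow_rules):
    """Return True if path starts with any non-empty Disallow value.

    Hypothetical helper: models plain prefix matching only, with no
    wildcards and no per-user-agent handling.
    """
    # An empty Disallow value means "nothing is disallowed", so skip it.
    return any(rule and path.startswith(rule) for rule in disallow_rules)


rules = ["/WebObjects/"]

# Hypothetical path that begins with the rule: should be blocked.
print(is_disallowed("/WebObjects/app", rules))

# Hypothetical path that merely contains the rule text: should be allowed.
print(is_disallowed("/cgi-bin/WebObjects/app", rules))
```

Under prefix matching the first call reports a match and the second does not, which is the behaviour I expected. A matcher that searches for the rule anywhere in the path would reject both.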
I never had this problem in 3.1.6. Has something changed?
AHNR Info Technology, Virginia Tech