This is very frustrating, to say the least. I'm trying
to run rundig to build my databases and keep getting a
"forbidden by server robots.txt" error. There is no
robots.txt file on the server, so htdig is simply
parsing the 404 error page, which contains no robots
directives or robots meta tags at all.
I'm using version 3.2.0b5-7 and have put the following
into /etc/htdig/htdig.conf:
start_url: http://www.medicalresourceusa.com
limit_urls_to: ${start_url} store.medicalresourceusa.com
It does OK on the www host, but fails with that error
the moment it reaches store.medicalresourceusa.com.
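
One way to sanity-check the claim that a 404 page carries no robots rules is to feed such a page to a robots.txt parser and see whether it blocks anything. The sketch below uses Python's standard urllib.robotparser, not htdig's own parser, so it only shows what a typical parser does with that input. The page bodies and the "htdig" agent name are illustrative assumptions, not captured from the real server.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_body: str, url: str, agent: str = "htdig") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt body."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; no network access is needed.
    parser.parse(robots_body.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical stand-in for a 404 error page served in place of robots.txt.
error_page = (
    "<html><head><title>404 Not Found</title></head>\n"
    "<body><h1>Not Found</h1></body></html>"
)

# A robots.txt that really does forbid everything, for comparison.
blocking = "User-agent: *\nDisallow: /"

store_url = "http://store.medicalresourceusa.com/"

print(allowed_by_robots(error_page, store_url))
print(allowed_by_robots(blocking, store_url))
```

If the first call reports the URL as allowed, the error page itself contains nothing a standard parser would treat as a Disallow rule, and the refusal is more likely coming from how the server's response for /robots.txt on the store host is being handled.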