All of a sudden we started getting loads of errors related to browse indexes.
It looks like some robot is requesting the http://qmro.qmul.ac.uk/jspui/browse URL without the parameter it should include.
An internal server error occurred on http://qmro.qmul.ac.uk/jspui:
Date: 4/24/13 2:38 PM
Session ID: 76F88FEE407DA917F0BAAE58640E8063
IP address: 184.108.40.206
-- URL Was: http://qmro.qmul.ac.uk/jspui/browse
-- Method: GET
-- Parameters were:
javax.servlet.ServletException: There is no browse index for the request
Is there any way to stop the robot from browsing the website?
We have two robots.txt files.
One is under /tomcat/webapps/jspui and has the following content:
# Uncomment the following line ONLY if sitemaps.org or HTML sitemaps are used
# and you have verified that your site is being indexed correctly.
# Disallow: /browse
The other robots.txt is at the root of Apache, /var/www/html, and has the following content:
Will adding 'Disallow: /browse' to either of these files stop the robot from browsing our repository?
On Wed, Apr 24, 2013 at 3:53 PM, Kirti Bodhmage <k.bodhmage@...> wrote:
> It looks like some robot is using http://qmro.qmul.ac.uk/jspui/browse url while it should be using this url with some parameter.
That must be some kind of error in your repository, because it
shouldn't happen. If you call /browse without any parameter, it should
display "Browsing by Title", as you can see here (I also verified it
on a 1.7.2 repository):
> Is there any way to stop robot to browse the website?
Definitely don't add "Disallow: /browse" to robots.txt. It would
practically prevent crawlers from indexing your whole site. A better
way would be to fix the /browse URL somehow. Either find out
why it's broken and fix it in DSpace or, if you're using Apache
HTTPD in front of Tomcat, you can simply redirect /jspui/browse to
/jspui/browse?type=title, thus avoiding the problem.
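To illustrate the redirect approach, here is a minimal sketch for an Apache HTTPD virtual host, assuming mod_rewrite is enabled and that Apache proxies /jspui to Tomcat (adjust the paths and context to your own setup):

```apache
# Sketch only: redirect a bare /jspui/browse (no query string)
# to the title browse index, which DSpace can handle.
RewriteEngine On
# Only match requests that carry no query string at all
RewriteCond %{QUERY_STRING} ^$
# Send the crawler to the parameterised URL with a temporary redirect
RewriteRule ^/jspui/browse$ /jspui/browse?type=title [R=302,L]
```

Note that in a .htaccess context the pattern would be written without the leading slash (^jspui/browse$ or ^browse$, depending on where the file sits); the rules above are for server or virtual-host context.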
Compulsory reading: DSpace Mailing List Etiquette