From: William P. <wil...@ya...> - 2011-11-15 14:31:49
On Nov 15, 2011, at 8:50 AM, Mattison Ward wrote:

> The Nagios monitoring system queries treebase-dev every few minutes to
> make sure it is up using this query:
>
> http://treebase-dev.nescent.org/treebase-web/search/studySearch.html?query=prism.publicationName=Nature&format=null&recordSchema=null
>
> It might be unrelated, but I saw a fair amount of activity from search
> engines in the web server logs.
>
> I can set up a robots.txt file to keep search engines from crawling
> the dev and staging sites.
>
> Would it make sense to keep search engines from crawling any sections
> of the production site?

Hi Mattison:

Search engines are already blocked from crawling production:

http://www.treebase.org/robots.txt

... though I don't find this on stage or dev:

http://treebase-stage.nescent.org/robots.txt
http://treebase-dev.nescent.org/robots.txt

So definitely, please have a robots block on stage and dev -- in fact, it should block *everything* on those two sites because we don't want them to compete with production.

So perhaps this has nothing to do with Carl's R bindings. That would be great news if true. Do you have logs that provide the IP identity of users responsible for taking down dev?

bp
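
For reference, a rough sketch of what a block-everything robots.txt for stage and dev could look like, and a way to check that it denies crawling. This is only an illustration: the BLOCK_ALL contents are the standard "disallow all" rules rather than a copy of the production file, and the is_crawlable helper and the "Googlebot" agent name are made up for the example. It uses Python's standard urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents of a block-everything robots.txt for stage and dev
# (standard rules; not copied from the production file).
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Hypothetical helper: True if `agent` may fetch `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "http://treebase-dev.nescent.org/treebase-web/search/studySearch.html"
    print(is_crawlable(BLOCK_ALL, url))
```

Note that robots.txt is only advisory: well-behaved crawlers honor it, but it would not stop Nagios or any client that ignores the file.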