From: Mattison W. <mat...@ne...> - 2011-11-15 14:52:48

Ok. The robots.txt is in place now for dev and staging. I'll keep an eye on it to see if that resolves the problem. If not, I'll correlate the logs to see what IP address the problem is coming from.

-Mattison

On Tue, Nov 15, 2011 at 9:31 AM, William Piel <wil...@ya...> wrote:
>
> On Nov 15, 2011, at 8:50 AM, Mattison Ward wrote:
>
> The Nagios monitoring system queries treebase-dev every few minutes to
> make sure it is up using this query:
>
> http://treebase-dev.nescent.org/treebase-web/search/studySearch.html?query=prism.publicationName=Nature&format=null&recordSchema=null
>
> It might be unrelated, but I saw a fair amount of activity from search
> engines in the web server logs.
>
> I can set up a robots.txt file to keep search engines from crawling
> the dev and staging sites.
>
> Would it make sense to keep search engines from crawling any sections
> of the production site?
>
> Hi Mattison:
> Search engines are already blocked from crawling production:
> http://www.treebase.org/robots.txt
> ... though I don't find this on stage or dev:
> http://treebase-stage.nescent.org/robots.txt
> http://treebase-dev.nescent.org/robots.txt
> So definitely, please have a robots block on stage and dev -- in fact, it
> should block *everything* on those two sites because we don't want them to
> compete with production.
> So perhaps this has nothing to do with Carl's R bindings. That would be
> great news if true.
> Do you have logs that provide the IP identity of users responsible for
> taking down dev?
> bp
>

--
Mattison Ward
NESCent at Duke University
2024 W. Main Street, Suite A200
Durham, NC 27705-4667
919-668-4585 (desk)
919-668-4551 (alternate)
919-668-9198 (fax)
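
For reference, here is a minimal sketch of what a block-everything robots.txt looks like and how it can be checked. The thread does not show the file that was actually deployed, so the rules in `BLOCK_ALL` are an assumption (the standard "disallow all" form), and the helper name `is_crawlable` is made up for illustration. The check uses Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents: the standard "block everything" rules, as requested
# for the dev and staging hosts. Not copied from the deployed file.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for host in ("treebase-dev.nescent.org", "treebase-stage.nescent.org"):
        url = f"http://{host}/treebase-web/search/studySearch.html"
        print(host, "crawlable:", is_crawlable(BLOCK_ALL, url))
```

Note that robots.txt is advisory: well-behaved crawlers honor it, but it does not stop a client that ignores it, so correlating the logs by IP address is still worthwhile if the problem persists.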