From: Jon A. <jon...@ne...> - 2010-03-22 02:02:56
I think they are respecting robots.txt now. The database volume has not grown for hours and the apache logs no longer show google searching the database.

-Jon

On Mar 21, 2010, at 9:05 PM, William Piel wrote:

> On Mar 21, 2010, at 5:30 PM, Jon Auman wrote:
>
>> Yes, the file system is running out of space. And google wasn't honoring the robots.txt. Since the last reboot, it seems that no spiders are searching the site, so let's see how that goes.
>>
>> It wasn't just google, but msn, snap.com, baidu.com, ro...@ga... (WTF?), etc. It was like hey boys, I found a 100GB database, let's index it!
>>
>> Anywho, I'll be keeping an eye on it. No temp files for the last 1/2 hour.
>>
>> -Jon
>
> Wow. Thanks Jon. Indeed, in TreeBASE1, I found plenty of robots did not respect robots.txt -- when they hit, it would slow me down and I'd listen to a lot of disk activity noise all day...
>
> But I can't help but think -- if we can't keep TreeBASE up for more than a few hours at a time, might it make more sense to pause on the press release until we have a lasting solution? I'm concerned that when we make an announcement and all sorts of people head to the site to check it out, their first experience will be one of an exception barf.
>
> bp

-------------------------------------------------------
Jon Auman
Systems Administrator
National Evolutionary Synthesis Center
Duke University
http://www.nescent.org
jon...@ne...
------------------------------------------------------