Re: [Phpgedview-talk] Google
From: John <sh...@jb...> - 2005-08-28 15:00:54
On 28 Aug 2005 at 13:45, Matthew Gates wrote:

> On Saturday 27 August 2005 18:29, Joe Tellup wrote:
> > Me too. This past week Google used up 2.3 gig of my bandwidth.
> >
> > Put this in your robots text file and they are gone:
> >
> > User-agent: googlebot
> > Disallow: /

Actually, I went one step further and locked out any access to my web
root. I do not allow any files, reads or writes either. Everything is
placed into domain subdirectories. The plus side was that it cut down
most 'hack' attempts. I really don't care to be indexed by the search
engines, so this works for me - I'm not commercial. I'll guess that
placing a robots.txt file in my root will allow those search bots that
play nice proper access, and the stripper bots/hot linkers nothing.

> I found that using robots.txt to partially block bots is quite
> effective at reducing traffic while still permitting your site to be
> indexed effectively.
>
> The trick is to find which files on the site are causing most traffic
> and block just those. I also blocked all the charts. My idea is that
> I want there to be at least one entry point to my site for each name
> and/or place in the database.
>
> Here's the robots.txt (my phpGedView installation is in the directory
> "gedview"):
>
> User-agent: *
> Disallow: /gedview/media/
> Disallow: /gedview/timeline.php
> Disallow: /gedview/fanchart.php
> Disallow: /gedview/pedigree.php
> Disallow: /gedview/clippings.php
> Disallow: /gedview/family.php
> Disallow: /gedview/ancestry.php
> Disallow: /gedview/descendancy.php
> Disallow: /gedview/reportengine.php
> Disallow: /gedview/hourglass.php
> Disallow: /gedview/calendar.php
> Disallow: /gedview/patriarchlist.php
>
> Further tuning may be helped by this command, which shows where the
> crawlers are making most traffic. "access_log" is the name of the
> Apache log file - run this command in the web log directory. If you
> don't have an ssh login to your web host, copy the web log to your
> local linux machine. If you don't have a linux machine, install
> cygwin!
>
> grep Googlebot access_log \
>   | grep gedview \
>   | awk -F\" '{ print $2 }' \
>   | awk '-F[/?]' '{ print $3 }' \
>   | sort | uniq -c
>
> The "grep gedview" will need to be changed for your site to filter
> only the pages for phpGedView (if you have other parts to your
> website).
>
> The $3 on the fourth line of the command might need to be changed if
> your phpGedView install isn't in a sub-directory of the root of your
> web server. Mine (/gedview) is a single subdirectory. If yours is in
> a second level directory (e.g. /stuff/gedview) you need to change the
> $3 to $4.
>
> The output looks something like this:
>
>    1188 indilist.php
>    1109 placelist.php
>     527 individual.php
>     337 aliveinyear.php
>      73 repo.php
>      16 repolist.php
>       5 famlist.php
>       4 HTTP
>       1 relationship.php HTTP
>
> > Google is now making mirror images of all websites, I guess they are
> > going for a stock split.
>
> How can you tell?

Actually, I think it may be more of an index size war. Notice how Yahoo
and Google have really been publishing their index sizes here of late.
And with Microsoft now joining the fray, I think it's only going to get
worse.

> > -----Original Message-----
> > From: php...@li... [mailto:php...@li...] On Behalf Of Ken Lowther
> > Sent: Saturday, August 27, 2005 1:52 PM
> > To: php...@li...
> > Subject: [Phpgedview-talk] Google
> >
> > Googlebot has been pounding my site.
> >
> > http://genealogy.lowther.org/cgi-bin/awstats.pl
> >
> > 330 individuals on file:
> >
> > http://genealogy.lowther.org/
> >
> > Anyone else had this problem? The only thing I can think is that it
> > is chasing links around in circles. Googlebot was connected 24/7,
> > using a MINIMUM of 20% processor on a dual AMD 64 2.3 GHz machine.
> > I was experiencing times when the machine responded to very little,
> > so that is what started me digging.
> >
> > Ken
> >
> > -------------------------------------------------------
> > SF.Net email is Sponsored by the Better Software Conference & EXPO
> > September 19-22, 2005 * San Francisco, CA * Development Lifecycle
> > Practices Agile & Plan-Driven Development * Managing Projects &
> > Teams * Testing & QA Security * Process Improvement & Measurement *
> > http://www.sqe.com/bsce5sf
> > _______________________________________________
> > Phpgedview-talk mailing list
> > Php...@li...
> > https://lists.sourceforge.net/lists/listinfo/phpgedview-talk
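
For anyone who would rather not chain grep and awk, the per-script tally described in the quoted message can also be sketched in Python. This is not from the thread: the function name tally_crawler_hits, its parameters, and the sample log lines are assumptions made up for illustration, and the sketch assumes Apache's common or combined log format, where the request ("GET /path HTTP/1.1") is the first double-quoted field on each line.

```python
import re
from collections import Counter

# The request is the first double-quoted field in Apache's common/combined
# log format, e.g. "GET /gedview/indilist.php?alpha=A HTTP/1.1".
REQUEST_RE = re.compile(r'"[A-Z]+ ([^\s"]+)(?: [^"]*)?"')


def tally_crawler_hits(lines, agent="Googlebot", prefix="/gedview/"):
    """Count requests per script made by one crawler under one URL prefix.

    `agent` and `prefix` are hypothetical defaults matching the example in
    the thread; pass e.g. prefix="/stuff/gedview/" for a deeper install.
    """
    counts = Counter()
    for line in lines:
        if agent not in line:
            continue  # not the crawler we are interested in
        match = REQUEST_RE.search(line)
        if match is None:
            continue  # malformed or unexpected log line
        path = match.group(1).split("?", 1)[0]  # drop the query string
        if not path.startswith(prefix):
            continue  # some other part of the website
        script = path[len(prefix):].split("/", 1)[0]
        counts[script or "(index)"] += 1
    return counts


if __name__ == "__main__":
    # Made-up log lines, for illustration only.
    sample = [
        '192.0.2.1 - - [28/Aug/2005:13:45:00 +0000] '
        '"GET /gedview/indilist.php?alpha=A HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
        '192.0.2.1 - - [28/Aug/2005:13:45:02 +0000] '
        '"GET /gedview/placelist.php HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    ]
    for script, hits in tally_crawler_hits(sample).most_common():
        print(f"{hits:7d} {script}")
```

Passing the install path as a prefix stands in for switching between $3 and $4 in the awk version, and stripping the query string before splitting avoids the stray "HTTP" entries seen in the sample output. To run it against a real log, something like tally_crawler_hits(open("access_log")) should work, assuming the file is in the format described above.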