Re: [Phpgedview-talk] Google
From: Daniel P. K. <da...@ki...> - 2005-08-28 16:49:52
I have been indexed by Google for a long time, but in August my usage
jumped to 4x normal. It is good to be indexed: a third cousin just found
my site, and now she is helping out on that branch of the tree.

I looked at my usage by URL, and I was surprised to see that 64% of the
hits are on phpGedView/calendar.php. I am going to try the Disallow just
on that URL. It makes sense that a crawler would get stuck on the
calendar URL, since there is no end of dates to query for.

Matthew Gates wrote:
> On Saturday 27 August 2005 18:29, Joe Tellup wrote:
>
>>Me too. This past week Google used up 2.3 gig of my bandwidth.
>>
>>Put this in your robots.txt file and they are gone:
>>
>>User-agent: googlebot
>>Disallow: /
>>
>
>
> I found that using robots.txt to partially block bots is quite effective
> at reducing traffic while still permitting your site to be indexed
> effectively.
>
> The trick is to find which files on the site are causing most traffic and
> block just those. I also blocked all the charts. My idea is that I want
> there to be at least one entry point to my site for each name and/or
> place in the database.
>
> Here's the robots.txt (my phpGedView installation is in the directory
> "gedview"):
>
> User-agent: *
> Disallow: /gedview/media/
> Disallow: /gedview/timeline.php
> Disallow: /gedview/fanchart.php
> Disallow: /gedview/pedigree.php
> Disallow: /gedview/clippings.php
> Disallow: /gedview/family.php
> Disallow: /gedview/ancestry.php
> Disallow: /gedview/descendancy.php
> Disallow: /gedview/reportengine.php
> Disallow: /gedview/hourglass.php
> Disallow: /gedview/calendar.php
> Disallow: /gedview/patriarchlist.php
>
> Further tuning may be helped by this command, which shows where the
> crawlers are making most traffic. "access_log" is the name of the Apache
> log file; run this command in the web log directory. If you don't have
> an ssh login to your web host, copy the web log to your local Linux
> machine. If you don't have a Linux machine, install Cygwin!
>
> grep Googlebot access_log \
> |grep gedview \
> |awk -F\" '{ print $2 }' \
> |awk '-F[/?]' '{ print $3 }' \
> |sort | uniq -c
>
> The "grep gedview" will need to be changed for your site to filter only
> the pages for phpGedView (if you have other parts to your website).
>
> The $3 on the fourth line of the command might need to be changed if your
> phpGedView install isn't in a sub-directory of the root of your web
> server. Mine (/gedview) is a single subdirectory. If yours is in a
> second-level directory (e.g. /stuff/gedview) you need to change the $3 to
> $4.
>
> The output looks something like this:
>   1188 indilist.php
>   1109 placelist.php
>    527 individual.php
>    337 aliveinyear.php
>     73 repo.php
>     16 repolist.php
>      5 famlist.php
>      4 HTTP
>      1 relationship.php HTTP
>
>
>
>>Google is now making mirror images of all websites, I guess they are
>>going for a stock split.
>
>
> How can you tell?
>
>
>>-----Original Message-----
>>From: php...@li...
>>[mailto:php...@li...]On Behalf Of Ken
>>Lowther
>>Sent: Saturday, August 27, 2005 1:52 PM
>>To: php...@li...
>>Subject: [Phpgedview-talk] Google
>>
>>
>>Googlebot has been pounding my site.
>>
>>http://genealogy.lowther.org/cgi-bin/awstats.pl
>>
>>330 individuals on file:
>>
>>http://genealogy.lowther.org/
>>
>>Anyone else had this problem? The only thing I can think is that it is
>>chasing links around in circles. Googlebot was connected 24/7 using a
>>MINIMUM of 20% processor on a dual AMD 64 2.3 GHz machine. I was
>>experiencing times when the machine responded to very little, so that is
>>what started me digging.
>>
>>Ken
>>
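
For anyone trying the partial-block approach quoted above, here is a rough sketch of how the rules could be checked before uploading them. It is only an illustration: it uses Python's standard urllib.robotparser, whose matching may not be identical to what Googlebot actually does, and the names blocked_paths and SAMPLE_PATHS, as well as the "year" query parameter, are made up for the example. The rules themselves are copied from the robots.txt in the quoted message.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the robots.txt quoted in the thread
# (phpGedView installed under /gedview).
ROBOTS_TXT = """\
User-agent: *
Disallow: /gedview/media/
Disallow: /gedview/timeline.php
Disallow: /gedview/fanchart.php
Disallow: /gedview/pedigree.php
Disallow: /gedview/clippings.php
Disallow: /gedview/family.php
Disallow: /gedview/ancestry.php
Disallow: /gedview/descendancy.php
Disallow: /gedview/reportengine.php
Disallow: /gedview/hourglass.php
Disallow: /gedview/calendar.php
Disallow: /gedview/patriarchlist.php
"""


def blocked_paths(robots_txt, paths, agent="Googlebot"):
    """Return the paths that the given robots.txt text disallows for agent.

    Hypothetical helper, not part of phpGedView or of any crawler.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, p)]


# Made-up sample URLs; the query string is an assumption for illustration.
SAMPLE_PATHS = [
    "/gedview/calendar.php?year=1900",
    "/gedview/indilist.php",
    "/gedview/placelist.php",
    "/gedview/media/photo.jpg",
]

if __name__ == "__main__":
    for path in blocked_paths(ROBOTS_TXT, SAMPLE_PATHS):
        print("blocked:", path)
```

The point of checking this way is to confirm that the list pages (indilist.php, placelist.php, individual.php) stay crawlable, so that each name and place keeps an entry point, while the calendar and chart pages are kept out.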