Re: [Phpgedview-talk] Google
From: Ken L. <klo...@ci...> - 2005-08-28 16:59:28
OK. Makes sense. My error log was getting a lot of errors for the calendar.

Ken

Daniel P. Kionka wrote:
> I have been indexed by Google for a long time, but in August my usage
> jumped to 4X normal. It is good to be indexed--a 3rd cousin just
> found my site, and now she is helping out on that branch of the tree.
>
> I looked at my usage by URL, and I was surprised to see that 64% of
> the hits are on phpGedView/calendar.php. I am going to try the
> Disallow just on that URL.
>
> It makes sense that it would get stuck on the calendar URL. There is no
> end of dates to query for.
>
>
> Matthew Gates wrote:
>
>> On Saturday 27 August 2005 18:29, Joe Tellup wrote:
>>
>>> Me too. This past week Google used up 2.3 gig of my bandwidth.
>>>
>>> Put this in your robots.txt file and they are gone:
>>>
>>> User-agent: googlebot
>>> Disallow: /
>>>
>>
>> I found that using robots.txt to partially block bots is quite
>> effective at reducing traffic while still permitting your site to be
>> indexed effectively.
>> The trick is to find which files on the site are causing most traffic
>> and block just those. I also blocked all the charts. My idea is
>> that I want there to be at least one entry point to my site for each
>> name and/or place in the database.
>>
>> Here's the robots.txt (my phpGedView installation is in the directory
>> "gedview"):
>>
>> User-agent: *
>> Disallow: /gedview/media/
>> Disallow: /gedview/timeline.php
>> Disallow: /gedview/fanchart.php
>> Disallow: /gedview/pedigree.php
>> Disallow: /gedview/clippings.php
>> Disallow: /gedview/family.php
>> Disallow: /gedview/ancestry.php
>> Disallow: /gedview/descendancy.php
>> Disallow: /gedview/reportengine.php
>> Disallow: /gedview/hourglass.php
>> Disallow: /gedview/calendar.php
>> Disallow: /gedview/patriarchlist.php
>>
>> Further tuning may be helped by this command, which shows where the
>> crawlers are making the most traffic. "access_log" is the name of the
>> Apache log file; run this command in the web log directory. If you
>> don't have an ssh login to your web host, copy the web log to your
>> local Linux machine. If you don't have a Linux machine, install Cygwin!
>>
>> grep Googlebot access_log \
>> |grep gedview \
>> |awk -F\" '{ print $2 }' \
>> |awk '-F[/?]' '{ print $3 }' \
>> |sort | uniq -c | sort -rn
>>
>> The "grep gedview" will need to be changed for your site to filter
>> only the pages for phpGedView (if you have other parts to your
>> website).
>>
>> The $3 on the fourth line of the command might need to be changed if
>> your phpGedView install isn't in a first-level subdirectory of your web
>> server's root. Mine (/gedview) is a single subdirectory. If yours is
>> in a second-level directory (e.g. /stuff/gedview), you need to change
>> the $3 to $4.
>>
>> The output looks something like this:
>>    1188 indilist.php
>>    1109 placelist.php
>>     527 individual.php
>>     337 aliveinyear.php
>>      73 repo.php
>>      16 repolist.php
>>       5 famlist.php
>>       4 HTTP
>>       1 relationship.php HTTP
>>
>>
>>> Google is now making mirror images of all websites; I guess they are
>>> going for a stock split.
>>
>>
>> How can you tell?
>>
>>
>>> -----Original Message-----
>>> From: php...@li...
>>> [mailto:php...@li...]On Behalf Of Ken
>>> Lowther
>>> Sent: Saturday, August 27, 2005 1:52 PM
>>> To: php...@li...
>>> Subject: [Phpgedview-talk] Google
>>>
>>>
>>> Googlebot has been pounding my site.
>>>
>>> http://genealogy.lowther.org/cgi-bin/awstats.pl
>>>
>>> 330 individuals on file:
>>>
>>> http://genealogy.lowther.org/
>>>
>>> Has anyone else had this problem? The only thing I can think of is that
>>> it is chasing links around in circles. Googlebot was connected 24/7 using
>>> a MINIMUM of 20% processor on a dual AMD 64 2.3 GHz machine. I was
>>> experiencing times when the machine responded to very little, so that is
>>> what started me digging.
>>>
>>> Ken
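The grep/awk tally and the partial-block robots.txt discussed in this thread can also be checked together in a few lines of Python. This is a minimal sketch, not phpGedView code: it assumes Apache's common/combined log format with the request line in double quotes, and the function names (googlebot_hits, blocked_by_robots) and the sample log lines are hypothetical, made up for illustration. It uses the standard library's urllib.robotparser, whose plain prefix matching on Disallow lines may differ in edge cases from how Googlebot itself reads robots.txt.

```python
import re
from collections import Counter
from urllib.robotparser import RobotFileParser

# Matthew Gates's partial-block rules from the thread (install dir "gedview").
ROBOTS_TXT = """\
User-agent: *
Disallow: /gedview/media/
Disallow: /gedview/timeline.php
Disallow: /gedview/fanchart.php
Disallow: /gedview/pedigree.php
Disallow: /gedview/clippings.php
Disallow: /gedview/family.php
Disallow: /gedview/ancestry.php
Disallow: /gedview/descendancy.php
Disallow: /gedview/reportengine.php
Disallow: /gedview/hourglass.php
Disallow: /gedview/calendar.php
Disallow: /gedview/patriarchlist.php
"""

# Assumed log format: the quoted request line of an Apache common/combined
# entry, e.g. "GET /gedview/calendar.php?year=1850 HTTP/1.1"; group 1 = URL.
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+) HTTP/[\d.]+"')

# Hypothetical sample lines (made up; 192.0.2.x and 198.51.100.x are
# documentation-only IP ranges).
BOT_UA = '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
SAMPLE_LOG = [
    '192.0.2.10 - - [27/Aug/2005:13:52:00 -0400] "GET /gedview/calendar.php?year=1850 HTTP/1.1" 200 5120 "-" ' + BOT_UA,
    '192.0.2.10 - - [27/Aug/2005:13:52:05 -0400] "GET /gedview/calendar.php?year=1851 HTTP/1.1" 200 5120 "-" ' + BOT_UA,
    '192.0.2.10 - - [27/Aug/2005:13:52:09 -0400] "GET /gedview/indilist.php HTTP/1.1" 200 8192 "-" ' + BOT_UA,
    '198.51.100.7 - - [27/Aug/2005:13:53:00 -0400] "GET /gedview/indilist.php HTTP/1.1" 200 8192 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
]


def googlebot_hits(log_lines, prefix="/gedview/"):
    """Hypothetical helper: count Googlebot requests per path under `prefix`.

    Mirrors `grep Googlebot | grep gedview | awk ... | uniq -c`, but drops the
    query string so every calendar.php?... variant lands in one counter.
    """
    counts = Counter()
    for line in log_lines:
        if "Googlebot" not in line:
            continue
        match = REQUEST_RE.search(line)
        if not match:
            continue
        path = match.group(1).split("?", 1)[0]
        if path.startswith(prefix):
            counts[path] += 1
    return counts


def blocked_by_robots(paths, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Hypothetical helper: map each path to True if robots.txt disallows it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {p: not parser.can_fetch(agent, p) for p in paths}


if __name__ == "__main__":
    hits = googlebot_hits(SAMPLE_LOG)
    blocked = blocked_by_robots(hits)
    for path, count in hits.most_common():
        status = "blocked" if blocked[path] else "allowed"
        print(f"{count:6d} {path} ({status})")
```

Matching on a path prefix such as "/gedview/" instead of an awk field number avoids the $3/$4 adjustment, and stripping everything after "?" folds the endless calendar.php?year=... variants into a single line, so a heavy calendar crawl shows up as one large count next to its blocked/allowed status.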