Thread: [Phpgedview-talk] Google
From: Ken L. <klo...@ci...> - 2005-08-27 17:52:38
Googlebot has been pounding my site.

http://genealogy.lowther.org/cgi-bin/awstats.pl

330 individuals on file:

http://genealogy.lowther.org/

Anyone else had this problem? The only thing I can think of is that it is chasing links around in circles. Googlebot was connected 24/7, using a MINIMUM of 20% of the processor on a dual AMD64 2.3 GHz machine. I was experiencing times when the machine responded to very little, so that is what started me digging.

Ken
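Ken's "connected 24/7" observation can be checked by bucketing the bot's requests by hour. This is a sketch assuming an Apache access log in the common or combined format; the function name bot_hits_per_hour and the default file name access_log are placeholders:

```shell
# Count a bot's requests per hour in an Apache access log.
# Assumes the common/combined log format, where whitespace field 4 looks
# like "[27/Aug/2005:17:52:38"; substr($4, 2, 14) keeps "27/Aug/2005:17".
bot_hits_per_hour() {
    log=${1:-access_log}   # placeholder log file name
    bot=${2:-Googlebot}    # substring to look for in the line (user agent)
    # Log lines are already in time order, so uniq -c groups each hour
    # without a sort and the output stays chronological.
    grep -F -- "$bot" "$log" \
        | awk '{ print substr($4, 2, 14) }' \
        | uniq -c
}

# Example: bot_hits_per_hour access_log Googlebot
```

A crawler that appears in every hourly bucket, day after day, is hitting the site around the clock.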
From: Len <llu...@in...> - 2005-08-27 18:00:20
Yeah, but it was much worse from inktomisearch; virtually all hits were from search engines. I finally killed them by putting robots.txt in the directory running phpGedView. In my case it was "genealogy".

Len

-------------------------------------------------------
SF.Net email is Sponsored by the Better Software Conference & EXPO
September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
_______________________________________________
Phpgedview-talk mailing list
Php...@li...
https://lists.sourceforge.net/lists/listinfo/phpgedview-talk
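Whether Len's fix works depends on where the file lives. Under the robots exclusion standard (RFC 9309), a crawler only requests robots.txt from the root of a host, so a file inside a subdirectory such as /genealogy/ is picked up only if that directory is the document root of its own host (a subdomain, for example). The helper below is a sketch with a hypothetical name; it prints the one robots.txt URL that governs a given page:

```shell
# Print the only robots.txt URL a crawler consults for a given page URL:
# scheme://host[:port]/robots.txt. Path, query string and fragment are
# dropped, because crawlers never look for robots.txt in subdirectories.
robots_url() {
    printf '%s\n' "$1" \
        | sed -E 's|^([A-Za-z][A-Za-z0-9+.-]*://[^/?#]+).*|\1/robots.txt|'
}

# Example (hypothetical host):
#   robots_url 'http://www.example.com/genealogy/individual.php?pid=I1'
```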
From: Joe T. <jo...@te...> - 2005-08-27 18:31:20
I blocked Google totally from my server. I don't worry about showing up on them, because they will pick up our site from all the other web crawlers; they are pounding them too.
From: Joe T. <jo...@te...> - 2005-08-27 18:30:09
Me too. This past week Google used up 2.3 GB of my bandwidth.

Put this in your robots.txt file and they are gone:

User-agent: googlebot
Disallow: /

Google is now making mirror images of all websites; I guess they are going for a stock split.
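After adding those two lines, the access log can confirm whether the crawler actually retrieved the file. This sketch (hypothetical function name, Apache common/combined log format assumed) counts the status codes a bot received for /robots.txt: a 200 means the file was served, a 404 means it is not where the crawler looks. Crawlers may cache robots.txt for a while, so traffic need not drop immediately.

```shell
# Count the HTTP status codes a bot received when fetching /robots.txt.
# In the common/combined log format, splitting on whitespace gives the
# request path in $7 and the status code in $9.
robots_fetch_status() {
    log=${1:-access_log}   # placeholder log file name
    bot=${2:-Googlebot}
    awk -v bot="$bot" 'index($0, bot) && $7 == "/robots.txt" { print $9 }' "$log" \
        | sort | uniq -c
}

# Example: robots_fetch_status access_log Googlebot
```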
From: Matthew G. <ma...@po...> - 2005-08-28 12:45:24
On Saturday 27 August 2005 18:29, Joe Tellup wrote:
> Me too. This past week Google used up 2.3 GB of my bandwidth.
>
> Put this in your robots.txt file and they are gone:
>
> User-agent: googlebot
> Disallow: /

I found that using robots.txt to partially block bots is quite effective at reducing traffic while still permitting your site to be indexed effectively.

The trick is to find which files on the site are causing the most traffic and block just those. I also blocked all the charts. My idea is that I want there to be at least one entry point to my site for each name and/or place in the database.

Here's the robots.txt (my phpGedView installation is in the directory "gedview"):

User-agent: *
Disallow: /gedview/media/
Disallow: /gedview/timeline.php
Disallow: /gedview/fanchart.php
Disallow: /gedview/pedigree.php
Disallow: /gedview/clippings.php
Disallow: /gedview/family.php
Disallow: /gedview/ancestry.php
Disallow: /gedview/descendancy.php
Disallow: /gedview/reportengine.php
Disallow: /gedview/hourglass.php
Disallow: /gedview/calendar.php
Disallow: /gedview/patriarchlist.php

Further tuning may be helped by this command, which shows where the crawlers are making the most traffic. "access_log" is the name of the Apache log file; run this command in the web log directory. If you don't have an ssh login to your web host, copy the web log to your local Linux machine. If you don't have a Linux machine, install Cygwin!

grep Googlebot access_log \
|grep gedview \
|awk -F\" '{ print $2 }' \
|awk '-F[/?]' '{ print $3 }' \
|sort | uniq -c

The "grep gedview" will need to be changed for your site to filter only the pages for phpGedView (if you have other parts to your website).

The $3 on the fourth line of the command might need to be changed if your phpGedView install isn't in a subdirectory of the root of your web server. Mine (/gedview) is a single subdirectory. If yours is in a second-level directory (e.g. /stuff/gedview), you need to change the $3 to $4.

The output looks something like this:

   1188 indilist.php
   1109 placelist.php
    527 individual.php
    337 aliveinyear.php
     73 repo.php
     16 repolist.php
      5 famlist.php
      4 HTTP
      1 relationship.php HTTP

> Google is now making mirror images of all websites; I guess they are
> going for a stock split.

How can you tell?
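A variant of Matthew's pipeline, sketched under the same log-format assumption, avoids hand-tuning the $3/$4 field: it keeps the last path component of each requested URL after stripping the query string, only counts URLs under a given install prefix, and sorts the result so the heaviest pages come first. In the original pipeline, a URL without a query string keeps the trailing " HTTP/1.1" fragment of the request line, which is where entries like "relationship.php HTTP" (and "HTTP" alone, for requests to /gedview/ itself) come from. The function name and the default prefix /gedview/ are illustrative:

```shell
# Rank phpGedView scripts by how often a bot requested them.
# Arguments: log file, bot name, install prefix (e.g. /gedview/).
bot_script_ranking() {
    log=${1:-access_log}
    bot=${2:-Googlebot}
    prefix=${3:-/gedview/}
    awk -F'"' -v bot="$bot" -v prefix="$prefix" '
        index($0, bot) {
            split($2, req, " ")               # request line: METHOD URL PROTOCOL
            url = req[2]
            sub(/[?].*/, "", url)             # drop the query string
            if (index(url, prefix) != 1) next # skip pages outside the install
            n = split(url, parts, "/")
            name = parts[n]
            if (name == "") name = "(index)"  # request for the directory itself
            print name
        }' "$log" \
        | sort | uniq -c | sort -rn
}
```

For example, `bot_script_ranking access_log Googlebot /gedview/ | head` lists the ten most-crawled pages.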
From: Daniel P. K. <da...@ki...> - 2005-08-28 16:49:52
I have been indexed by Google for a long time, but in August my usage jumped to 4X normal. It is good to be indexed: a 3rd cousin just found my site, and now she is helping out on that branch of the tree.

I looked at my usage by URL, and I was surprised to see that 64% of the hits are on phpGedView/calendar.php. I am going to try the Disallow just on that URL.

It makes sense that it would get stuck on the calendar URL. There is no end of dates to query for.

Matthew Gates wrote:
> I found that using robots.txt to partially block bots is quite effective
> at reducing traffic while still permitting your site to be indexed
> effectively.
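To get a figure like Daniel's 64% directly, this sketch computes what fraction of one bot's requests went to a single script. Every distinct query string is a separate URL to a crawler, which is why a page with an unbounded parameter space, such as dates, can dominate the crawl. The function name and defaults are hypothetical, and the Apache common/combined log format is assumed:

```shell
# Percentage of one bot's requests (all pages) that hit a given script.
script_share() {
    log=${1:-access_log}
    bot=${2:-Googlebot}
    script=${3:-calendar.php}
    awk -F'"' -v bot="$bot" -v script="$script" '
        index($0, bot) {
            total++
            split($2, req, " ")              # request line: METHOD URL PROTOCOL
            url = req[2]
            sub(/[?].*/, "", url)            # ignore the query string
            n = split(url, parts, "/")
            if (parts[n] == script) hits++
        }
        END {
            if (total > 0)
                printf "%.0f%% of %d requests\n", 100 * hits / total, total
        }' "$log"
}

# Example: script_share access_log Googlebot calendar.php
```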
From: John <sh...@jb...> - 2005-08-28 15:00:54
On 28 Aug 2005 at 13:45, Matthew Gates wrote:
> On Saturday 27 August 2005 18:29, Joe Tellup wrote:
> > Put this in your robots.txt file and they are gone:
> >
> > User-agent: googlebot
> > Disallow: /

Actually, I went one step further and locked out any access to my web root. I do not allow any files, reads or writes either. Everything is placed into domain subdirectories. The plus side was that it cut down most 'hack' attempts. I really don't care to be indexed by the search engines, so this works for me; I'm not commercial. I'd guess that placing a robots.txt file in my root would give the search bots that play nice proper access, and the stripper bots/hot linkers nothing.

> > Google is now making mirror images of all websites; I guess they are
> > going for a stock split.
>
> How can you tell?

Actually, I think it may be more of an index size war. Notice how Yahoo and Google have really been publishing their index sizes of late. And with Microsoft now joining the fray, I think it's only going to get worse.
From: Ken L. <klo...@ci...> - 2005-08-28 16:59:28
OK. Makes sense. My error log was getting a lot of errors for the calendar.

Ken

Daniel P. Kionka wrote:
> I looked at my usage by URL, and I was surprised to see that 64% of
> the hits are on phpGedView/calendar.php. I am going to try the
> Disallow just on that URL.
>
> It makes sense that it would get stuck on the calendar URL. There is no
> end of dates to query for.
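To see which scripts generate the errors Ken mentions, this sketch counts PHP messages per script in an Apache error log. It assumes PHP logs there (log_errors enabled) using its usual "... in /path/script.php on line N" wording; the function name and the default file name error_log are placeholders:

```shell
# Count PHP errors/warnings per script in an Apache error log,
# most frequent first. Relies on PHP's "in FILE on line N" message suffix.
php_errors_by_script() {
    log=${1:-error_log}    # placeholder log file name
    grep -oE '[^/ ]+\.php on line [0-9]+' "$log" \
        | awk '{ print $1 }' \
        | sort | uniq -c | sort -rn
}

# Example: php_errors_by_script error_log | head
```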
From: Lars E. B. <lar...@da...> - 2005-08-28 19:01:16
KL> OK. Makes sense. My error log was getting a lot of errors for the
KL> calendar.

Could a

<META NAME="ROBOTS" CONTENT="NOFOLLOW">

be implemented just for the calendar.php script?

My web host pulled the plug on me last Sunday because of alleged bandwidth abuse. I would like to be indexed by Google, but not at the expense of my entire site...

Lars Erik Bryld
le...@da...
http://www.bryld.suite.dk
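On Lars's question: a robots META tag is only read after the page has been fetched. NOFOLLOW on calendar.php would stop the crawler following links from one calendar page to the next, but every calendar page linked from elsewhere would still be requested. A robots.txt Disallow prevents the request itself, and because Disallow values are matched as path prefixes, one line such as "Disallow: /gedview/calendar.php" also covers every query-string variant of that script. The checker below is a simplified sketch with a hypothetical name; it only reads the "User-agent: *" group and ignores Allow lines, wildcards and per-bot groups:

```shell
# Exit 0 if PATH is disallowed for all crawlers by the robots.txt file ROBOTS,
# exit 1 otherwise. Disallow values are compared as plain path prefixes.
is_disallowed() {
    robots=$1
    path=$2
    awk -v path="$path" '
        { sub(/\r$/, "") }                                    # tolerate CRLF files
        tolower($1) == "user-agent:" { in_star = ($2 == "*"); next }
        in_star && tolower($1) == "disallow:" && $2 != "" &&
            index(path, $2) == 1 { found = 1 }
        END { exit (found ? 0 : 1) }' "$robots"
}

# Example: is_disallowed robots.txt '/gedview/calendar.php?year=1900' && echo blocked
```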
From: Keith C. <ke...@dr...> - 2005-08-28 03:02:10
Yes, and it happened only on my phpGedView web site. I posted about it on my hosting forums here:

http://forums.hostdime.com/showthread.php?p=26699#post26699

Ken Lowther wrote:
> Googlebot has been pounding my site.