From: Charles C. <ch...@ru...> - 2005-01-27 09:32:02
|
I just looked at the access log for my site and noticed
that google was indexing it. And it was trying every
possible link from every page!

Would it make sense to add a "rel='nofollow'" to links
such as "edit text" where it makes no sense for a robot
to follow?

regards,
Charles
|
From: Charles C. <ch...@ru...> - 2005-01-31 09:43:21
|
On Thu, January 27, 2005 17:31, Charles Corrigan said:
> I just looked at the access log for my site and noticed
> that google was indexing it. And it was trying every
> possible link from every page!
>
> Would it make sense to add a "rel='nofollow'" to links
> such as "edit text" where it makes no sense for a robot
> to follow?

When I wrote that, it was already 7 days after Reini had started putting
the nofollow attribute onto some of the links! My only excuse (hah!) is
that CVS and then the lists had problems last week.

Anyway, I took a look and found that the changes did not work for me - I
did not spend any time trying to work out why. I just put a couple of
changes in that worked in my testing.

/lib/Theme.php

- line 1184 - 4 lines in Button->Button()
    // Google honors this
    if (in_array(strtolower($text),
                 array('edit','create','diff','pdf'))
        and !$request->_user->isAuthenticated())
        $this->setAttr('rel', 'nofollow');
  replaced with
    // Google honors this
    $this->setAttr('rel', 'nofollow');

- line 1210 - 4 lines in ImageButton->ImageButton()
    // Google honors this
    if (in_array(strtolower($text),
                 array('edit','create','diff','pdf'))
        and !$GLOBALS['request']->_user->isAuthenticated())
        $this->setAttr('rel', 'nofollow');
  replaced with
    // Google honors this
    $this->setAttr('rel', 'nofollow');

OK, some justification is required. In my opinion
1 - it does not matter whether the user is authenticated or not, only that
    if the user is google or another search engine, we do not want them to
    follow these links
2 - while there are some of the action buttons that it would be good for
    google to follow (RecentChanges, BackLinks), most of them I would prefer
    it did not.

The downside of my changes is that there will be an unconditional
additional 15 characters per action link in the HTML sent to the client.

regards,
Charles
|
From: Reini U. <ru...@x-...> - 2005-01-31 12:11:58
|
Charles Corrigan wrote:
> On Thu, January 27, 2005 17:31, Charles Corrigan said:
>
>>I just looked at the access log for my site and noticed
>>that google was indexing it. And it was trying every
>>possible link from every page!
>>
>>Would it make sense to add a "rel='nofollow'" to links
>>such as "edit text" where it makes no sense for a robot
>>to follow?
>
> When I wrote that, it was already 7 days after Reini had started putting
> the nofollow attribute onto some of the links! My only excuse (hah!) is
> that CVS and then the lists had problems last week.

Does it work now?

> Anyway, I took a look and found that the changes did not work for me - I
> did not spend any time trying to work out why. I just put a couple of
> changes in that worked in my testing.
>
> /lib/Theme.php
>
> - line 1184 - 4 lines in Button->Button()
>     // Google honors this
>     if (in_array(strtolower($text),
>                  array('edit','create','diff','pdf'))
>         and !$request->_user->isAuthenticated())
>         $this->setAttr('rel', 'nofollow');
>   replaced with
>     // Google honors this
>     $this->setAttr('rel', 'nofollow');
>
> - line 1210 - 4 lines in ImageButton->ImageButton()
>     // Google honors this
>     if (in_array(strtolower($text),
>                  array('edit','create','diff','pdf'))
>         and !$GLOBALS['request']->_user->isAuthenticated())
>         $this->setAttr('rel', 'nofollow');
>   replaced with
>     // Google honors this
>     $this->setAttr('rel', 'nofollow');
>
> OK, some justification is required. In my opinion
> 1 - it does not matter whether the user is authenticated or not, only that
> if the user is google or another search engine, we do not want them to
> follow these links

Of course we could check for a search engine user_agent, but a simple
$user->_level check is much cheaper.
rel=nofollow could also be used in some special accessibility browser
(speech mode, ...).

> 2 - while there are some of the action buttons that it would be good for
> google to follow (RecentChanges, BackLinks), most of them I would prefer
> it did not.

So we should explicitly allow RecentChanges and BackLinks, and deny all
others? This will cost about the same.

> The downside of my changes is that there will be an unconditional
> additional 15 characters per action link in the HTML sent to the client.

--
Reini Urban
http://xarch.tu-graz.ac.at/home/rurban/
|
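The two rules discussed above can be compared side by side. The following is a minimal sketch in Python rather than PhpWiki's PHP, and it is not the code in lib/Theme.php: the function names and the two sets are hypothetical, and treating RecentChanges and BackLinks as button labels is an assumption made only for illustration.

```python
# Hypothetical illustration only; none of these names exist in PhpWiki.
DENY_LIST = frozenset({"edit", "create", "diff", "pdf"})
FOLLOW_ALLOWED = frozenset({"recentchanges", "backlinks"})


def rel_deny_list(text, authenticated):
    """Mirror of the quoted rule: nofollow only for listed labels,
    and only when the visitor is not authenticated."""
    if text.lower() in DENY_LIST and not authenticated:
        return "nofollow"
    return None


def rel_allow_list(text):
    """Alternative raised in the reply: every action link is nofollow
    unless its label is explicitly allowed."""
    if text.lower() in FOLLOW_ALLOWED:
        return None
    return "nofollow"
```

Both variants do one set lookup per link, which fits the remark that the allow-list would cost about the same.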
From: Charles C. <ch...@ru...> - 2005-02-14 08:37:42
|
On Mon, January 31, 2005 17:43, Charles Corrigan said:
> On Thu, January 27, 2005 17:31, Charles Corrigan said:
>> I just looked at the access log for my site and noticed
>> that google was indexing it. And it was trying every
>> possible link from every page!
>>
>> Would it make sense to add a "rel='nofollow'" to links
>> such as "edit text" where it makes no sense for a robot
>> to follow?
>
> When I wrote that, it was already 7 days after Reini had started putting
> the nofollow attribute onto some of the links! My only excuse (hah!) is
> that CVS and then the lists had problems last week.

Google just re-indexed my site and, again, followed all links, including
action=edit etc. I looked into Google's spec and realised that the
rel="nofollow" only means that the link does not contribute to pagerank.
It does not mean that the link will not be followed.

It looks like the only way to handle this is via the robots.txt. Google
supports an extension to the specification that allows wildcards to be
specified in the Disallow field (see
http://www.google.com/intl/en/webmasters/3.html ).

The new contents of my robots.txt file follow,

regards,
Charles

# robots.txt - Charles Corrigan - 14/2/2005
# This robots.txt file uses a non-standard format that is followed by
# Google. This format allows the use of wildcards in the page names
# and is used here to advise robots not to follow links in PhpWiki that
# are not relevant.

User-agent: *
Disallow: /*action=chmod
Disallow: /*action=chown
Disallow: /*action=create
Disallow: /*action=DebugInfo
Disallow: /*action=diff
Disallow: /*action=edit
Disallow: /*action=EditMetaData
Disallow: /*action=EditMetaInfo
Disallow: /*action=loadfile
Disallow: /*action=lock
Disallow: /*action=PageDump
Disallow: /*action=PageHistory
Disallow: /*action=PageInfo
Disallow: /*action=PhpWikiAdministration%2FChmod
Disallow: /*action=PhpWikiAdministration%2FChown
Disallow: /*action=PhpWikiAdministration%2FRemove
Disallow: /*action=PhpWikiAdministration%2FRename
Disallow: /*action=PhpWikiAdministration%2FReplace
Disallow: /*action=PhpWikiAdministration%2FSetAcl
Disallow: /*action=remove
Disallow: /*action=rename
Disallow: /*action=replace
Disallow: /*action=setacl
Disallow: /*action=TranslateText
Disallow: /*action=unlock
Disallow: /*action=upgrade
Disallow: /*action=viewsource
Disallow: /*action=zip
Disallow: /*action=ziphtml
|
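To see what the wildcard rules above would and would not cover, here is a rough sketch in Python of how a crawler might evaluate them. It assumes that `*` matches any run of characters and that a rule matches from the start of the path plus query string; it is an approximation for checking a rule list by hand, not Google's actual matcher. The function names and the sample URLs are made up.

```python
import re


def disallow_to_regex(pattern):
    """Turn a Disallow pattern with '*' wildcards into a compiled regex.
    Assumption: '*' means 'any sequence of characters'."""
    escaped_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(escaped_parts))


def is_blocked(path_and_query, patterns):
    """True if any pattern matches from the start of the path (prefix match)."""
    return any(disallow_to_regex(p).match(path_and_query) for p in patterns)


# A few of the rules from the robots.txt above, without the "Disallow: " prefix.
rules = ["/*action=edit", "/*action=diff", "/*action=zip"]

print(is_blocked("/index.php/HomePage?action=edit", rules))       # True
print(is_blocked("/index.php/HomePage?action=BackLinks", rules))  # False
print(is_blocked("/index.php/HomePage", rules))                   # False
```

Because the match is by prefix in this sketch, `/*action=zip` also covers `action=ziphtml`, while `/*action=edit` does not cover `action=EditMetaData` since the comparison is case-sensitive.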
From: Charles C. <ch...@ru...> - 2005-02-14 09:22:20
|
On Mon, February 14, 2005 17:13, someone asked:
> Not
> Disallow: *action=*

In my opinion, no:
1 - no trailing * is required ;-)
2 - some actions, such as action=LikePages, action=RelatedChanges and, in
    particular, action=BackLinks make sense for a robot to follow

Please note that the specification for robots.txt states that robots will
retrieve it from the root directory, i.e. /robots.txt - so what I sent in
is a subset of the full robots.txt file for my site.

regards,
Charles
|
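The point about the root directory can be shown with a short Python sketch: whatever page a robot is looking at, the robots.txt it consults sits at the root of the same host. The helper name and the example.org URL are hypothetical.

```python
from urllib.parse import urljoin


def robots_txt_url(page_url):
    """Return the robots.txt location for the host serving page_url.
    Joining with an absolute path replaces the page's path and query."""
    return urljoin(page_url, "/robots.txt")


print(robots_txt_url("http://example.org/wiki/index.php/HomePage?action=edit"))
```

This is why a wiki installed in a subdirectory still has to put its rules into the single site-wide file.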
From: Reini U. <ru...@x-...> - 2005-02-14 11:37:06
|
Charles Corrigan wrote:
> On Mon, January 31, 2005 17:43, Charles Corrigan said:
>>On Thu, January 27, 2005 17:31, Charles Corrigan said:
>>>I just looked at the access log for my site and noticed
>>>that google was indexing it. And it was trying every
>>>possible link from every page!
>>>
>>>Would it make sense to add a "rel='nofollow'" to links
>>>such as "edit text" where it makes no sense for a robot
>>>to follow?
>>
>>When I wrote that, it was already 7 days after Reini had started putting
>>the nofollow attribute onto some of the links! My only excuse (hah!) is
>>that CVS and then the lists had problems last week.
>
> Google just re-indexed my site and, again, followed all links, including
> action=edit etc. I looked into Google's spec and realised that the
> rel="nofollow" only means that the link does not contribute to pagerank.
> It does not mean that the link will not be followed.

Thanks for the clarification!

> It looks like the only way to handle this is via the robots.txt. Google
> support an extension to the specification that allows wildcards to be
> specified in the Disallow field (see
> http://www.google.com/intl/en/webmasters/3.html ).

We also use the robots meta tag. With it, those initial action links are
still followed, but indexing of the action page and following of any
further links from it are forbidden.
This should be easier to set up (cost: none) than using a hardcoded
robots.txt file, but it allows one more link to be followed.

normal + RecentChanges:
  <meta name="robots" content="index,follow" />
most action pages:
  <meta name="robots" content="noindex,nofollow" />

PS: We might want to move action=BackLinks to the first rule, as another
exception along with RecentChanges.

--
Reini Urban
http://xarch.tu-graz.ac.at/home/rurban/
|
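The meta tag rule described above amounts to a small lookup. A minimal Python sketch follows, assuming the current action name is available and that `None` stands for a plain page view; the function name and the `indexable` parameter are hypothetical and do not reflect how PhpWiki implements it.

```python
def robots_meta(action=None, indexable=("RecentChanges",)):
    """Pick the robots meta tag for a page.
    Plain page views and explicitly listed actions may be indexed and
    followed; every other action page is noindex,nofollow."""
    allowed = {name.lower() for name in indexable}
    if action is None or action.lower() in allowed:
        content = "index,follow"
    else:
        content = "noindex,nofollow"
    return f'<meta name="robots" content="{content}" />'


print(robots_meta())          # plain page view
print(robots_meta("edit"))    # ordinary action page
# The PS above would correspond to widening the exception list:
print(robots_meta("BackLinks", indexable=("RecentChanges", "BackLinks")))
```

Unlike the robots.txt approach, the robot still fetches the action page once before it sees the tag, which is the "one more link" mentioned above.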