From: Rui C. <rui...@ac...> - 2004-12-13 21:03:18
Well, actually it is. Otherwise, the Google bot will go in and fill the database with empty pages wherever I leave an unlinked WikiWord, and, in this particular case, _will repeat the search_. If it's a FullTextSearch, that is a significant performance hit for me, since I'm not (yet) caching the results.

Note that I only exclude the bots from _action_ pages and older content versions, not from the rest of the site, and I haven't noticed any significant drop in AdSense revenue (it isn't that much to begin with). Search hits are something like 0.005% of my total traffic. Plus, referrer checking will do nothing for this: as far as I can see, the bot does not send a referrer at all in the search case. I have a substantial amount of code devoted to referrer checking and spam filtering, so I would have noticed.

Moving on, I have finally debugged the edit template for the Kubrick theme. I still have a bug somewhere in the diff routines, but I have cleaned up the templates a bit and, as promised, will send you a zip with them. They will, however, need further cleanup, since my navigation bar differs from the standard one, etc. I think you'll find them useful, and I look forward to seeing them in new versions of PhpWiki.

I will also try to clean up some of my custom plugins and send them later; I've just returned from vacation, and work is already impinging on me.

Regards,

R.
http://the.taoofmac.com

On Dec 8, 2004, at 2:17 PM, Reini Urban wrote:

> Rui Carmo wrote:
>> Besides yesterday's fix, I spent quite some time figuring out why I
>> still had another "hit" on the search pages. At first I thought it
>> was something inside the search code, but when I started logging IP
>> addresses it became obvious.
>>
>> It turned out that AdSense will trigger an _immediate_ hit from the
>> google bot if I display ads on my *Search pages, which prompted me to
>> include this little snippet right at the start of index.php (to waste
>> as few resources as possible):
>>
>> define( "DUMB_BOTS",
>>   '/(JPluck|Mediapartners|ia_archiver|googlebot|msnbot|Crawl)/i' );
>> if( preg_match( DUMB_BOTS, $_SERVER['HTTP_USER_AGENT'] ) ) {
>>   if( preg_match( '/\?(s|action|version)=/', $_SERVER['REQUEST_URI'] ) ) {
>>     header( "HTTP/1.1 404 File Not Found" );
>>     echo( "<H1>404 File Not Found</H1>" );
>>     exit;
>>   }
>> }
>
> That's not a good idea!
> Google's AdSense checks where the ads really appear on the page, and
> calculates the rank (= money!) from this info.
> If you reject the checker you will get no profit from AdSense at all.
>
> Rejecting the main googlebot is also not a good idea.
> The googlebot is a good thing.
> You just have to prepare for being "slashdotted" to death once in a
> while, e.g. with some referrer checking or referrer throttling.
> --
> Reini Urban
> http://xarch.tu-graz.ac.at/home/rurban/
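For testing outside a web server, the quoted filter can be factored into a pure function. This is only a sketch, not code from this thread: the helper name `is_dumb_bot_action_request()` is hypothetical, it assumes PHP 7.1 or later (for `?string` and `??`), and it keeps both of the original regular expressions unchanged. The `??` defaults also avoid an undefined-index notice when a client sends no User-Agent header.

```php
<?php
// Same pattern as the DUMB_BOTS constant in the quoted snippet.
const DUMB_BOTS = '/(JPluck|Mediapartners|ia_archiver|googlebot|msnbot|Crawl)/i';

// Hypothetical helper (assumption, not PhpWiki API): returns true when a
// known crawler requests a search, action, or old-version URL, i.e. the
// cases the quoted snippet answers with a 404.
function is_dumb_bot_action_request(?string $userAgent, string $requestUri): bool
{
    // No User-Agent header, or not one of the listed bots: let it through.
    if ($userAgent === null || !preg_match(DUMB_BOTS, $userAgent)) {
        return false;
    }
    // Same URI test as the original: "?s=", "?action=" or "?version=".
    return (bool) preg_match('/\?(s|action|version)=/', $requestUri);
}

// Usage at the very top of index.php; skipped when run from the CLI.
if (PHP_SAPI !== 'cli' && is_dumb_bot_action_request(
        $_SERVER['HTTP_USER_AGENT'] ?? null,
        $_SERVER['REQUEST_URI'] ?? '')) {
    header('HTTP/1.1 404 Not Found'); // standard reason phrase
    echo '<h1>404 Not Found</h1>';
    exit;
}
```

Like the original, the URI pattern only matches when `s`, `action` or `version` is the first query parameter, so a request such as `/index.php?pagename=Foo&action=edit` is not caught. Changing `\?` to `[?&]` would cover that case, at the cost of diverging from the original snippet's behaviour.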