From: Shyket, H. <har...@ya...> - 2011-11-15 15:10:34
Maybe it would be possible to block the search engines from saving session data. We could possibly check the hostname and if it matches GoogleBot, Yahoo or Bing, stop it from saving to session.

Harry Shyket
Digital Media Specialist
Yale University Peabody Museum
ph. 203-436-9428
har...@ya...

From: William Piel [mailto:wil...@ya...]
Sent: Tuesday, November 15, 2011 9:54 AM
To: TreeBASE devel
Subject: Re: [Treebase-devel] Fwd: Treebase Dev problems

On Nov 15, 2011, at 9:35 AM, Rutger Vos wrote:

We want search engines to crawl as much of the production server as we can manage (but remember when the google bot brought it to its knees) but none of dev and stage - that would only lead to inconsistent search results. Ideally we would let the bots crawl a site map with the purls so that it is those that are in the search results (though I wonder whether a bot would use the purl as the address or whatever that purl forwards to).

yes, currently we publish a site map for Google -- which is why this query results in over 60,000 hits:

http://www.google.com/search?client=safari&rls=en&q=site:treebase.org

Google finds 4,000 hits using this one:

http://www.google.com/search?client=safari&rls=en&q=site:treebase-dev.nescent.org

which is probably done by indexing without the guidance of our site map. So, bad Google. Definitely, let's make dev and stage off-limits to search robots.

bp
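
To make the two ideas in this thread concrete (skipping session storage for crawlers identified by hostname, and keeping robots off dev and stage), here is a rough Python sketch. It is only an illustration, not TreeBASE code: the function names, the list of crawler hostname suffixes, and the `www.treebase.org` entry are assumptions, and the hostname is assumed to have already been obtained by a reverse DNS lookup of the client address.

```python
# Hypothetical sketch only; none of these names come from the TreeBASE code base.

# Assumed reverse-DNS suffixes for the crawlers mentioned in the thread
# (Googlebot, Yahoo, Bing). Verify against each engine's documentation.
CRAWLER_HOST_SUFFIXES = (
    ".googlebot.com",
    ".google.com",
    ".crawl.yahoo.net",
    ".search.msn.com",
)

# Assumed set of production host names; everything else is treated as dev/stage.
PRODUCTION_HOSTS = {"treebase.org", "www.treebase.org"}


def is_crawler_host(hostname):
    """Return True if a reverse-DNS hostname ends with a known crawler suffix."""
    host = hostname.strip().lower().rstrip(".")
    return host.endswith(CRAWLER_HOST_SUFFIXES)


def should_save_session(hostname):
    """Skip session storage for requests that appear to come from a crawler."""
    return not is_crawler_host(hostname)


def robots_txt_for(server_host):
    """Allow crawling on production; disallow everything on dev and stage."""
    if server_host.strip().lower() in PRODUCTION_HOSTS:
        return "User-agent: *\nDisallow:\n"   # empty Disallow permits all paths
    return "User-agent: *\nDisallow: /\n"     # blocks all paths for all robots
```

For example, `should_save_session("crawl-66-249-66-1.googlebot.com")` would be false under these assumptions, and `robots_txt_for("treebase-dev.nescent.org")` would return the blocking file. A hostname check based on suffixes alone can be spoofed unless the reverse lookup is confirmed with a forward lookup, so this would reduce crawler session load rather than act as a security control.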