Re: [Treebase-devel] Fwd: Treebase Dev problems

Maybe we could block the search engines from saving session data. We could check the hostname and, if it matches Googlebot, Yahoo or Bing, stop the request from saving anything to the session.
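
A rough sketch of that hostname check, in Python for brevity rather than in the language the application itself is written in. The suffix list is an assumption based on the reverse-DNS names these crawlers commonly use, and the function names (is_crawler_host, reverse_lookup, should_save_session) are made up for illustration; none of this is existing TreeBASE code.

```python
import socket

# Assumed reverse-DNS suffixes for the three crawlers named above.
CRAWLER_HOST_SUFFIXES = (
    ".googlebot.com",    # Googlebot
    ".google.com",       # other Google crawlers
    ".search.msn.com",   # Bing
    ".crawl.yahoo.net",  # Yahoo
)


def is_crawler_host(hostname):
    """Return True if the hostname ends with one of the crawler suffixes."""
    host = hostname.strip().lower().rstrip(".")
    return host.endswith(CRAWLER_HOST_SUFFIXES)


def reverse_lookup(ip_address):
    """Resolve a client IP to a hostname; return '' if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip_address)[0]
    except OSError:
        return ""


def should_save_session(hostname):
    """Hypothetical hook: skip session writes for known crawlers."""
    return not is_crawler_host(hostname)
```

A request handler could call should_save_session(reverse_lookup(client_ip)) before writing anything to the session. Reverse lookups are slow, so the result would probably need to be cached per IP.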

Harry Shyket
Digital Media Specialist
Yale University Peabody Museum
ph. 203-436-9428
har...@ya...

From: William Piel [mailto:wil...@ya...]
Sent: Tuesday, November 15, 2011 9:54 AM
To: TreeBASE devel
Subject: Re: [Treebase-devel] Fwd: Treebase Dev problems

On Nov 15, 2011, at 9:35 AM, Rutger Vos wrote:

We want search engines to crawl as much of the production server as we can manage (but remember when Googlebot brought it to its knees), and none of dev and stage, since that would only lead to inconsistent search results. Ideally we would let the bots crawl a site map with the PURLs so that those are what appear in the search results (though I wonder whether a bot would use the PURL as the address or whatever that PURL forwards to).

Yes, we currently publish a site map for Google -- which is why this query returns over 60,000 hits:

http://www.google.com/search?client=safari&rls=en&q=site:treebase.org

Google finds 4,000 hits using this one:

http://www.google.com/search?client=safari&rls=en&q=site:treebase-dev.nescent.org

which is probably done by indexing without the guidance of our site map.

So, bad Google.

Definitely, let's make dev and stage off-limits to search robots.
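
One way to do that is to serve a different robots.txt depending on the host. The sketch below is only an illustration, in Python for brevity. The production host set is an assumption taken from the site:treebase.org query above, every other host (dev, stage) gets a blanket Disallow, and the function names are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumption: these are the only hosts that robots should be allowed to crawl.
PRODUCTION_HOSTS = {"treebase.org", "www.treebase.org"}


def robots_txt_for(hostname):
    """Return robots.txt content: open on production, closed elsewhere."""
    if hostname.lower() in PRODUCTION_HOSTS:
        # An empty Disallow value permits everything.
        return "User-agent: *\nDisallow:\n"
    # Dev, stage and any unknown host: keep all robots out.
    return "User-agent: *\nDisallow: /\n"


def crawler_may_fetch(hostname, path, agent="Googlebot"):
    """Check the generated rules the way a well-behaved robot would."""
    parser = RobotFileParser()
    parser.parse(robots_txt_for(hostname).splitlines())
    return parser.can_fetch(agent, "http://" + hostname + path)
```

Note that robots.txt only keeps out robots that choose to honor it, and pages that are already indexed may take a while to drop out of the search results.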

bp