From: Victor Stone <fourstones.net@gm...> - 2006-09-25 05:05:27
I recently checked in a file called cc-gen-robots-txt.php to the bin
sub-directory. This is our repository for helper tools, mainly for
people who build releases, but it also holds some tools that may be
useful to maintainers of ccHost sites.
I noticed this was a problem on ccMixter because there was a *huge*
number of hits on worthless "pages." A few days after creating this
file for ccMixter, I checked our web stats in Google's very handy
webmaster tools ( http://www.google.com/webmasters/ ) and discovered
that I had saved over 17,000 (!) crawls down blind alleys. That's just
The script I checked in is a command-line PHP utility that will
generate a robots.txt file ( http://www.google.com/search?q=robots.txt
) to discourage bots from crawling down ccHost command paths that have
nothing worth indexing. It does this for every virtual you
have and creates a list of useless URLs (
Needless to say, I recommend running this utility on every installation.
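
For anyone curious what this kind of generator boils down to, here is a
minimal sketch in Python. It is not the code in cc-gen-robots-txt.php,
just an illustration of the idea: one Disallow line per command path
under each URL prefix. The function name, the prefix "media" and the
paths "admin" and "search" are all made up for the example, and
treating a "virtual" as a URL prefix is my assumption.

```python
def build_robots_txt(url_prefixes, command_paths, user_agent="*"):
    """Return robots.txt text that disallows every command path
    under every URL prefix.

    url_prefixes and command_paths are hypothetical inputs; the real
    ccHost utility works out its own list.
    """
    lines = [f"User-agent: {user_agent}"]
    for prefix in url_prefixes:
        # Normalize so "media", "/media" and "/media/" behave the same;
        # an empty prefix means the site root.
        clean = prefix.strip("/")
        base = f"/{clean}" if clean else ""
        for path in command_paths:
            lines.append(f"Disallow: {base}/{path.strip('/')}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical prefixes and paths, for illustration only.
    print(build_robots_txt(["", "media"], ["admin", "search"]), end="")
```

Well-behaved crawlers treat each Disallow value as a path prefix, so a
single line covers everything beneath that command path.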