As of this release, there are three ways to generate a
sitemap: specifying URLs, specifying paths, or using logs.
However, I would imagine that many administrators will
use these methods to mimic exactly what their
robots.txt file specifies. Why not make this easier?
Something like this (the "robotstxt" element name and
example URL are just illustrative):

<robotstxt url="http://www.example.com/robots.txt"
           path="/var/www/html" bot="googlebot" />
url: Address of the robots.txt file.
path: Root path of the site.
bot: Interest has been shown in using this software with
other search engines. This attribute would make sitemap
generation follow the rules for a particular bot. Its
value would be a bot name, or "*" to follow all rules
regardless of which bot they are meant for (see the
sketch after this list).
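
A minimal sketch of how the bot matching might work,
assuming a plain Python parser (all names here are mine,
not the tool's):

def disallow_rules(robots_txt, bot):
    """Collect the Disallow path prefixes that apply to `bot`.

    Per the proposal, bot="*" means every rule applies,
    no matter which User-agent block it came from.
    """
    bot = bot.lower()
    rules = []
    agents = []        # user-agents naming the current block
    in_rules = False   # True once the block's rules have started
    for raw in robots_txt.splitlines():
        line = raw.split('#', 1)[0].strip()   # drop comments
        if not line:
            continue
        field, _, value = line.partition(':')
        field, value = field.strip().lower(), value.strip()
        if field == 'user-agent':
            if in_rules:                      # a new block begins
                agents, in_rules = [], False
            agents.append(value.lower())
        elif field == 'disallow':
            in_rules = True
            if value and (bot == '*' or bot in agents or '*' in agents):
                rules.append(value)
    return rules

(Real crawlers obey only the most specific matching
block; blocks are merged here for simplicity.)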
This would essentially mimic a directory element plus a
few filter elements derived from the rules within
robots.txt. Details will have to be ironed out, in
particular aliased directories that a bot would see but
that are not visible under the site root on the file
system (handled with an explicit alias map in the sketch
below).
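
The aliasing could be handled by letting the generator
walk each aliased directory alongside the document root;
a rough sketch (the aliases parameter is my invention):

import os

def walk_site(root, rules, aliases=None):
    """Yield URL paths a bot would be allowed to fetch.

    `aliases` maps URL prefixes to filesystem directories
    (e.g. Apache Alias targets outside the document root),
    so rules written against aliased URLs are still honoured.
    """
    trees = [('', root)] + sorted((aliases or {}).items())
    for url_prefix, fs_root in trees:
        for dirpath, _dirs, files in os.walk(fs_root):
            for name in files:
                fs_path = os.path.join(dirpath, name)
                rel = os.path.relpath(fs_path, fs_root).replace(os.sep, '/')
                url_path = url_prefix + '/' + rel
                if not any(url_path.startswith(r) for r in rules):
                    yield url_path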
Thus, creation of the sitemap would follow the same
rules a bot does.
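
Putting the two sketches together, generation would boil
down to something like this (paths illustrative):

robots = open('/var/www/html/robots.txt').read()
rules = disallow_rules(robots, 'googlebot')
for url in walk_site('/var/www/html', rules,
                     aliases={'/icons': '/usr/share/apache2/icons'}):
    print(url)   # each allowed URL path becomes a sitemap entry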