Methanol is a scriptable multi-purpose web crawling system with an extensible configuration system and speed-optimized architectural design. Methabot is the webcrawler of Methanol.
The Java Sitemap Parser can parse a website's Sitemap (http://www.sitemaps.org/). This is useful for web crawlers that want to discover URLs from a website that is using the Sitemap Protocol.
This project has been incorporated into crawler-commons (https://github.com/crawler-commons/crawler-commons) and is no longer being maintained.