A Java Web crawler: multi-threaded, scalable, with high performance, extensible and polite. It can be used to crawl and index any web or enterprise domain and is configurable through a XML configuration file.
Toke is a webmining toolkit for web exploring, indexing and searching for Java. Toke allows to you crawl public or private web sites, in order to create web estatistics, web Pajek graphs, Lucene indexs and word frequency files for data clustering.
Sperowider Website Archiving Suite is a set of Java applications, the primary purpose of which is to spider dynamic websites, and to create static distributable archives with a full text search index usable by an associated Java applet.