Crawlzilla provides two management modes:
1. Dialog management: low-level node management, including (1) checking cluster status, (2) DataNode & TaskTracker node management, (3) DataNode & TaskTracker management, (4) Tomcat management, and (5) changing the Tomcat port.
2. Web interface management: (1) crawl setup, (2) search engine management, and (3) index pool management.
Crawlzilla usage procedure:
Run the Crawlzilla management command:
$ /home/crawler/crawlzilla/system/crawlzilla
Enable DataNode and TaskTracker on all nodes.
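To confirm that the daemons actually started on a node, one option is the JDK's `jps` tool, which lists running JVMs as `PID ClassName`. The sketch below assumes `jps` is on the node's `PATH` and is run as the user that owns the Hadoop daemons; `check_daemons` is a hypothetical helper, not part of Crawlzilla. It reads a `jps`-style listing on standard input, so it also works on saved output.

```shell
#!/usr/bin/env bash
# Sketch: check a jps-style listing ("PID ClassName" per line) read from stdin
# for the Hadoop DataNode and TaskTracker daemons.
# ASSUMPTION: run on each node as the user that owns the Hadoop daemons.
# check_daemons is a hypothetical helper, not part of Crawlzilla.
check_daemons() {
  local listing daemon missing=0
  listing=$(cat)
  for daemon in DataNode TaskTracker; do
    # awk exits 0 only if some line's second field equals the daemon name.
    if printf '%s\n' "$listing" | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "$daemon: running"
    else
      echo "$daemon: NOT running"
      missing=1
    fi
  done
  return "$missing"
}
```

On a node, `jps | check_daemons` prints one status line per daemon and returns a non-zero exit status if either one is missing.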
The first time you log in to the web interface, you must change the administrator password.
Go to the "crawl page" and enter three parameters:
1. Index Pool name: identifies this search engine and its index pool.
2. Crawl URLs: the URLs you want to crawl (e.g. https://sourceforge.net/p/crawlzilla/wiki).
3. Crawl depth: how many levels of links to follow from these URLs.
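To illustrate what the depth setting controls, here is a minimal depth-limited breadth-first crawl over a small, hypothetical link graph. It assumes depth counts link hops from the seed URLs (seeds are hop 0); Crawlzilla may count depth differently (for example, as crawl rounds), so treat the exact off-by-one as an assumption. `crawl_order` and the `LINKS` table are illustrative names only.

```shell
#!/usr/bin/env bash
# Illustrative depth-limited breadth-first crawl (requires bash 4+ for associative arrays).
# ASSUMPTION: depth = link hops from the seeds (seeds are hop 0).
# LINKS is a hypothetical link graph: page -> space-separated outgoing links.
declare -A LINKS=(
  [https://example.com/]="https://example.com/a https://example.com/b"
  [https://example.com/a]="https://example.com/c"
  [https://example.com/c]="https://example.com/d"
)

# crawl_order DEPTH SEED...
# Prints every URL that would be fetched, level by level, each URL once.
crawl_order() {
  local max_depth=$1
  shift
  local -A seen=()
  local -a frontier=("$@") next=()
  local url link depth=0
  while (( depth <= max_depth && ${#frontier[@]} > 0 )); do
    next=()
    for url in "${frontier[@]}"; do
      [[ -n ${seen[$url]:-} ]] && continue   # skip pages already fetched
      seen[$url]=1
      echo "$url"
      # Queue this page's outgoing links for the next level.
      for link in ${LINKS[$url]:-}; do
        next+=("$link")
      done
    done
    frontier=("${next[@]}")
    depth=$((depth + 1))
  done
}
```

Under these assumptions, `crawl_order 1 https://example.com/` prints the seed plus the two pages it links to, and raising the depth to 2 adds `https://example.com/c`. Because each extra level can multiply the number of pages fetched, a small depth is a sensible starting point.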