The crawler cockpit is a web interface that covers the main part of the
web archiving process: creating and
launching campaigns, and viewing statistics about the crawls. The archivist
sets up the archiving campaign via the crawler cockpit GUI. A campaign
is described by an intelligent crawl definition, which associates a content
target with crawl parameters (schedule and technical parameters). The content
definition is made of:
The crawl parameters describe the campaign schedule and define some technical
parameters such as politeness or robots.txt compliance. At the end of the
crawls, users get access to an overview of the data collected through different
widgets. The report part displays the available crawl metrics.
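
As a rough illustration of the crawl definition described above, the sketch
below models a campaign as a content target paired with crawl parameters. It
is not taken from the cockpit's code base: the class names, field names, and
the `next_run` helper are all hypothetical, and the content target is reduced
to a plain list of URLs as a placeholder.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class CrawlParameters:
    """Schedule and technical parameters (all field names are hypothetical)."""
    start: datetime
    repeat_every: Optional[timedelta] = None  # None means a one-off crawl
    politeness_delay_s: float = 1.0           # pause between requests to one host
    obey_robots_txt: bool = True              # robots.txt compliance


@dataclass
class CrawlDefinition:
    """Associates a content target with crawl parameters."""
    name: str
    content_target: List[str]  # placeholder for the real content definition
    parameters: CrawlParameters

    def next_run(self, after: datetime) -> Optional[datetime]:
        """Return the first scheduled run strictly after `after`, or None."""
        p = self.parameters
        if after < p.start:
            return p.start
        if p.repeat_every is None:
            return None
        # Number of whole periods elapsed since the start, plus one.
        periods = (after - p.start) // p.repeat_every + 1
        return p.start + periods * p.repeat_every


weekly = CrawlDefinition(
    name="demo-campaign",
    content_target=["http://example.org/"],
    parameters=CrawlParameters(
        start=datetime(2013, 1, 1),
        repeat_every=timedelta(days=7),
    ),
)
print(weekly.next_run(datetime(2013, 1, 10)))
```

Keeping the schedule and the technical settings in one object, separate from
the content target, mirrors the split made in the text: the same parameters
could in principle be reused for several targets.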
The crawler cockpit is implemented in Python and uses the Pylons framework,
jQuery, and a PostgreSQL database. It is released under the GPLv3.
As root (assuming you will install it as user arcomem):
    mkdir -p /usr/opt/
    tar xzf ~/cockpit.tgz
    chown -R arcomem:arcomem /usr/opt/cockpit/
Then go to the install directory and follow the instructions in the README.
Either configure sudo beforehand so that the user can run commands as root,
or run the commands prefixed with sudo as root and the other ones as the
normal user.