Cockpit

John Arcoman

Crawler cockpit

The crawler cockpit is a web interface that covers the main part of the web
archiving process: creating and launching campaigns, and viewing statistics
about the crawls. The archivist sets up an archiving campaign through the
cockpit GUI. A campaign is described by an intelligent crawl definition, which
associates a content target with crawl parameters (schedule and technical
parameters). The content definition is made of:

  • distinct named entities (e.g. person, location, and organisation), a time
    period, and free keywords;
  • a selection of up to nine social media categories;
  • specific URLs defined in the control group;
  • a target on specific media content categories;
  • the type of data to collect.

The crawl parameters describe the campaign schedule and define some technical
parameters such as politeness or robots.txt compliance. At the end of the
crawls, users get access to an overview of the data collected through different
widgets. The report part displays the available crawl metrics.
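
As an illustration only, the crawl definition described above can be pictured
as a small data structure that pairs a content definition with crawl
parameters. The sketch below is not taken from the cockpit source: every class
name, field name, and default value is hypothetical, and only the overall
shape (content target plus schedule and technical parameters, at most nine
social media categories) comes from the description.

```python
from dataclasses import dataclass, field
from typing import List

# Limit taken from the description: "up to nine social media categories".
MAX_SOCIAL_CATEGORIES = 9


@dataclass
class ContentDefinition:
    """Hypothetical model of the content target of a campaign."""
    entities: List[str] = field(default_factory=list)       # persons, locations, organisations
    time_period: str = ""                                    # free-form here; format is an assumption
    keywords: List[str] = field(default_factory=list)       # free keywords
    social_categories: List[str] = field(default_factory=list)
    urls: List[str] = field(default_factory=list)            # specific URLs
    media_categories: List[str] = field(default_factory=list)
    data_types: List[str] = field(default_factory=list)      # type of data to collect

    def validate(self) -> None:
        # Enforce the only numeric limit stated in the description.
        if len(self.social_categories) > MAX_SOCIAL_CATEGORIES:
            raise ValueError(
                "at most %d social media categories allowed" % MAX_SOCIAL_CATEGORIES
            )


@dataclass
class CrawlParameters:
    """Hypothetical schedule and technical parameters."""
    schedule: str = "once"             # assumed values, e.g. "once" or "weekly"
    politeness_delay_s: float = 1.0    # assumed: pause between requests to one host
    obey_robots_txt: bool = True       # robots.txt compliance


@dataclass
class Campaign:
    """A campaign associates a content target with crawl parameters."""
    name: str
    content: ContentDefinition
    parameters: CrawlParameters = field(default_factory=CrawlParameters)


campaign = Campaign(
    name="example-campaign",
    content=ContentDefinition(
        entities=["Example Person"],
        keywords=["election"],
        social_categories=["blogs", "forums"],
        urls=["http://example.org/"],
    ),
)
campaign.content.validate()  # raises ValueError if more than nine categories are selected
```

Keeping the content definition separate from the crawl parameters mirrors the
split made in the text, so the same content target could in principle be
reused with a different schedule.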

The cockpit is implemented in Python and uses the Pylons framework, jQuery, and
a PostgreSQL database. It is released under the GPLv3.

Set up

As root (assuming you will install it as user arcomem):

mkdir -p /usr/opt/
tar xzf ~/cockpit.tgz -C /usr/opt/
chown -R arcomem:arcomem /usr/opt/cockpit/

Then go to the install directory and follow the instructions in README.
Either configure sudo first so that the arcomem user can run commands as root,
or run the sudo commands as root and the other ones as the normal user.


Related

Wiki: TryIt
