The WAW tools provide a more automated approach to web harvesting that combines archival principles, automated processing, and human decision-making. The underlying model applies those principles to the preservation of documents on the web.
LJ loader: a Java program that extracts postings and comments from the LiveJournal blog service (http://www.livejournal.com) into a database, where they can be viewed, classified, and processed. Reusable components: a Perl-like but efficient web page scraper, a tree analyzer, and a concurrent scheduler.
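To illustrate how a scraper and a concurrent scheduler of this kind might fit together, here is a minimal sketch in Java. It is not the project's actual code: the class name LjScrapeSketch, the method names, and the `<div class="entry">` markup are all assumptions made for the example, and real LiveJournal pages are structured differently. The pages are passed in as strings so that the sketch runs without network access.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LjScrapeSketch {
    // Hypothetical markup, chosen only for this example.
    private static final Pattern ENTRY =
        Pattern.compile("<div class=\"entry\">(.*?)</div>", Pattern.DOTALL);

    /** Perl-style regex scrape: returns the trimmed text of each entry block. */
    public static List<String> extractEntries(String html) {
        List<String> entries = new ArrayList<>();
        Matcher m = ENTRY.matcher(html);
        while (m.find()) {
            entries.add(m.group(1).trim());
        }
        return entries;
    }

    /** Scrapes all pages on a fixed-size thread pool; results keep input order. */
    public static List<List<String>> scrapeAll(List<String> pages, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<List<String>>> tasks = new ArrayList<>();
            for (String page : pages) {
                tasks.add(() -> extractEntries(page));
            }
            List<List<String>> results = new ArrayList<>();
            // invokeAll blocks until every task is done and returns the
            // futures in the same order as the submitted tasks.
            for (Future<List<String>> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("scrape interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("scrape task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> pages = Arrays.asList(
            "<div class=\"entry\"> first post </div><div class=\"entry\">a comment</div>",
            "<p>no entries here</p>");
        System.out.println(scrapeAll(pages, 2));
    }
}
```

A regular expression keeps the sketch short and mirrors the "Perl-like" scraping style, but it is fragile on nested or malformed markup; a tree-based analysis of the parsed page would be the more robust choice for real blog pages.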