A New Near-Duplicate Detection Method
A toolkit for crawling information from web pages by combining different kinds of "actions". Actions are simple operations, such as navigating to a specified URL or extracting text from the HTML. A graphical user interface is also available.
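To make the idea of combining "actions" concrete, here is a minimal Python sketch. It is not the toolkit's actual API: the names `Context`, `navigate`, `extract_text`, `run`, and `fake_fetch`, and the `http://example.test/` page, are all made up for illustration. Each action is a small function that takes the current state and returns it, so a crawl is just a list of actions run in order.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Callable, Dict, List


@dataclass
class Context:
    """State handed from one action to the next (hypothetical)."""
    url: str = ""
    html: str = ""
    results: List[str] = field(default_factory=list)


Action = Callable[[Context], Context]


class _TextCollector(HTMLParser):
    """Collects the text inside every element with the given tag name."""

    def __init__(self, tag: str) -> None:
        super().__init__()
        self.tag = tag
        self.depth = 0               # nesting depth inside matching tags
        self.chunks: List[str] = []  # text pieces of the current element
        self.found: List[str] = []   # finished elements, in document order

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1
            if self.depth == 1:
                self.chunks = []

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.found.append("".join(self.chunks).strip())

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def navigate(url: str, fetch: Callable[[str], str]) -> Action:
    """Action: load the page at `url` using the supplied fetch function."""
    def action(ctx: Context) -> Context:
        ctx.url = url
        ctx.html = fetch(url)
        return ctx
    return action


def extract_text(tag: str) -> Action:
    """Action: append the text of every <tag> element to the results."""
    def action(ctx: Context) -> Context:
        parser = _TextCollector(tag)
        parser.feed(ctx.html)
        parser.close()
        ctx.results.extend(parser.found)
        return ctx
    return action


def run(actions: List[Action]) -> Context:
    """Run the actions in order, threading one Context through all of them."""
    ctx = Context()
    for action in actions:
        ctx = action(ctx)
    return ctx


# Stand-in for a real HTTP client so the sketch runs offline.
FAKE_PAGES: Dict[str, str] = {
    "http://example.test/": (
        "<html><body><h1>Printers</h1>"
        "<p>HP 4000</p><p>Canon <b>LBP</b></p></body></html>"
    ),
}


def fake_fetch(url: str) -> str:
    return FAKE_PAGES[url]


ctx = run([navigate("http://example.test/", fake_fetch), extract_text("p")])
print(ctx.results)  # ['HP 4000', 'Canon LBP']
```

Passing the fetch function into `navigate` keeps the sketch testable without network access; a real crawl would swap `fake_fetch` for an HTTP client.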
Simple C# example tool for scraping info from a webpage. This is a quick hack for a school project, done in one evening so I don't have to type the same printers into Excel or Access for the twentieth time ...
A simple bash script to harvest an Aleph X-Server. Since some libraries don't have OAI-PMH servers installed or configured, harvesting the Aleph X-Server turned out to be an option for getting the libraries' catalog data.