The DeDuplicator is an add-on module (plug-in) for the Heritrix web crawler. It provides a way to reduce the amount of duplicate data collected across a series of snapshot crawls.

License

GNU Library or Lesser General Public License version 2.0 (LGPLv2)

Additional Project Details

Languages

English

Intended Audience

Advanced End Users, System Administrators, Developers

User Interface

Plugins

Programming Language

Java

Related Categories

Java Internet Software, Java Web Scrapers

Registered

2006-11-06