The DeDuplicator is an add-on module (plug-in) for the web crawler Heritrix. It offers a means to reduce the amount of duplicate data collected in a series of snapshot crawls.
License
GNU Library or Lesser General Public License version 2.0 (LGPLv2)