The DeDuplicator is an add-on module (plug-in) for the web crawler Heritrix. It offers a means to reduce the amount of duplicate data collected in a series of snapshot crawls.
License
GNU Library or Lesser General Public License version 2.0 (LGPLv2)Follow DeDuplicator (Heritrix add-on)
Other Useful Business Software
AI-powered service management for IT and enterprise teams
Give your IT, operations, and business teams the ability to deliver exceptional services—without the complexity. Maximize operational efficiency with refreshingly simple, AI-powered Freshservice.
Rate This Project
Login To Rate This Project
User Reviews
Be the first to post a review of DeDuplicator (Heritrix add-on)!