The DeDuplicator is an add-on module (plug-in) for the web crawler Heritrix. It offers a means to reduce the amount of duplicate data collected in a series of snapshot crawls.
License
GNU Library or Lesser General Public License version 2.0 (LGPLv2)