The DeDuplicator is an add-on module (plug-in) for the web crawler Heritrix. It offers a means to reduce the amount of duplicate data collected in a series of snapshot crawls.

License

GNU Library or Lesser General Public License version 2.0 (LGPLv2)

Additional Project Details

Languages

English

Intended Audience

Advanced End Users, System Administrators, Developers

User Interface

Plugins

Programming Language

Java

Related Categories

Java Internet Software, Java Web Scrapers

Registered

2006-11-06