The DeDuplicator is an add-on module (plug-in) for the Heritrix web crawler. It reduces the amount of duplicate data collected over a series of snapshot crawls.
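
This page does not say how duplicates are detected. One common approach, assumed here for illustration only, is to compare a content digest (for example SHA-1) of each newly fetched document with the digest recorded for the same URL in an earlier crawl, and to skip storing the document again when the two match. The following Java sketch shows that idea. The class `DigestIndex` and its methods are hypothetical and are not the DeDuplicator's actual API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for an index of URL -> content digest from a previous crawl. */
public class DigestIndex {
    private final Map<String, String> previous = new HashMap<>();

    /** Returns the SHA-1 digest of the content as a lowercase hex string. */
    static String sha1(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(content)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 unavailable", e);
        }
    }

    /** Records a document as it was fetched in the earlier snapshot crawl. */
    public void recordPrevious(String url, byte[] content) {
        previous.put(url, sha1(content));
    }

    /** True if this URL was fetched before with byte-identical content, so it need not be stored again. */
    public boolean isDuplicate(String url, byte[] content) {
        String old = previous.get(url);
        return old != null && old.equals(sha1(content));
    }

    public static void main(String[] args) {
        DigestIndex idx = new DigestIndex();
        byte[] hello = "hello".getBytes(StandardCharsets.UTF_8);
        idx.recordPrevious("http://example.com/", hello);
        System.out.println(idx.isDuplicate("http://example.com/", hello)); // true
        System.out.println(idx.isDuplicate("http://example.com/",
                "changed".getBytes(StandardCharsets.UTF_8)));            // false
        System.out.println(idx.isDuplicate("http://example.com/other", hello)); // false
    }
}
```

In a real crawler the earlier digests would live in a persistent index rather than an in-memory `HashMap`; the map only keeps the sketch self-contained.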

License

GNU Library or Lesser General Public License version 2.0 (LGPLv2)

Additional Project Details

Languages

English

Intended Audience

Advanced End Users, System Administrators, Developers

User Interface

Plugins

Programming Language

Java

Related Categories

Java Internet Software, Java Web Scrapers

Registered

2006-11-06