Extract unique file sets from sets with duplicates.
That is what dupless does.
Written in Java and using SQLite, it is a small, simple program that solves the duplicate file problem.
The .jar file contains all of the code, both source and binary.
Currently it writes scripts for use on Linux or Windows.
See the Wiki or the README.txt in the .jar file for more information.
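To make the duplicate file problem concrete, here is a minimal sketch of one common way to reduce a directory tree to a unique file set: group files by a hash of their contents and keep one file per group. This is not taken from the dupless source. The class name DuplicateSketch, its method names, and the choice of SHA-256 are assumptions made for illustration, and the sketch does not use SQLite or write scripts as dupless does.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; not the dupless implementation.
public class DuplicateSketch {

    // Returns the hex-encoded SHA-256 digest of a file's contents.
    static String contentHash(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Groups every regular file under root by content hash.
    // Files are visited in sorted path order so the result is deterministic.
    static Map<String, List<Path>> groupByContent(Path root)
            throws IOException, NoSuchAlgorithmException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        Map<String, List<Path>> groups = new LinkedHashMap<>();
        for (Path file : files) {
            String hash = contentHash(file);
            groups.computeIfAbsent(hash, k -> new ArrayList<>()).add(file);
        }
        return groups;
    }

    // Keeps the first file of each group; the rest of each group are duplicates.
    static List<Path> uniqueFiles(Map<String, List<Path>> groups) {
        List<Path> unique = new ArrayList<>();
        for (List<Path> group : groups.values()) {
            unique.add(group.get(0));
        }
        return unique;
    }

    public static void main(String[] args) throws Exception {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        Map<String, List<Path>> groups = groupByContent(root);
        for (Path keep : uniqueFiles(groups)) {
            System.out.println("keep " + keep);
        }
        for (List<Path> group : groups.values()) {
            for (Path dup : group.subList(1, group.size())) {
                System.out.println("duplicate " + dup);
            }
        }
    }
}
```

Run it with a directory as the only argument to list which files would be kept and which are duplicates. Hashing every file is the simplest approach; comparing file sizes first would avoid hashing files that cannot match, at the cost of a little more code.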
Defuddle is a data translation engine that supports mapping arbitrary ASCII and binary file formats to a data model defined in XML Schema, in a manner similar to, but not compliant with, the Data Format Description Language (http://www.ogf.org/dfdl/).