The project is a continuation of the old dupmerge program. dupmerge securely searches for non-empty files (size > 0) with equal content and, when it finds such files, hard links them to save disk space. It also has a sparse mode, a deletion mode and an
dupmerge overview
=================

Dupmerge reads a list of files from standard input (e.g. as produced by "find . -print") and looks securely for identical files. When it finds two or more identical files, all but one are unlinked to reclaim the disk space and recreated as hard links to the remaining copy.

Remarks: dupmerge should be used only for backups or archives, where duplicate files are not needed; it should not be used without nodo mode for /home, /tmp, /var and most other directories.

The normal mode, hard linking of multiple files, causes no problems in backups or archives and can also be used on CDs/DVDs. On file systems without hard links, e.g. FAT (FAT12, FAT16, FAT32, VFAT ...), it can work only with soft links (often called shortcuts).

The sparse mode never causes problems (on file systems which support sparse files).

The deletion mode can cause trouble, e.g. with e-books or HTML documents whose pictures exist in several copies, because deleting a duplicate breaks the documents that refer to it. Therefore the deletion mode should only be used with files which are not associated with each other, e.g. audio or video files. The deletion mode works on all (writable) file systems.

Normal mode: Saves approx. 20 % space.
Sparse mode: Saves approx. 0.2 % space.
Deletion mode: Deletes approx. 10 % of the files.

Many similar programs can be found on freshmeat.net or sourceforge.net by searching for "duplicate". I found clink, dmerge, duff, Dupseek, epac, fdf, fdfind, fdupe, fdupes, find_duplicates, freedup, freedups, fslint, ftwin, highlnk, WeedIt, and whatpix.

Most of these programs are not secure: highlnk and FSlint use md5sum, which is a cryptographically weak hash, and therefore they are vulnerable to MD5 collisions. With hashing they are fast (O(n)) but not safe. Another point is handling file names as zero-terminated strings to avoid problems with stray file names, which dupmerge does correctly.
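
The normal mode described above (find non-empty files with equal content, then recreate all but one as hard links) can be illustrated with a small sketch. This is not dupmerge's own code: the function names `group_identical` and `hard_link_duplicates`, the `dry_run` parameter and the temporary file suffix are assumptions made for this example. It compares file contents byte by byte instead of trusting a hash, which is the property the text calls "secure".

```python
import filecmp
import os
from collections import defaultdict


def group_identical(paths):
    """Return lists of paths whose contents are byte-for-byte identical."""
    by_size = defaultdict(list)
    for path in paths:
        # Only regular, non-empty files are candidates; symlinks are skipped.
        if os.path.isfile(path) and not os.path.islink(path):
            size = os.path.getsize(path)
            if size > 0:
                by_size[size].append(path)

    groups = []
    for candidates in by_size.values():
        classes = []  # each entry: list of paths with identical content
        for path in candidates:
            for cls in classes:
                # shallow=False forces a full content comparison (no hashing).
                if filecmp.cmp(cls[0], path, shallow=False):
                    cls.append(path)
                    break
            else:
                classes.append([path])
        groups.extend(cls for cls in classes if len(cls) > 1)
    return groups


def hard_link_duplicates(groups, dry_run=True):
    """Replace all but the first file of each group by a hard link to it.

    Returns the paths that were (or, in a dry run, would be) relinked.
    """
    relinked = []
    for keep, *others in groups:
        for dup in others:
            if os.path.samefile(keep, dup):
                continue  # already the same file, nothing to gain
            if not dry_run:
                tmp = dup + ".sketch-tmp"  # hypothetical temporary name
                os.link(keep, tmp)    # create the new hard link first
                os.replace(tmp, dup)  # then swap it in over the duplicate
            relinked.append(dup)
    return relinked
```

Grouping by size first keeps the number of full comparisons small, since files of different size can never be identical. Creating the link under a temporary name and renaming it over the duplicate means the duplicate's name never disappears, even if the run is interrupted; whether dupmerge itself works this way is not stated in the text above.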