The project is a continuation of the old dupmerge program. dupmerge securely searches for files (of size > 0) with identical content and, when it finds such files, hard links them to save disk space. It also has a sparse mode, a deletion mode and an inverse mode.
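The core "equal content" test can be sketched in C as follows. This is a minimal illustration, not dupmerge's own code: the function name files_equal and the 64 KiB comparison block are assumptions. It skips empty files and pairs that are already hard links of each other (same device and inode), then compares the two files block by block.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Comparison block size: an assumption, not dupmerge's actual value. */
#define CMP_BLOCK 65536

/* Returns 1 if path_a and path_b are distinct regular files of size > 0
   with identical content, 0 if they are not merge candidates (different
   content, empty, or already the same inode) and -1 on an I/O error.
   Uses static buffers, so it is not thread-safe. */
int files_equal(const char *path_a, const char *path_b)
{
    struct stat sa, sb;
    if (lstat(path_a, &sa) != 0 || lstat(path_b, &sb) != 0)
        return -1;
    if (!S_ISREG(sa.st_mode) || !S_ISREG(sb.st_mode))
        return 0;                       /* skip symlinks, devices, ... */
    if (sa.st_size == 0 || sa.st_size != sb.st_size)
        return 0;                       /* empty, or sizes differ */
    if (sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino)
        return 0;                       /* already hard linked */

    FILE *fa = fopen(path_a, "rb");
    FILE *fb = fopen(path_b, "rb");
    if (fa == NULL || fb == NULL) {
        if (fa != NULL) fclose(fa);
        if (fb != NULL) fclose(fb);
        return -1;
    }

    static unsigned char buf_a[CMP_BLOCK], buf_b[CMP_BLOCK];
    int result = 1;
    for (;;) {
        size_t na = fread(buf_a, 1, sizeof buf_a, fa);
        size_t nb = fread(buf_b, 1, sizeof buf_b, fb);
        if (ferror(fa) || ferror(fb)) {
            result = -1;                /* read error */
            break;
        }
        if (na != nb || memcmp(buf_a, buf_b, na) != 0) {
            result = 0;                 /* contents differ */
            break;
        }
        if (na == 0)
            break;                      /* both files ended together */
    }
    fclose(fa);
    fclose(fb);
    return result;
}
```

Comparing the sizes and the (dev, ino) pair first is cheap and avoids reading any data for most pairs; only same-size files on different inodes are read.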
The original dupmerge author, Phil Karn, has rewritten the program from scratch with a new algorithm: http://www.ka9q.net/code/dupmerge/ . So since December 2008 there have been two current versions (forks) of dupmerge: the Karn version and the Freitag version. This dupmerge is also available in the grml repository: http://deb.grml.org/pool/main/d/dupmerge/
I found out that dupmerge can reach the system limit on hard links while merging database directories. From the dupmerge output:
... lstat(./var/lib/pgsql/data/base/2517760/3054028) failed lstat: No such file or directory
Files linked: 208674 of 959741, Disk blocks reclaimed: 27286496
Minimum of found hard links: 1, Maximum: 32000.
When the hard link limit (32000 under ext3 on Debian Lenny) is reached, dupmerge still succeeds in unlinking the duplicate file, but creating the hard link fails. So when the limit is reached, the duplicate files are lost! I'm working on a version using "ln -f" that should preserve the duplicates when the limit is reached. Future versions should determine the system limit and link the duplicates beyond that limit to a second file until the limit is reached a second time, and so on. Because that is not easy (it needs additional data structures, checks etc.) and because more than 32000 identical files are uncommon, I will put it on the to-do list.
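One way to never lose a duplicate is to not remove it before the new link exists: create the link under a temporary name, then rename() it over the duplicate, which POSIX performs as a single replacement. The sketch below assumes that approach; it is not taken from dupmerge, and the helper names and the ".dupmerge.tmp" suffix are hypothetical. It also shows how pathconf() with _PC_LINK_MAX reports the per-file link limit that a future version could check.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if `path` can probably take one more hard link, 0 if its link
   count has reached the limit reported by pathconf(_PC_LINK_MAX) or it
   cannot be stat()ed. If no limit is reported, 1 is returned and link()
   itself has the final word (it fails with EMLINK). */
int can_add_link(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return 0;
    long max = pathconf(path, _PC_LINK_MAX);
    if (max == -1)
        return 1;
    return (long)st.st_nlink < max;
}

/* Replaces `dup` by a hard link to `keep` without deleting `dup` first.
   Returns 0 on success and -1 on failure; on failure `dup` is unchanged.
   The caller must ensure that keep and dup are different inodes: POSIX
   rename() does nothing if both names refer to the same file. */
int replace_with_link(const char *keep, const char *dup)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.dupmerge.tmp", dup); /* hypothetical suffix */
    if (n < 0 || (size_t)n >= sizeof tmp) {
        errno = ENAMETOOLONG;
        return -1;
    }
    /* Step 1: new link under a temporary name. If `keep` already has the
       maximum number of links, this fails with EMLINK and nothing changed. */
    if (link(keep, tmp) != 0)
        return -1;
    /* Step 2: rename() replaces `dup` in one step, so the name `dup`
       never disappears, not even for a moment. */
    if (rename(tmp, dup) != 0) {
        int saved = errno;
        unlink(tmp);                    /* roll back the extra link */
        errno = saved;
        return -1;
    }
    return 0;
}
```

A caller that finds can_add_link() false, or gets EMLINK from replace_with_link(), could keep the current duplicate as the new link target for the remaining equal files, which is the "second file" scheme described in the to-do item.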
changelog.txt from dupmerge
===========================

Circa 1993: Initial version, Phil Karn, karn (at) ka9q (dot) net.

1998-02-12: Last version from Phil Karn.

2004-12-07: Version 1.1
Added swap macro, inverse mode, no replacing of zero-length files, void casts for semantic checkers, version number, sorting of equal-size files by dev, ino and name, ... Switched to C99. Tested with SuSE 9.2 and Debian (both kernel 2.6), with 4 and 7 gigabyte files, the coreutils sources and MD5-collision files. Changed to reading/writing/comparing 8-byte blocks (instead of 1-byte blocks): 2 times faster now. Added to-do list.
Rolf Freitag, rolf.freitag at email.de

2004-12-01: Version 1.21
Bugfix: in the old version 1.1 two braces were missing. Because of these missing braces, files of the same size were compared only in their first 64 bits, so that version linked some different files together! Therefore all future releases and CVS versions will be tested with the coreutils-5.2.1 sources before release/commit, to ensure that new versions introduce no (new) errors. These are the test results from du -sk:
before dupmerge: 31628
after original version 1.0 from Phil Karn: 29968
after version 1.1: 29712
after current version 1.21: 29968
Rolf Freitag

2005-02-10: Version 1.3
This version has a help text and a fully tested sparse mode, which replaces each file that can be shrunk by a sparse copy. The inverse normal mode now expands all hard links in O(n) (was O(n*log(n))). There is a new combo mode in which first all files are replaced by their sparse copy if it is smaller, and after that files of size > 0 with the same content are hard linked to the oldest file with the lowest disk usage (the eldest of the most sparse files). With the -i option the inverse is done. This version now "compresses" as much as theoretically possible by removing redundancy with linking and sparsing. It saves approx. 20 % of disk space.
Rolf Freitag

2005-04-17: Version 1.4
Added deletion mode (option -d), which deletes duplicate files. This mode can be used e.g. for clearing movie and picture archives.
Rolf Freitag

2005-04-27: Version 1.5
Because the old versions of dupmerge use fgets to read the file names, file names containing newlines cannot be read. Therefore, since version 1.5, fgets is replaced by fread and the old separator '\n' is replaced by '\0'; the input file names must now be separated by zero bytes, not newlines. This fits perfectly because a file name is a zero-terminated string. So instead of
find ./ -type f -print | dupmerge <options or not>
you now have to call
find ./ -type f -print0 | dupmerge <options or not>
Bugfix of the wrong version number. Tested with several files with several newlines in their names.
Rolf Freitag

2005-08-31: Version 1.6
Added changelog and several file checks, because fread does not distinguish between end-of-file and error. Fixed the not quite "started" message in quiet mode. Updated header.
Rolf Freitag

2007-10-29: Version 1.7
Added date and time to the startup message. That's important in case of file system errors like directory loops and an infinitely looping find. Added const qualifiers to the two main arguments as minimal write protection. Added volatile qualifier to the i_exit_flag for safe IPC. Added the -s option for soft linking instead of hard linking, but a) it needs several runs to make all links and b) the inverse is not implemented yet. Added Cygwin support simply via #ifndef __CYGWIN__; it can now be compiled under Cygwin without modification. Tested successfully several times with the coreutils and other stuff.
Dr. Rolf Freitag

2008-03-01: Version 1.73
Added some comments, checked with coreutils-5.2.1 (find ./ -type f -print0 | dupmerge) and my archive (about 500 GB in one million files on an encrypted partition).
Dr. Rolf Freitag
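The input stage described in the changelog (zero-separated names from find -print0, then sorting of equal-size files by dev, ino and name) might look roughly like this. It is a hedged sketch: struct entry, read_entries and cmp_entry are hypothetical names, and it uses POSIX getdelim() for brevity instead of the fread()-based reader the changelog describes.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* One candidate file; the field names are illustrative only. */
struct entry {
    char *name;
    off_t size;
    dev_t dev;
    ino_t ino;
};

/* Reads zero-terminated file names (as written by `find -print0`) from
   `in`, keeps regular files of size > 0 and stores them in *out.
   Returns the number of entries stored. */
size_t read_entries(FILE *in, struct entry **out)
{
    struct entry *v = NULL;
    size_t n = 0, cap = 0;
    char *name = NULL;
    size_t bufsize = 0;

    while (getdelim(&name, &bufsize, '\0', in) != -1) {
        struct stat st;
        if (name[0] == '\0')
            continue;                      /* empty name */
        if (lstat(name, &st) != 0) {
            perror(name);                  /* e.g. file vanished meanwhile */
            continue;
        }
        if (!S_ISREG(st.st_mode) || st.st_size == 0)
            continue;                      /* only regular files of size > 0 */
        if (n == cap) {
            size_t newcap = cap ? 2 * cap : 64;
            struct entry *nv = realloc(v, newcap * sizeof *nv);
            if (nv == NULL)
                break;
            v = nv;
            cap = newcap;
        }
        char *copy = strdup(name);
        if (copy == NULL)
            break;
        v[n].name = copy;
        v[n].size = st.st_size;
        v[n].dev = st.st_dev;
        v[n].ino = st.st_ino;
        n++;
    }
    free(name);
    *out = v;
    return n;
}

/* qsort() comparator: by size so that possible duplicates become
   neighbours, then by device, inode and name so that names which are
   already hard links of each other are grouped together. */
int cmp_entry(const void *pa, const void *pb)
{
    const struct entry *a = pa, *b = pb;
    if (a->size != b->size) return a->size < b->size ? -1 : 1;
    if (a->dev != b->dev)   return a->dev < b->dev ? -1 : 1;
    if (a->ino != b->ino)   return a->ino < b->ino ? -1 : 1;
    return strcmp(a->name, b->name);
}
```

After qsort(v, n, sizeof *v, cmp_entry), only neighbours with equal size need a content comparison, and neighbours with equal dev and ino are already hard links, so they can be skipped.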
I made some minor changes, added some comments and checked with coreutils-5.2.1 (find ./ -type f -print0 | dupmerge) and my archive (about 500 GB in one million files on an encrypted partition). The release (zip) contains an executable (dupmerge: sticky ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.6.4, dynamically linked (uses shared libs), for GNU/Linux 2.6.4, stripped).