Disclaimer: This is free software, and the author accepts no liability for any damage caused by using it.
NOTE (Limitations):
This has been tested only on Ubuntu 11.04 with gcj-jdk installed (it may also work on other Debian-based systems)
The offline version is very slow (though it works for a few thousand files)
FEATURES:
Detects duplicate directories and files, emitting shell scripts (lists topmost directories; generates non-conflicting, ordered deletion scripts)
Incremental duplicate scanning (disabled in free version 0.6; planned for version 0.7)
Two-way compare
Shows duplicates in a second path with respect to the first (two-way duplicate finder; files/dirs to be deleted come only from the second path, even if it is a subdirectory of the first path)
Copy-script generation for two-way compare (very slow; copies missing files from the second path to the first path; disabled by default)
Automated delete (this part is open source: the delete programs that run on the generated scripts are open source; very useful when deleting millions of files/dirs, for one-way and two-way comparison, 4 combinations)
Ordered by size, lists disk gain, pattern matching, minimum-size threshold, one-time disk scan, safe scripts, timer, and timestamped backup of run logs (non-conflicting dir/file deletions: a file/dir deleted earlier in the order will not be deleted again; backup logs will be DB-based in the future)
Exhaustive search (ignoring file names)
Partial-match script (disabled in the free version, though the program to find the super directory is open source)
Archive file scan (disabled in the free version; may be better done manually as a two-way compare)
Script files can be as large as 500 MB, and the number of files scanned can be in the millions
CAUTION: THE CHOICE OF WHICH DIRECTORY GETS DELETED IS ARBITRARY, ESPECIALLY IN THE CASE OF A CYCLIC SUB-DIRECTORY MATCH. THE WORST CASE IS THAT DIRS END UP MISPLACED FROM WHERE THEY ARE SUPPOSED TO BE (NOTHING GETS LOST, THOUGH SOME SUBDIRS IN THE WRONG LOCATION MAY BE PREFERRED FOR RETENTION). THIS TOOL ALSO AIDS MANUAL COMPARISON WITH THE "MELD" DIRECTORY COMPARISON TOOL, FOR WHICH SCRIPTS ARE GENERATED (I RECOMMEND 'MELD' OVER 'KOMPARE')
Execution
Offline (slow; will be faster in later versions)
Do "ls -laR {pathname} > __{pathname}". Each "/" has to be replaced with double underscore "__". eg: "ls -laR /home/rekha > __home__rekha"
"./ofDupFind 1 __home__rekha" - this will generate dir deletion script "rm1.sh" , file deletion script "rm1_files.sh" in current dir, along with log/processing text files.
Verify the files- especially the second arguments on each line in the shell scripts(which will be deleted), and run the deletion program.
"java exec_rmdirs 1 >myLogDir" compares each directory listed in "rm1.sh", and deletes after a default pause of 10 seconds - deletes in a loop all the dirs on right hand side of "meld" lines in "rm1.sh". "java exec_rmdirs 1 0 0 0 >myLogDir" - deletes in a loop with no delay (look at the source code of 4 java programs)
"java exec_rmfiles 1 >myLogFile" compares each files listed in "rm1_files.sh", and deletes after a default pause of 10 seconds- deletes in a loop all the files on right hand side of "diff" lines in "rm1_files.sh". "java exec_rmfiles 1 0 0 0 >myLogFile" - deletes in a loop with no delay
4. For a two-way compare, run "./ofDupFind 2 __home__hema" (after doing "ls -laR /home/hema > __home__hema"). This takes the info of the first directory "/home/rekha" (kept in the current dir as hidden text files) and lists the duplicates found in "/home/hema" in a subdirectory named "second".
5. "cd second" takes you to the "second" directory; from there run the scripts for deleting directories/files, viz. "java -cp .. exec_rmdirs 2 >myLogDir" and "java -cp .. exec_rmfiles 2 >myLogFile" (add "0 0 0" args for no pause, viz. "java -cp .. exec_rmfiles 2 0 0 0 >myLogFile" and "java -cp .. exec_rmdirs 2 0 0 0 >myLogDir"). The shell scripts to verify before running are "rm2_unsorted.sh" for dirs and "rm2_files.sh" for files (the dirs/files listed on the right-hand side of the "meld" or "diff" lines of the respective scripts, as explained in step 3, get deleted). The full two-way sequence is sketched below.
6. REPEAT STEPS 4 AND 5 FOR ANY NUMBER OF PATHS TO BE COMPARED WITH THE ORIGINAL.
7. "./ofDupFind 1 __home__userABC" BACKS UP ALL PREVIOUS ONE-WAY AND TWO-WAY RUNS recursively into a "run{timestamp}" directory and STARTS A FRESH ONE-WAY DUPLICATE FIND FROM STEP 1 THROUGH 6. -----REPEAT STEPS 1-6-----
Direct (fast).
This is exactly the same as offline, but you give the path to scan directly: "./ofDupFind 1 /home/hari"
Run "./ofDupFind 2 /media/mountedDir" for a two-way match; results go in the "second" subdir
Execute "java exec_rmdirs 1 0 0 0 > myLogDir", "java exec_rmfiles 1 0 0 0 > myLogFile", "java exec_rmdirs 2 0 0 0 > myLogDir", "java exec_rmfiles 2 0 0 0 > myLogFile" as appropriate, after verifying the files marked for deletion. A complete direct-mode sequence is sketched below.
Last edit: Hari's Sofwares 2012-09-09
For the free Java version: http://code.google.com/p/off-dup-finder