
Fixing duplicates caused by camera uploads

davea0511c
2011-02-28
2013-04-29
  • davea0511c

    davea0511c - 2011-02-28

    I've used this for years.  It puts powerful duff tools in the hands of the common folk, which is an ever-increasing need as personal electronics become more integrated with online and at-home storage media.

    To be truly useful for the common folk, though, it needs a wizard front end that automatically selects a configuration profile based on the task you want to perform.

    For example, a "fix lazy uploads" task (I think this would be the most common task):  The purpose is to move all the repeat-uploads from a camera or similar device into an "extraDuplicates" folder (repeat-uploads happen when media is not deleted from the device after an earlier upload, so the same media gets uploaded again later into a different folder).  For this task you'd have the following configuration:
    - 1) "layer" tab:  Select only "binary data".  (Incidentally, it should be possible to make the file comparison for media like this up to 100x faster, since you only need to lightly sample the files.)
    - 2) "Files/Dirs" tab:  Automatically select common media formats (jpg, png, mov, avi, etc.) while ignoring all others.
    - 3) "Markers" tab:  What we need is to mark all the duplicates except the earliest instance of each file.  This has some issues (see "Marking methodology" in the PS section below).
    - 4) "Processors" tab:  Marked files are automatically moved to a folder called "extraDuplicates\" + optional identifier + "\", and within that folder a movedFiles.txt is written containing the names of the moved files, including where each file was moved from (making an easy "undo" possible).
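
    The four steps above can be sketched outside of Duff in a few lines of Python.  This is only a rough illustration, not how Duff works internally.  Assumptions: every function name here is made up for the example, the modification time stands in for the creation date (which Python does not expose portably), the whole file is hashed rather than sampled, and the destination should sit outside the scanned folders so later runs do not pick up the moved copies.

```python
import hashlib
import os
import shutil
from collections import defaultdict

# Step 2: the formats named in the post (".jpeg" added as an assumption).
MEDIA_EXTENSIONS = {".jpg", ".jpeg", ".png", ".mov", ".avi"}


def file_digest(path, chunk_size=1 << 20):
    """Step 1: compare binary data only, by hashing the full content."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_groups(roots, extensions=MEDIA_EXTENSIONS):
    """Return lists of paths whose content is identical."""
    by_size = defaultdict(list)
    seen = set()  # guards against overlapping roots listing a file twice
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                real = os.path.realpath(path)
                if real in seen or os.path.splitext(name)[1].lower() not in extensions:
                    continue
                seen.add(real)
                by_size[os.path.getsize(path)].append(path)
    by_content = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:  # a unique size cannot have a duplicate
            for path in paths:
                by_content[file_digest(path)].append(path)
    return [group for group in by_content.values() if len(group) > 1]


def unique_target(directory, name):
    """Pick a free file name inside directory, adding _1, _2, ... if needed."""
    base, ext = os.path.splitext(name)
    candidate = os.path.join(directory, name)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(directory, f"{base}_{counter}{ext}")
        counter += 1
    return candidate


def move_extra_duplicates(roots, dest_parent, identifier=""):
    """Steps 3 and 4: keep the earliest copy, move the rest, log every move."""
    target_dir = os.path.join(dest_parent, "extraDuplicates", identifier)
    os.makedirs(target_dir, exist_ok=True)
    moved = []
    log_path = os.path.join(target_dir, "movedFiles.txt")
    with open(log_path, "a", encoding="utf-8") as log:
        for group in find_duplicate_groups(roots):
            # Ties on time are broken by path, so exactly one copy always stays.
            group.sort(key=lambda p: (os.path.getmtime(p), p))
            for original in group[1:]:
                new_path = unique_target(target_dir, os.path.basename(original))
                shutil.move(original, new_path)
                log.write(f"{new_path}\t{original}\n")
                moved.append((original, new_path))
    return moved


def undo_moves(log_path):
    """Put each logged file back where it came from, if that spot is free."""
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            new_path, original = line.rstrip("\n").split("\t")
            if os.path.exists(new_path) and not os.path.exists(original):
                shutil.move(new_path, original)
```

    The log is one tab-separated line per file (new location, then original location), which is what makes the "undo" a simple loop.  File names containing a tab would need a sturdier format.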

    I'm sure there are dozens of other profiles that could be created for the wizard.  The "Save Current Settings" feature currently doesn't work, but if it did then the wizard would just be a directory of previously saved settings (it should include some default profiles like the one above).  Note that each saved profile should also include a description field.
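
    To make the "wizard as a directory of saved settings" idea concrete, here is a minimal sketch of what a saved profile with a description field could look like.  Everything in it is hypothetical: Duff's real settings format is not shown anywhere in this post, so the field names, the JSON layout and the function names are assumptions made for illustration only.

```python
import json
import os
from dataclasses import asdict, dataclass, field


@dataclass
class WizardProfile:
    """One saved configuration; all field names are illustrative guesses."""
    name: str
    description: str
    layers: list = field(default_factory=lambda: ["binary data"])
    extensions: list = field(default_factory=lambda: ["jpg", "png", "mov", "avi"])
    marker: str = "least recently created (oldest)"
    invert_marks: bool = True
    mark_at_least_one: bool = True
    move_to: str = "extraDuplicates"


def save_profile(profile, directory):
    """Write the profile as <name>.json inside the profile directory."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, profile.name + ".json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(asdict(profile), handle, indent=2)
    return path


def load_profile(path):
    with open(path, encoding="utf-8") as handle:
        return WizardProfile(**json.load(handle))


def list_profiles(directory):
    """What the wizard would show: profile name mapped to its description."""
    listing = {}
    for filename in sorted(os.listdir(directory)):
        if filename.endswith(".json"):
            profile = load_profile(os.path.join(directory, filename))
            listing[profile.name] = profile.description
    return listing
```

    With something like this, shipping default profiles just means shipping a few files in that directory.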

    The app also needs an icon toolbar between the tabs and the menu toolbar with some main functions, including "Wizard", "1) Find duplicates", "2) Mark Files", "3) Process Marked Files".  Some kind of feedback is also needed when each process completes (without having to click the "Status Log").

    =============
    PS -
    Marking methodology (step 3 above) - Unfortunately, "marking all duplicates except the earliest" cannot be automated yet, for 2 reasons.  One is that an "Invert Marks" toggle option does not exist; it only exists as a right-click menu item in the "Duplicates" tab.  The other, more serious problem is that if two files share the same creation date then Duff selects neither one.  Even doing it manually is then a bit of a trick.  You have three possible ways to do this:
    - method A)  Fix the option called "if no difference between the file is found, mark at least 1 file".  This is an option under the "Options" tab, "Markers" section, and it doesn't work.  If we fix that (and put it on the "Markers" tab), and add an "invert marks" toggle to the "Markers" tab, then you just toggle those 2 options along with "mark oldest file" in the "Markers" tab to do step 3 here.  This would be the best method since it could easily be saved into a profile, but it is not possible until that feature is fixed.
    - method B)  Select "File Date", toggle "most recently created (newest)", click "Mark the Files", and continue to step 4.  Then repeat all 4 steps until there are no duplicates left except those which share the same creation date (for those you have to go through the "Duplicates" tab and manually mark the ones you want to process).  The downside is that you might make a mistake in your manual markings and accidentally delete all instances of the same file … AND it's a pain if you have a ton to go through.
    - method C)  Select "least recently created (oldest)", click "Mark the Files", go to the "Duplicates" tab, right-click and choose "select all", then right-click the selection and choose "invert marks".  But ### DANGER ###: if a duplicate file exists where all its instances have the same creation date, then none will be marked, and all instances will be marked once "invert marks" is invoked.  That makes it POSSIBLE TO DELETE ALL instances of the same file - EEK (when you delete the "extraDuplicates" directory contents)!  What you can do, however, is this: after completing the next step, run DUFF on the "extraDuplicates" directory (the files you'll ultimately want to delete) and move the longest (or shortest) filename of each set of duplicates in there into a new directory (name it "dupRecoveries").  This picks out 1 of every extraDuplicate.  Then run DUFF a 3rd time using these 4 steps, this time on the original directories plus the new directory ("dupRecoveries").  Whatever is left in dupRecoveries after that 3rd DUFF run is a copy of those duplicates where all the instances shared the same creation date.  The one downside (besides being a 3-step process) is that these files will no longer be in their original directory.  IMHO this is the best way to do it manually until method A is possible.
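
    The tie problem is easy to see in a small model.  The sketch below is not Duff's code; the first function only mimics the behaviour described above (on a tie for oldest, nothing gets marked, then "invert marks" flips everything), and the second shows one possible fix: break ties on the path so that exactly one copy is always kept.  Function names and the (path, creation time) input format are assumptions for the example.

```python
def naive_mark_oldest_then_invert(group):
    """Model of the method C hazard for one set of identical files.

    group is a list of (path, created) pairs. Returns the paths that end up
    marked for processing after "mark oldest" followed by "invert marks".
    """
    earliest = min(created for _path, created in group)
    oldest = [path for path, created in group if created == earliest]
    # Assumed from the description above: on a tie, no file is marked at all.
    marked = set(oldest) if len(oldest) == 1 else set()
    # "Invert marks": every file that was not marked becomes marked.
    return [path for path, _created in group if path not in marked]


def mark_all_but_one(group):
    """Keep the earliest copy, using the path as a tie-breaker.

    Returns (kept_path, marked_paths); one copy is always kept, even when
    every copy has the same creation time.
    """
    ordered = sorted(group, key=lambda item: (item[1], item[0]))
    keeper = ordered[0][0]
    return keeper, [path for path, _created in ordered[1:]]


tied = [("b/img.jpg", 100), ("a/img.jpg", 100)]
print(naive_mark_oldest_then_invert(tied))  # both copies end up marked
print(mark_all_but_one(tied))               # one copy survives
```

    Any deterministic tie-breaker would do (filename length, as in the dupRecoveries trick above, is another); the point is only that the marker never leaves a group with zero unmarked files.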

     
