This pair of programs solves a problem I could not find a solution for. It is two very simple pieces of Java code
combined with the sqlite database. Feel free to use it in any way you choose. I place it in the public domain.
It works on both Linux and Windows (use the --win option with MakeScript to generate a Windows bat file).
Requirements: the SQLite JDBC driver and a Java JRE
You have thousands of files (of any file type), many of which are duplicates. They reside in multiple directories
under a single base directory. You want to copy one of each unique file into a new set of directories, leaving the original files
and directories unchanged.
You run at the command line:
java com.bizdash.main.Scanner --showpaths --createtable /basedir
Scanner will scan through basedir and all directories under it, creating an MD5 hash for each file found. The hash
and the file's information are saved in an SQLite database for later processing.
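To make the hashing step concrete, here is a minimal sketch of how an MD5 hash of a file can be computed with the standard java.security.MessageDigest class. It is not Scanner's actual source; the class name Md5Sketch and its method names are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Scanner: shows one way to MD5-hash a file.
class Md5Sketch {
    static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // Every standard Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    // Converts raw digest bytes to lowercase hexadecimal.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Hashes data that is already in memory.
    static String md5Hex(byte[] data) {
        return toHex(newMd5().digest(data));
    }

    // Hashes a file in 8 KB chunks so large files do not need to fit in memory.
    static String md5OfFile(Path file) {
        MessageDigest md = newMd5();
        byte[] buf = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return toHex(md.digest());
    }
}
```

Two files with the same content produce the same hex string regardless of their names or locations, which is what makes the hash usable as a duplicate key.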
You can optionally limit what file types are processed, specify directories to be skipped, and specify multiple base directories.
Scanner checks for and skips files whose names contain invalid characters. The skipped files are noted in the log as "RENAME===="
followed by the current file name. If you want these files included in the database, rename them and run Scanner again.
Once you have scanned the files of interest, you are ready to run:
java com.bizdash.main.MakeScript --showfiles /basedir /newbasedir
MakeScript will go through the database and find one copy of each unique file. It will create a shell script (sh or bash)
containing all of the commands needed to copy the unique set of files to /newbasedir. /basedir must be the same /basedir
you specified for Scanner.
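The idea behind this step can be sketched in a few lines of Java. This is only an illustration under assumptions, not MakeScript's actual code: the rows are assumed to be {hash, relative path} pairs already read from the database, and the class name DedupSketch, the variable names SRC and DST, and the layout of the generated script are all invented here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not MakeScript itself.
class DedupSketch {
    // Keeps the first path seen for each hash. Each row is {hash, relativePath}.
    static List<String> onePathPerHash(List<String[]> rows) {
        Map<String, String> firstByHash = new LinkedHashMap<>();
        for (String[] row : rows) {
            firstByHash.putIfAbsent(row[0], row[1]);
        }
        return new ArrayList<>(firstByHash.values());
    }

    // Wraps a string in single quotes so the shell treats it literally.
    static String shQuote(String s) {
        return "'" + s.replace("'", "'\\''") + "'";
    }

    // Builds an sh script with two variables at the top and one copy per file.
    static String buildScript(String baseDir, String newBaseDir, List<String> relPaths) {
        StringBuilder sb = new StringBuilder("#!/bin/sh\n");
        sb.append("SRC=").append(shQuote(baseDir)).append('\n');
        sb.append("DST=").append(shQuote(newBaseDir)).append('\n');
        for (String rel : relPaths) {
            String q = shQuote(rel);
            // Create the destination directory before copying into it.
            sb.append("mkdir -p \"$(dirname \"$DST\"/").append(q).append(")\"\n");
            sb.append("cp \"$SRC\"/").append(q).append(" \"$DST\"/").append(q).append('\n');
        }
        return sb.toString();
    }
}
```

Keeping the first path per hash is an arbitrary choice made for this sketch; which copy the real MakeScript keeps is not stated here.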
MakeScript also lets you filter by file type. The following command generates a script that copies only JPG files:
java com.bizdash.main.MakeScript --showfiles '.jpg' /basedir /newbasedir
Or do multiple file types with:
java com.bizdash.main.MakeScript --showfiles '.jpg .mpg' /basedir /newbasedir
Want a Windows bat file? Use:
java com.bizdash.main.MakeScript --win --showfiles '.jpg .mpg' \basedir \newbasedir
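For readers curious how a filter argument such as '.jpg .mpg' might be applied, here is a small sketch. It assumes the argument is a whitespace-separated list of extensions matched against the end of the file name without regard to case; whether MakeScript really ignores case is a guess, and the class name ExtensionFilterSketch is hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of an extension filter, not MakeScript's actual code.
class ExtensionFilterSketch {
    // Splits a spec such as ".jpg .mpg" into a set of lowercase extensions.
    static Set<String> parse(String spec) {
        Set<String> exts = new LinkedHashSet<>();
        for (String part : spec.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                exts.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return exts;
    }

    // An empty filter accepts everything; otherwise the name must end with
    // one of the extensions (case-insensitive, by assumption).
    static boolean matches(String fileName, Set<String> exts) {
        if (exts.isEmpty()) {
            return true;
        }
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (String ext : exts) {
            if (lower.endsWith(ext)) {
                return true;
            }
        }
        return false;
    }
}
```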
You can change the destination either by running MakeScript again with new arguments or by editing the two variables at the top of the
script or bat file.
After running the copy script, you can have MakeScript check the copy results, using either the --check or --hash option.
--check verifies that each copied file exists and reports it as missing if it does not.
--hash (implies --check) also hashes each copied file and compares the result to the original file's hash.
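The difference between the two options can be sketched as follows. This is an illustration only, assuming the original file's MD5 hash is available as a hex string from the database; the class VerifySketch, its Result values, and the method names are invented for the example and do not come from MakeScript.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of the two verification levels, not MakeScript's code.
class VerifySketch {
    enum Result { OK, MISSING, MISMATCH }

    // compareHash == false behaves like an existence check only;
    // compareHash == true additionally re-hashes the copy and compares.
    static Result verify(Path copied, String expectedMd5Hex, boolean compareHash) {
        if (!Files.isRegularFile(copied)) {
            return Result.MISSING;
        }
        if (!compareHash) {
            return Result.OK;
        }
        return md5OfFile(copied).equalsIgnoreCase(expectedMd5Hex)
                ? Result.OK
                : Result.MISMATCH;
    }

    // Streams the file through MD5 and returns the digest as lowercase hex.
    static String md5OfFile(Path file) {
        try (InputStream in = Files.newInputStream(file)) {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest()) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The existence check is cheap because it never reads file contents; the hash comparison reads every copied file in full, so it is slower but also catches truncated or corrupted copies.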
Run either Scanner or MakeScript without any arguments to see their options.