Besides the "Zero-Tool", another interesting option might be "Clear file slack".
What is file slack?
File systems work in blocks. Most files don't have a size equal to an integer multiple of the FS block size. The last block of such a file holds the file's final bytes (which occupy less than a block), and the rest of that block holds leftover content: potentially garbage, or fragments of previously erased sensitive data.
/----------/----------/   FS block boundary
|          |          |
v          v          v
|DDDDDDDDDD|DDDxxxxxxx|
     |          \- x = slack
     \---- D = data
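The diagram can be put into numbers with a small sketch. The function name `slack_bytes` is hypothetical, and the model is simplified: it assumes every non-empty file ends in a regular, fully allocated block, and ignores file system features that store small files or file tails differently.

```python
def slack_bytes(file_size: int, block_size: int = 4096) -> int:
    """Return the number of unused bytes in the last block of a file.

    Simplified model (an assumption, not a statement about any specific
    file system): a file occupies whole blocks, and an empty file
    occupies none.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    remainder = file_size % block_size
    # A file that ends exactly on a block boundary has no slack.
    return 0 if remainder == 0 else block_size - remainder


# The diagram above: 10-byte blocks, 13 bytes of data, 7 bytes of slack.
print(slack_bytes(13, block_size=10))  # 7
```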
While a new HDD is empty (all zeros) and a fresh OS install will leave (most) OS files with zeroed slack, data files on disks with a history of file creation, modification and deletion reuse previously written blocks, so their slack now contains non-zero bytes.
This might increase compression efficiency for RAW images of file systems holding a large number of files.
This might be useful as an erase tool. (I remember one of the Norton Utilities had such an option.)
This might not be relevant for fsarchiver-type backups or file backups.
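For the raw-image case, the effect on compression can be illustrated with a toy experiment. This is only a sketch under loud assumptions: the "image" is a synthetic run of 4096-byte blocks, random bytes stand in both for file data and for leftover slack content (real leftovers are often more compressible, so the effect is exaggerated), and `zlib` stands in for whatever compressor the imaging tool actually uses. The names `build_image` and `BLOCK` are hypothetical.

```python
import random
import zlib

BLOCK = 4096  # assumed file system block size


def build_image(n_files: int, zero_slack: bool, seed: int = 0) -> bytes:
    """Build a synthetic image: one partially used last block per file."""
    rng = random.Random(seed)
    blocks = []
    for _ in range(n_files):
        used = rng.randrange(1, BLOCK)  # bytes of real data in the block
        data = bytes(rng.getrandbits(8) for _ in range(used))
        # Always draw the garbage so both variants hold identical file data.
        garbage = bytes(rng.getrandbits(8) for _ in range(BLOCK - used))
        slack = bytes(BLOCK - used) if zero_slack else garbage
        blocks.append(data + slack)
    return b"".join(blocks)


dirty = build_image(100, zero_slack=False)
clean = build_image(100, zero_slack=True)

print("raw size:              ", len(dirty))
print("compressed, dirty slack:", len(zlib.compress(dirty)))
print("compressed, zero slack: ", len(zlib.compress(clean)))
```

On a real disk the gain depends on what the slack actually contains, so the numbers printed here say nothing about a particular system.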
Accomplishing this may also be quite complex, considering the fine details of a file system: modification times, access times, permissions (e.g. immutable files on ext2/3/4), etc. would all have to be preserved. After all, the purpose of this tool is to provide minimal to zero intrusion into the function of the system being worked on.
Would a scripting method be better than a dedicated C program?
Would this provide some size gain for the compression? (I believe so, but it depends heavily on the previous usage pattern and history of the source disk.)
Would it be time-consuming to run this feature? Definitely! (So, weighing time against size, it won't be a gain.)
Would it be worth the effort to develop it? Knowing the user base and feedback on G4L, the author definitely has more insight.
Thank you.
I've done some calculations.
My complete installation of Slackware has under /usr about 280 000 files.
Assuming a block size of 4k (4096 bytes) and the worst case, where only 1 byte of the last block is used, the slack sums up to about 1120000k, or ~1GB.
The minimum case would be files on a new disk: the slack contains zeros, which compress well.
I presume clean installations on new disks yield minimal to negligible benefit.
The average case would be most files having slack that contains data; halving the above figure gives about 0.5G of potential optimization.
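The arithmetic behind these figures, as a short sketch. The function name `total_slack` is hypothetical; the file count and block size are the ones quoted above, and the average case simply assumes the last block is half full.

```python
def total_slack(n_files: int, block_size: int, used_in_last_block: int) -> int:
    """Total slack in bytes if every file leaves the same amount unused."""
    return n_files * (block_size - used_in_last_block)


N_FILES = 280_000   # files under /usr, from the post
BLOCK_SIZE = 4096   # assumed block size

worst = total_slack(N_FILES, BLOCK_SIZE, used_in_last_block=1)
average = total_slack(N_FILES, BLOCK_SIZE, used_in_last_block=BLOCK_SIZE // 2)

print(worst, round(worst / 2**30, 2))      # 1146600000 1.07
print(average, round(average / 2**30, 2))  # 573440000 0.53
```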
I presume disks/partitions holding a large or huge number of files, whose data has changed a lot, yield benefits proportional to the number of files.
Few large files - no benefit.
Lots and lots of small files - moderate benefit
Lots and lots of small files on used disk - significant benefit proportional to number of files.
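To replace the estimate with a measurement on a given tree, something like the following sketch could be used. It only measures how much slack space exists, not what that space contains (reading slack content needs raw access to the block device). The function name `tree_slack` is hypothetical, a block size of 4096 is assumed rather than queried from the file system, and hard-linked files are counted once per name.

```python
import os
import stat


def tree_slack(root: str, block_size: int = 4096) -> tuple[int, int]:
    """Return (number of regular files, total slack bytes) under root."""
    n_files = 0
    slack = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, devices, sockets, ...
            n_files += 1
            remainder = st.st_size % block_size
            if remainder:
                slack += block_size - remainder
    return n_files, slack


if __name__ == "__main__":
    files, slack_total = tree_slack("/usr")
    print(f"{files} files, {slack_total / 2**20:.1f} MiB of slack")
```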