Re: [sleuthkit-users] hashing a file system
From: RB <ao...@gm...> - 2014-09-05 00:51:31
On Thu, Sep 4, 2014 at 5:04 PM, Stuart Maclean <st...@ap...> wrote:
> Is this a common activity, the hashing of a complete filesystem that
> is? If yes, some experiments I have done with minimising total disk
> seek time by ordering Runs, reading content from the ordered Runs and
> piecing each file's hash back together would show that this is indeed a
> worthy optimization since it can decrease the time spent deriving the
> full hash table considerably.

Yes, this is a question we've actually discussed on this list in recent memory. Fiwalk/DFXML is great for automation, but you can use "tsk_gettimes -m" to pull listings and checksums at the same time for a quick win. The output is in traditional bodyfile form with the md5 field actually populated (instead of being "0"). I've incorporated it into a script that does other things (pulls other useful files from the disk), but all told it takes 8-40 minutes (averaging around 20) to burn through an average 120-300GB disk, usually CPU- or IOPS-bound. This is, as you indirectly noted, heavily affected by fragmentation.

Although MD5 is not a subdivisible hash (as Simson pointed out), one could conceivably still do a single-pass checksum of a filesystem; the tradeoff would be the memory consumption of tens of thousands of "simultaneous" checksum calculation states. A rough sketch of that idea follows.
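For what it's worth, here is a rough Python sketch of the "ordered runs, many in-flight MD5 states" approach. The run tuples (file id, byte offset, byte length) are my own simplification, not anything TSK emits, and it assumes each file's runs sit on disk in logical order; since MD5 state can't be stitched together out of order, out-of-order fragments would need buffering or a per-file re-read.

import hashlib
from collections import defaultdict

def hash_files_by_run_order(image_path, runs):
    """
    runs: list of (file_id, disk_offset, length) tuples, in bytes,
    describing each file's data runs (resident data and slack ignored).
    Returns {file_id: md5 hex digest}.
    """
    # One live MD5 state per file until its last run has been consumed;
    # this is where the memory cost of "simultaneous" states shows up.
    states = defaultdict(hashlib.md5)
    remaining = defaultdict(int)
    for file_id, _, _ in runs:
        remaining[file_id] += 1

    digests = {}
    with open(image_path, 'rb') as img:
        # Visit runs in disk order so the read pattern stays mostly sequential.
        for file_id, offset, length in sorted(runs, key=lambda r: r[1]):
            state = states[file_id]
            img.seek(offset)
            left = length
            while left > 0:
                chunk = img.read(min(left, 1 << 20))
                if not chunk:
                    break
                state.update(chunk)
                left -= len(chunk)
            remaining[file_id] -= 1
            if remaining[file_id] == 0:
                # All runs seen; finalize and free this file's state.
                digests[file_id] = states.pop(file_id).hexdigest()
    return digests

Whether the reduced seeking beats the bookkeeping overhead will depend on how fragmented the filesystem actually is, which matches what Stuart was measuring.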