Re: [sleuthkit-users] hashing a file system
From: RB <ao...@gm...> - 2014-09-05 00:51:31
On Thu, Sep 4, 2014 at 5:04 PM, Stuart Maclean <st...@ap...> wrote:
> Is this a common activity, the hashing of a complete filesystem that
> is? If yes, some experiments I have done with minimising total disk
> seek time by ordering Runs, reading content from the ordered Runs and
> piecing each file's hash back together would show that this is indeed a
> worthy optimization since it can decrease the time spent deriving the
> full hash table considerably.

Yes, this is a question we've actually discussed on this list in recent memory. Fiwalk/DFXML is great for automation, but you can use "tsk_gettimes -m" to pull listings and checksums at the same time for a quick win. The output is in traditional bodyfile form with the md5 field actually populated (instead of being "0"). I've incorporated it into a script that does other things (pulls other useful files from the disk), but all told it takes 8-40 minutes (averaging around 20) to burn through an average 120-300GB disk, usually CPU- or IOPS-bound. This is, as you indirectly noted, heavily affected by fragmentation.

Although MD5 is not a subdivisible hash (as Simson pointed out), one could conceivably still do a single-pass checksum of a filesystem; the tradeoff would be the memory consumption of tens of thousands of "simultaneous" checksum calculation states. A rough sketch of that idea follows.
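For what it's worth, here is a rough Python sketch of the "ordered runs, many in-flight MD5 states" approach. The run tuples (file id, byte offset, byte length) are my own simplification, not anything TSK emits, and it assumes each file's runs sit on disk in logical order; since MD5 state can't be stitched together out of order, out-of-order fragments would need buffering or a per-file re-read.

import hashlib
from collections import defaultdict

def hash_files_by_run_order(image_path, runs):
    """
    runs: list of (file_id, disk_offset, length) tuples, in bytes,
    describing each file's data runs (resident data and slack ignored).
    Returns {file_id: md5 hex digest}.
    """
    # One live MD5 state per file until its last run has been consumed;
    # this is where the memory cost of "simultaneous" states shows up.
    states = defaultdict(hashlib.md5)
    remaining = defaultdict(int)
    for file_id, _, _ in runs:
        remaining[file_id] += 1

    digests = {}
    with open(image_path, 'rb') as img:
        # Visit runs in disk order so the read pattern stays mostly sequential.
        for file_id, offset, length in sorted(runs, key=lambda r: r[1]):
            state = states[file_id]
            img.seek(offset)
            left = length
            while left > 0:
                chunk = img.read(min(left, 1 << 20))
                if not chunk:
                    break
                state.update(chunk)
                left -= len(chunk)
            remaining[file_id] -= 1
            if remaining[file_id] == 0:
                # All runs seen; finalize and free this file's state.
                digests[file_id] = states.pop(file_id).hexdigest()
    return digests

Whether the reduced seeking beats the bookkeeping overhead will depend on how fragmented the filesystem actually is, which matches what Stuart was measuring.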