Thread: [sleuthkit-users] hashing a file system
From: Stuart M. <st...@ap...> - 2014-09-04 22:27:27
I am tracking recent efforts in STIX and Cybox and all things Mitre. One indicator of compromise is an md5 hash of some file. Presumably you compare the hash with all files on some file system to see if there is a match. Obviously this requires a walk of the host fs, using e.g. fls or fiwalk or the tsk library in general.

Is this a common activity, the hashing of a complete filesystem that is? If yes, some experiments I have done with minimising total disk seek time by ordering Runs, reading content from the ordered Runs and piecing each file's hash back together show that this is indeed a worthy optimization, since it can decrease the time spent deriving the full hash table considerably.

I did see a slide deck by Simson G where he alluded to a similar win when disk reads are ordered so as to minimise seek time, but I wonder if much has been published on the topic, specifically in the digital forensics arena, i.e. when an entire file system's contents is to be read in a single pass for the purposes of producing an 'md5 -> file path' map.

Opinions and comments welcomed.

Stuart
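Concretely, the baseline 'md5 -> file path' map can be sketched in a few lines of Python (a minimal sketch using only hashlib, walking a mounted copy of the file system rather than going through fls/fiwalk/TSK; the chunk size is arbitrary):

    import hashlib, os

    def md5_of_file(path, chunk=1 << 20):
        """Hash a file incrementally so the whole file never sits in RAM."""
        h = hashlib.md5()
        with open(path, 'rb') as f:
            for block in iter(lambda: f.read(chunk), b''):
                h.update(block)
        return h.hexdigest()

    def build_md5_map(root):
        """Return {md5 hex digest: [paths]} for every regular file under root."""
        table = {}
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                path = os.path.join(dirpath, name)
                if os.path.isfile(path):          # skip sockets, broken links, etc.
                    table.setdefault(md5_of_file(path), []).append(path)
        return table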
From: Stuart M. <st...@ap...> - 2014-09-04 23:16:17
On 09/04/2014 03:46 PM, Simson Garfinkel wrote:
> The MD5 algorithm won't let you combine a partial hash from the middle of the file with one from the beginning. You need to start at the beginning and hash through the end. ... So I believe that the only approach is sorting the files by the sector number of the first run, and just leaving it at that.

Hi Simson, currently I have just got as far as noting the 'seek distances' between consecutive runs, across ALL files. I have yet to actually read the file content. But I don't think it's that hard. As you point out, md5 summing must be done with the file content in the correct order. I see an analogy between 'runs ordered by block address but not necessarily file offset' and the problem the IP layer has in tcp/ip as it tries to reassemble the fragments of a datagram that may arrive in any order. We may have to have some 'pending data' structure for runs whose content has been read but which cannot yet be offered to the md5 hasher because an as-yet-unread run is needed first.

I'll let you know if/when I nail this. Perhaps Autopsy could benefit? Is fiwalk doing it the 'regular way' too, i.e. reading all the content of each file as the walk proceeds?

Stuart
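The 'pending data' structure Stuart describes might look roughly like this (a sketch, assuming whatever reads the disk delivers (file_id, file_offset, data) tuples in physical block order; the class and field names are invented for illustration):

    import hashlib

    class FileHashState:
        """Reassemble one file's fragments in logical order and hash incrementally."""
        def __init__(self):
            self.md5 = hashlib.md5()
            self.next_offset = 0      # next file offset the hasher expects
            self.pending = {}         # file offset -> bytes read out of order

        def add_fragment(self, offset, data):
            if offset == self.next_offset:
                self.md5.update(data)
                self.next_offset += len(data)
                # Drain any buffered fragments that are now contiguous.
                while self.next_offset in self.pending:
                    chunk = self.pending.pop(self.next_offset)
                    self.md5.update(chunk)
                    self.next_offset += len(chunk)
            else:
                self.pending[offset] = data   # hold until the earlier runs arrive

    def hash_in_block_order(fragments):
        """fragments: iterable of (file_id, file_offset, data) in disk-block order."""
        states = {}
        for file_id, offset, data in fragments:
            states.setdefault(file_id, FileHashState()).add_fragment(offset, data)
        return {fid: s.md5.hexdigest() for fid, s in states.items()}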
From: Simson G. <si...@ac...> - 2014-09-05 00:22:25
Yes, fiwalk hashes in SleuthKit order. If you want to hash in block order you need to generate the DFXML for the entire drive and sort by the index of the first run.

On Sep 4, 2014, at 7:53 PM, Stuart Maclean <st...@ap...> wrote:
> Is fiwalk doing it the 'regular way' too, i.e. reading all the content of each file as the walk proceeds?
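A rough sketch of that DFXML route in Python (the element and attribute names, 'fileobject', 'filename' and 'byte_run' with an 'img_offset' attribute, are assumptions based on what fiwalk typically emits; namespace handling is reduced to matching local names):

    import xml.etree.ElementTree as ET

    def local(tag):
        """Strip any XML namespace so we can match on local element names."""
        return tag.rsplit('}', 1)[-1]

    def files_by_first_run(dfxml_path):
        """Return (img_offset_of_first_run, filename) pairs sorted by physical position."""
        entries = []
        for _event, elem in ET.iterparse(dfxml_path):
            if local(elem.tag) != 'fileobject':
                continue
            name, first = None, None
            for child in elem.iter():
                if local(child.tag) == 'filename' and name is None:
                    name = child.text
                elif local(child.tag) == 'byte_run' and first is None:
                    off = child.get('img_offset')
                    if off is not None:
                        first = int(off)
            if name is not None and first is not None:
                entries.append((first, name))
            elem.clear()              # keep memory bounded on big DFXML files
        return sorted(entries)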
From: Simson G. <si...@ac...> - 2014-09-04 23:31:44
Hi Stuart.

You are correct — I put this in numerous presentations but never published it.

The MD5 algorithm won't let you combine a partial hash from the middle of the file with one from the beginning. You need to start at the beginning and hash through the end. (That's one of the many problems with MD5 for forensics, BTW.) So I believe that the only approach is sorting the files by the sector number of the first run, and just leaving it at that.

I saw speedup with both HDs and SSDs, strangely enough, but not as much with SSDs. There may be a prefetch thing going on here.

I think that the Autopsy framework should hash this way, but currently it doesn't. On the other hand, it may be more useful to hash based on the "importance" of the files.

Simson

On Sep 4, 2014, at 7:04 PM, Stuart Maclean <st...@ap...> wrote:
> If yes, some experiments I have done with minimising total disk seek time by ordering Runs, reading content from the ordered Runs and piecing each file's hash back together show that this is indeed a worthy optimization, since it can decrease the time spent deriving the full hash table considerably.
From: Luís F. N. <lfc...@gm...> - 2014-09-05 23:02:09
Hi Simson,

I have had thoughts about implementing this "sort by sector number of first run" approach in a forensic tool based on TskJavaBindings, but I did not see how to get a file's first sector number through the API. Do you know if it is possible with the tsk java bindings?

Regards,
Luis Nassif

2014-09-04 20:13 GMT-03:00 Simson Garfinkel <si...@ac...>:
> So I believe that the only approach is sorting the files by the sector number of the first run, and just leaving it at that.
From: RB <ao...@gm...> - 2014-09-05 00:51:31
On Thu, Sep 4, 2014 at 5:04 PM, Stuart Maclean <st...@ap...> wrote:
> Is this a common activity, the hashing of a complete filesystem that is?

Yes, this is a question we've actually discussed on this list in recent memory. Fiwalk/DFXML is great for automation, but you can use "tsk_gettimes -m" to both pull listings and checksums at the same time for a quick win. The output is in traditional bodyfile form with the md5 field actually populated (instead of being "0"). I've incorporated it into a script that does other things (pulls other useful files from the disk), but all told it takes 8-40 minutes (averaging around 20) to burn through an average 120-300GB disk, usually CPU- or IOPS-bound. This is, as you indirectly noted, heavily affected by fragmentation.

Although md5 is not a subdivisible hash (as Simson pointed out), one could conceivably still do a single-pass checksum of a filesystem; the tradeoff would be the memory consumption of tens of thousands of "simultaneous" checksum calculation states.
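For reference, turning that bodyfile output into an 'md5 -> file path' map is just a matter of splitting fields (a sketch assuming the TSK 3.x bodyfile layout, md5 first and name second; adjust if your version differs):

    def md5_map_from_bodyfile(bodyfile_path):
        """Build {md5: [paths]} from a bodyfile (md5|name|inode|mode|uid|gid|size|...)."""
        table = {}
        with open(bodyfile_path, 'r', errors='replace') as f:
            for line in f:
                fields = line.rstrip('\n').split('|')
                if len(fields) < 2:
                    continue
                md5, name = fields[0], fields[1]
                if md5 and md5 != '0':        # '0' means the hash was not computed
                    table.setdefault(md5, []).append(name)
        return table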
From: Simson G. <si...@ac...> - 2014-09-05 01:01:01
On Sep 4, 2014, at 8:51 PM, RB <ao...@gm...> wrote:
> Although md5 is not a subdivisible hash (as Simson pointed out), one could conceivably still do a single-pass checksum of a filesystem; the tradeoff would be the memory consumption of tens of thousands of "simultaneous" checksum calculation states.

This doesn't work unless you are prepared to buffer the later fragments of a file when they appear on disk before earlier fragments. So in the worst case, you need to hold the entire disk in RAM.
From: RB <ao...@gm...> - 2014-09-05 01:11:34
On Thu, Sep 4, 2014 at 7:01 PM, Simson Garfinkel <si...@ac...> wrote:
> This doesn't work unless you are prepared to buffer the later fragments of a file when they appear on disk before earlier fragments. So in the worst case, you need to hold the entire disk in RAM.

Perhaps I'm being dense, but "dd if=file | md5sum -" in no way holds the entire file in RAM, and the process can be slept/interrupted/etc; all this means that md5 can be calculated over a stream.

Looking at the API for the Perl & Python MD5 libraries (expected to be the simplest), they have standard functionality for adding data to a hash object, and I don't expect it holds that in memory either. This would mean you should be able to make a linear scan through the disk and, as you read blocks associated with a file, append them to the md5 object for that file, and move on. You'd have a lot of md5 objects in memory, but it shouldn't be of a size equivalent to the entire [used] disk.
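What RB describes amounts to keeping one hash object per file and appending blocks as the linear scan reaches them. A minimal sketch (as Simson points out in the next message, the digests are only correct for files whose blocks happen to lie in logical order):

    import hashlib

    def naive_single_pass(blocks):
        """blocks: iterable of (file_id, data) pairs in physical disk order."""
        hashers = {}
        for file_id, data in blocks:
            # Each hashlib.md5 object holds only its small internal state, not the
            # data hashed so far, so tens of thousands of them are cheap to keep.
            hashers.setdefault(file_id, hashlib.md5()).update(data)
        return {fid: h.hexdigest() for fid, h in hashers.items()}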
From: Simson G. <si...@ac...> - 2014-09-05 01:15:59
On Sep 4, 2014, at 9:11 PM, RB <ao...@gm...> wrote:
> Perhaps I'm being dense, but "dd if=file | md5sum -" in no way holds the entire file in RAM, and the process can be slept/interrupted/etc; all this means that md5 can be calculated over a stream.

You are confusing the physical layout of the disk with the logical layout of the files. You have proposed reading the disk in physical block order. If you are reading the disk in block order, what happens if you have a 30GB file where the first block of the file is at the end of the disk and the rest of the file is at the beginning? You have to buffer the portions of the file that come first on the disk but logically later in the file. Then, when you reach the beginning of the file (at the end of the disk), you can start hashing.

The problem is that files are fragmented, and frequently the second fragment of a file comes earlier on the disk than the first fragment.
From: RB <ao...@gm...> - 2014-09-05 01:21:19
On Thu, Sep 4, 2014 at 7:15 PM, Simson Garfinkel <si...@ac...> wrote:
> You are confusing the physical layout of the disk with the logical layout of the files. <snip> The problem is that files are fragmented, and frequently the second fragment of a file comes earlier on the disk than the first fragment.

Thanks for your patience, I was indeed failing to take this into account.
From: Stuart M. <st...@ap...> - 2014-09-05 17:09:35
Hi all, I'm glad to have provoked some conversation on the merits (or otherwise!) of md5 sums as useful representations of the state of a file system.

Can anyone enlighten me as to the meaning of the 'flags' member in a TSK_FS_ATTR_RUN? Specifically, what does this comment mean?

TSK_FS_ATTR_RUN_FLAG_FILLER = 0x01, ///< Entry is a filler for a run that has not been seen yet in the processing (or has been lost)

In a fs I am walking and inspecting the runs for, I am seeing run structs with addr 0 and flags 1. I was under the impression that any run address of 0 represented a 'missing run', i.e. that this part of the file content is N zeros, where N = run.length * fs.blocksize. I presume that would be the case were the run flags value 2:

TSK_FS_ATTR_RUN_FLAG_SPARSE = 0x02 ///< Entry is a sparse run where all data in the run is zeros

If I use istat, I can see inodes which have certain 'Direct Blocks' of value 0, and when I see M consecutive 0 blocks that matches up to a 'missing run' when inspecting the runs using the tsk lib (actually my tsk4j Java binding, which is now finally showing its worth, since I can do all data structure manipulation in Java, nicer than in C, for me at least).

My worry is that, being 'filler' and not 'sparse', the partial file content represented by the run(s) with addr 0 is not necessarily a sequence of zeros.

Can anyone shed light on this? Brian?

Thanks

Stuart
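For what it's worth, the distinction Stuart is drawing could be handled roughly like this (a sketch only: the run object with .addr, .len and .flags is assumed to mirror TSK's TSK_FS_ATTR_RUN struct, and read_blocks stands for whatever actually pulls blocks from the image; the flag values are the ones quoted above):

    TSK_FS_ATTR_RUN_FLAG_FILLER = 0x01   # placeholder for a run not yet seen / lost
    TSK_FS_ATTR_RUN_FLAG_SPARSE = 0x02   # run is all zeros, nothing stored on disk

    def content_for_run(run, read_blocks, block_size):
        """Return the bytes a run contributes to the file, or None if unknowable."""
        if run.flags & TSK_FS_ATTR_RUN_FLAG_SPARSE:
            # Sparse run: defined to be zeros, safe to feed to the hasher.
            return b'\x00' * (run.len * block_size)
        if run.flags & TSK_FS_ATTR_RUN_FLAG_FILLER:
            # Filler run: the real run is missing, so the content is NOT known zeros.
            return None
        # Normal run: read run.len blocks starting at block address run.addr.
        return read_blocks(run.addr, run.len)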
From: Stuart M. <st...@ap...> - 2014-09-05 19:04:08
On 09/04/2014 06:01 PM, Simson Garfinkel wrote:
> This doesn't work unless you are prepared to buffer the later fragments of a file when they appear on disk before earlier fragments. So in the worst case, you need to hold the entire disk in RAM.

I have been experimenting with exactly this approach. And yes, you are right, in the worst case there is simply too much to buffer. In a 250GB ext4 image, one file was 8GB. Its block allocation was such that I struggled to buffer 4GB of it, at which point my Java VM collapsed out of memory.

I guess we could use a hybrid approach: all files under some size limit use the 'in block order' logic for hashing, with monster files defaulting to the regular 'file-offset order' logic. Somehow I think this will largely defeat the whole point of the exercise and negate most of the time gains.

Another approach would be to externally store the 'pending data', which might be feasible if you had some Live CD with the tools plus some raw (i.e. no file system) usb or other data drive to use for scratch storage.

Stuart
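The hybrid dispatch itself is simple to express; whether it preserves the gains is the open question. A sketch (hash_in_block_order and hash_in_logical_order stand for the two strategies discussed in this thread and are placeholders, as is the 1 GiB cutoff):

    def plan_hashing(files, limit=1 << 30):
        """Split files into a buffered block-order pass and a fallback logical pass.

        files: iterable of objects with a .size attribute in bytes.
        limit: largest file we are willing to buffer out-of-order runs for.
        """
        small, monsters = [], []
        for f in files:
            (small if f.size <= limit else monsters).append(f)
        return small, monsters

    # small    -> hash_in_block_order(small)       # single pass, buffer out-of-order runs
    # monsters -> hash_in_logical_order(monsters)  # seek per file, bounded memory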
From: Simson G. <si...@ac...> - 2014-09-05 19:14:01
As I indicated, I spent a significant amount of time looking at this, and decided that simply sorting by the block number of the first file fragment and then imaging each file in order provided an excellent speed-up. There was no need to do any in-memory buffering.

On Sep 5, 2014, at 3:41 PM, Stuart Maclean <st...@ap...> wrote:
> I guess we could use a hybrid approach: all files under some size limit use the 'in block order' logic for hashing, with monster files defaulting to the regular 'file-offset order' logic. Somehow I think this will largely defeat the whole point of the exercise and negate most of the time gains.
From: Luís F. N. <lfc...@gm...> - 2014-09-05 23:37:29
Hi Stuart,

Yes, I think so. I can read file contents from some starting offset within the file, but did not know how to query a file's data runs. The API lets you convert a virtual file (e.g. unallocated) offset to an image offset, but not a regular file offset.

I think the idea of sorting by file starting offset before doing any kind of processing of the files will result in great speedups when ingesting images stored on spinning magnetic drives, as said by Simson.

Luis

2014-09-05 20:53 GMT-03:00 Stuart Maclean <st...@ap...>:
> Hi Luis, I have slowly been developing my own set of Java bindings to tsk. The ones that exist seem to only be for extraction of data from some db?? I wanted to use Java in the actual data acquisition phase. I have yet to upload it to github but will do so shortly.
>
> Stuart
From: Brian C. <ca...@sl...> - 2014-09-10 02:06:05
Sorry to join the party late.

I'm curious what types of speed-ups you see by doing this sorting.

As for whether Autopsy could benefit, I think it depends on what type of investigation you are doing and whether you are more interested in the fastest overall time or in interesting results sooner. Autopsy currently assumes that you want analysis results from user content ASAP more than you care about overall run time. I say this because of two "features":

- Files inside of "\Documents and Settings" or "\Users" are analyzed before other files.
- The keyword search module will commit its index every 5 minutes and do a search of pre-defined keywords. This makes the analysis process take longer, but means that you have keyword results in minutes versus hours or days.

So, yes, the overall analysis time of Autopsy could benefit from doing this type of sorting, but it could mean that for the first 60 minutes Autopsy is just analyzing Windows OS files and the user is patiently waiting for interesting results.

brian

On Sep 4, 2014, at 7:53 PM, Stuart Maclean <st...@ap...> wrote:
> I'll let you know if/when I nail this. Perhaps Autopsy could benefit? Is fiwalk doing it the 'regular way' too, i.e. reading all the content of each file as the walk proceeds?
From: Brian C. <ca...@sl...> - 2014-09-10 02:13:10
The FILLER entries are there for basic record keeping, because NTFS makes no guarantees that the runs will be stored in consecutive order. TSK adds the FILLER entries when it gets runs out of order and pops them out as it finds them.

Is the data you are describing below from the same Ext4 image you mentioned before?

brian

On Sep 5, 2014, at 1:46 PM, Stuart Maclean <st...@ap...> wrote:
> In a fs I am walking and inspecting the runs for, I am seeing run structs with addr 0 and flags 1. [...] My worry is that, being 'filler' and not 'sparse', the partial file content represented by the run(s) with addr 0 is not necessarily a sequence of zeros.
From: Stuart M. <st...@ap...> - 2014-09-11 18:34:30
On 09/09/2014 07:13 PM, Brian Carrier wrote:
> The FILLER entries are there for basic record keeping, because NTFS makes no guarantees that the runs will be stored in consecutive order. TSK adds the FILLER entries when it gets runs out of order and pops them out as it finds them.
>
> Is the data you are describing below from the same Ext4 image you mentioned before?

Hi Brian, yes it is. Does that shed any light on things? I am still confused as to what fillers actually are and whether a file system is suspect should tsk tools announce that they have found some ;)

Stuart
From: Brian C. <ca...@sl...> - 2014-09-10 02:15:42
We have just started an effort to make a STIX / Cybox module in Autopsy as part of a DHS S&T effort. In Autopsy, the hash value is stored in the DB after the hash lookup module runs, so you can do the Cybox analysis either on each file as it is analyzed or after all of the files have been analyzed.

On Sep 4, 2014, at 7:04 PM, Stuart Maclean <st...@ap...> wrote:
> I am tracking recent efforts in STIX and Cybox and all things Mitre. One indicator of compromise is an md5 hash of some file. Presumably you compare the hash with all files on some file system to see if there is a match.
From: Simson G. <si...@ac...> - 2014-09-10 11:16:52
Brian,

You could sector-sort the files in the "\Users" and "\Documents and Settings" folders for improved performance.

On Sep 9, 2014, at 10:05 PM, Brian Carrier <ca...@sl...> wrote:
> So, yes, the overall analysis time of Autopsy could benefit from doing this type of sorting, but it could mean that for the first 60 minutes Autopsy is just analyzing Windows OS files and the user is patiently waiting for interesting results.
From: Jon S. <jo...@li...> - 2014-09-10 13:12:54
Sorry to veer off-topic with this thread (stupid gmail won't let me change the subject), but I'm now more confused/concerned by this explanation regarding FILLER entries.

1. Under what circumstances can you get a FILLER ATTR_RUN?

2. What can you do about it? How does one wait on TSK to go find the missing run?

Thanks,

Jon

On Tue, Sep 9, 2014 at 10:13 PM, Brian Carrier <ca...@sl...> wrote:
> The FILLER entries are there for basic record keeping, because NTFS makes no guarantees that the runs will be stored in consecutive order. TSK adds the FILLER entries when it gets runs out of order and pops them out as it finds them.
From: Brian C. <ca...@sl...> - 2014-09-10 16:05:32
If all goes well, you'll never see them. The caller to the API never sees the attribute until it has been fully populated, and for good files all of the filler entries will have been pushed out. The only times that you will see them are if:

- The file system is corrupt and you don't have all of the run info. This can occur in NTFS if the run list is stored across multiple MFT entries and some of them have been re-used.
- There is a bug in TSK.

You won't need to wait. There is nothing to wait for.

On Sep 10, 2014, at 8:46 AM, Jon Stewart <jo...@li...> wrote:
> 1. Under what circumstances can you get a FILLER ATTR_RUN?
>
> 2. What can you do about it? How does one wait on TSK to go find the missing run?
From: Jon S. <jo...@li...> - 2014-09-10 16:15:28
Cool, thanks for clarifying.

Jon

On Wed, Sep 10, 2014 at 12:05 PM, Brian Carrier <ca...@sl...> wrote:
> If all goes well, you'll never see them. The caller to the API never sees the attribute until it has been fully populated, and for good files all of the filler entries will have been pushed out.
From: Brian C. <ca...@sl...> - 2014-09-10 16:06:19
What types of performance improvements are we talking about?

On Sep 10, 2014, at 7:16 AM, Simson Garfinkel <si...@ac...> wrote:
> You could sector-sort the files in the "\Users" and "\Documents and Settings" folders for improved performance.
From: Simson G. <si...@ac...> - 2014-09-10 16:08:39
Perhaps a 2x - 5x speedup.

On Sep 10, 2014, at 12:06 PM, Brian Carrier <ca...@sl...> wrote:
> What types of performance improvements are we talking about?