Re: [sleuthkit-users] hashing a file system
From: Stuart M. <st...@ap...> - 2014-09-05 19:04:08
On 09/04/2014 06:01 PM, Simson Garfinkel wrote:
> On Sep 4, 2014, at 8:51 PM, RB <ao...@gm...> wrote:
>> Although md5 is not a subdivisible hash (as Simson pointed out), one
>> could conceivably still do a single-pass checksum of a filesystem; the
>> tradeoff would be the memory consumption of tens of thousands of
>> "simultaneous" checksum calculation states.
>
> This doesn't work unless you are prepared to buffer the later fragments
> of a file when they appear on disk before earlier fragments. So in the
> worst case, you need to hold the entire disk in RAM.

I have been experimenting with exactly this approach (rough sketches below), and yes, you are right: in the worst case there is simply too much to buffer. In a 250GB ext4 image, one file was 8GB, and its block allocation was such that I struggled to buffer even 4GB of it, at which point my Java VM collapsed out of memory.

I guess we could use a hybrid approach: every file under some size limit uses the 'in block order' logic for hashing, with monster files defaulting to the 'regular, file-offset' logic. Somehow I think this would largely defeat the whole point of the exercise and negate most of the time gains.

Another approach would be to externally store the 'pending data', which might be feasible if you had some live CD with the tools plus a raw (i.e. no file system) USB or other data drive to use for scratch storage.

Stuart
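
P.S. For what it's worth, the core of what I've been experimenting with
looks roughly like the sketch below: one MessageDigest per in-flight file,
fed in disk-block order, with out-of-order fragments parked in a per-file
TreeMap (that map is exactly what blew the heap on the 8GB file). The class
and method names here are mine, not anything in TSK; the caller is assumed
to walk the image's blocks in disk order and to know each block's owning
inode and file offset.

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Single-pass MD5 over many files while reading blocks in disk order. */
public class BlockOrderHasher {

    private static class FileState {
        final MessageDigest md;      // running MD5 state for this file
        long nextOffset = 0;         // next byte offset the digest expects
        final TreeMap<Long, byte[]> pending = new TreeMap<>(); // out-of-order fragments

        FileState() throws NoSuchAlgorithmException {
            md = MessageDigest.getInstance("MD5");
        }
    }

    private final Map<Long, FileState> files = new HashMap<>(); // keyed by inode

    /** Called once per allocated block, in disk (block-number) order. */
    public void onBlock(long inode, long fileOffset, byte[] data) {
        FileState fs = files.computeIfAbsent(inode, k -> {
            try {
                return new FileState();
            } catch (NoSuchAlgorithmException e) {
                throw new RuntimeException(e); // MD5 is always available
            }
        });
        if (fileOffset == fs.nextOffset) {
            fs.md.update(data);
            fs.nextOffset += data.length;
            // Drain any buffered fragments that are now contiguous.
            Map.Entry<Long, byte[]> head;
            while ((head = fs.pending.firstEntry()) != null
                    && head.getKey() == fs.nextOffset) {
                fs.pending.pollFirstEntry();
                fs.md.update(head.getValue());
                fs.nextOffset += head.getValue().length;
            }
        } else {
            fs.pending.put(fileOffset, data); // this is where the RAM goes
        }
    }

    /** Called once every block of a file has been seen. */
    public byte[] finish(long inode) {
        return files.remove(inode).md.digest();
    }
}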
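
And a rough sketch of the scratch-storage variant. ScratchSpill and its
scratch path are made up for illustration; the assumption is a writable
file (or raw partition) on that dedicated scratch drive. The point is that
the MD5 state itself is tiny and stays in RAM; only the pending fragment
bytes get parked on the scratch drive, at their own file offsets, and are
replayed into the still-open digest once the gap closes.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.security.MessageDigest;

/** Parks out-of-order fragments on a scratch drive instead of the heap. */
class ScratchSpill {
    private final RandomAccessFile scratch;

    ScratchSpill(String scratchPath) throws IOException {
        scratch = new RandomAccessFile(scratchPath, "rw");
    }

    /** Park a fragment that arrived ahead of the digest's current offset. */
    void spill(long fileOffset, byte[] data) throws IOException {
        scratch.seek(fileOffset);   // sparse write at the fragment's offset
        scratch.write(data);
    }

    /** Replay a now-contiguous spilled range [from, from+length) into the digest. */
    void replay(MessageDigest md, long from, long length) throws IOException {
        byte[] buf = new byte[1 << 20];   // 1 MiB read buffer
        scratch.seek(from);
        for (long left = length; left > 0; ) {
            int n = scratch.read(buf, 0, (int) Math.min(buf.length, left));
            if (n < 0) break;             // shouldn't happen for spilled data
            md.update(buf, 0, n);
            left -= n;
        }
    }
}

Of course the spill writes still seek all over the scratch drive, so
whether this actually beats just re-reading the monster files at their own
offsets is an open question to me.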