Re: [sleuthkit-users] hashing a file system
From: Stuart M. <st...@ap...> - 2014-09-05 19:04:08
On 09/04/2014 06:01 PM, Simson Garfinkel wrote:
> On Sep 4, 2014, at 8:51 PM, RB <ao...@gm...> wrote:
>> Although md5 is not a subdivisible hash (as Simson pointed out), one
>> could conceivably still do a single-pass checksum of a filesystem; the
>> tradeoff would be the memory consumption of tens of thousands of
>> "simultaneous" checksum calculation states.
>
> This doesn't work unless you are prepared to buffer the later fragments
> of a file when they appear on disk before earlier fragments. So in the
> worst case, you need to hold the entire disk in RAM.

I have been experimenting with exactly this approach (rough sketches below), and yes, you are right: in the worst case there is simply too much to buffer. In a 250GB ext4 image, one file was 8GB, and its block allocation was such that I struggled to buffer even 4GB of it, at which point my Java VM collapsed out of memory.

I guess we could use a hybrid approach: every file under some size limit uses the 'in block order' logic for hashing, with monster files defaulting to the 'regular, file-offset' logic. Somehow I think this would largely defeat the whole point of the exercise and negate most of the time gains.

Another approach would be to externally store the 'pending data', which might be feasible if you had some live CD with the tools plus a raw (i.e. no file system) USB or other data drive to use for scratch storage.

Stuart
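
P.S. For what it's worth, the core of what I've been experimenting with
looks roughly like the sketch below: one MessageDigest per in-flight file,
fed in disk-block order, with out-of-order fragments parked in a per-file
TreeMap (that map is exactly what blew the heap on the 8GB file). The class
and method names here are mine, not anything in TSK; the caller is assumed
to walk the image's blocks in disk order and to know each block's owning
inode and file offset.

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Single-pass MD5 over many files while reading blocks in disk order. */
public class BlockOrderHasher {

    private static class FileState {
        final MessageDigest md;      // running MD5 state for this file
        long nextOffset = 0;         // next byte offset the digest expects
        final TreeMap<Long, byte[]> pending = new TreeMap<>(); // out-of-order fragments

        FileState() throws NoSuchAlgorithmException {
            md = MessageDigest.getInstance("MD5");
        }
    }

    private final Map<Long, FileState> files = new HashMap<>(); // keyed by inode

    /** Called once per allocated block, in disk (block-number) order. */
    public void onBlock(long inode, long fileOffset, byte[] data) {
        FileState fs = files.computeIfAbsent(inode, k -> {
            try {
                return new FileState();
            } catch (NoSuchAlgorithmException e) {
                throw new RuntimeException(e); // MD5 is always available
            }
        });
        if (fileOffset == fs.nextOffset) {
            fs.md.update(data);
            fs.nextOffset += data.length;
            // Drain any buffered fragments that are now contiguous.
            Map.Entry<Long, byte[]> head;
            while ((head = fs.pending.firstEntry()) != null
                    && head.getKey() == fs.nextOffset) {
                fs.pending.pollFirstEntry();
                fs.md.update(head.getValue());
                fs.nextOffset += head.getValue().length;
            }
        } else {
            fs.pending.put(fileOffset, data); // this is where the RAM goes
        }
    }

    /** Called once every block of a file has been seen. */
    public byte[] finish(long inode) {
        return files.remove(inode).md.digest();
    }
}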
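
And a rough sketch of the scratch-storage variant. ScratchSpill and its
scratch path are made up for illustration; the assumption is a writable
file (or raw partition) on that dedicated scratch drive. The point is that
the MD5 state itself is tiny and stays in RAM; only the pending fragment
bytes get parked on the scratch drive, at their own file offsets, and are
replayed into the still-open digest once the gap closes.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.security.MessageDigest;

/** Parks out-of-order fragments on a scratch drive instead of the heap. */
class ScratchSpill {
    private final RandomAccessFile scratch;

    ScratchSpill(String scratchPath) throws IOException {
        scratch = new RandomAccessFile(scratchPath, "rw");
    }

    /** Park a fragment that arrived ahead of the digest's current offset. */
    void spill(long fileOffset, byte[] data) throws IOException {
        scratch.seek(fileOffset);   // sparse write at the fragment's offset
        scratch.write(data);
    }

    /** Replay a now-contiguous spilled range [from, from+length) into the digest. */
    void replay(MessageDigest md, long from, long length) throws IOException {
        byte[] buf = new byte[1 << 20];   // 1 MiB read buffer
        scratch.seek(from);
        for (long left = length; left > 0; ) {
            int n = scratch.read(buf, 0, (int) Math.min(buf.length, left));
            if (n < 0) break;             // shouldn't happen for spilled data
            md.update(buf, 0, n);
            left -= n;
        }
    }
}

Of course the spill writes still seek all over the scratch drive, so
whether this actually beats just re-reading the monster files at their own
offsets is an open question to me.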