Re: [sleuthkit-users] hashing a file system


On Thu, Sep 4, 2014 at 7:01 PM, Simson Garfinkel <si...@ac...> wrote:
> This doesn't work unless you are prepared to buffer the later fragments of a file when they appear on disk before earlier fragments. So in the worst case, you need to hold the entire disk in RAM.

Perhaps I'm being dense, but "dd if=file | md5sum -" in no way holds
the entire file in RAM, and the process can be suspended, interrupted,
etc.; all this means that MD5 can be calculated over a stream.
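
For illustration, here is a minimal Python sketch of the same idea using
the standard hashlib module. The function name md5_stream and the chunk
size are my own choices for the example, not anything from dd or md5sum;
the point is only that the hash object is fed one chunk at a time and
the whole input is never held by the hashing loop.

```python
import hashlib
import io


def md5_stream(stream, chunk_size=64 * 1024):
    """Hash a file-like object chunk by chunk.

    Only one chunk is read and held by this loop at a time; the hash
    object keeps a small fixed-size internal state, not the data.
    """
    h = hashlib.md5()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:          # empty bytes means end of stream
            break
        h.update(chunk)
    return h.hexdigest()


# Example: hash ~1 MB of data in 4 KiB pieces. An in-memory stream
# stands in for a real file or pipe here.
data = b"x" * 1_000_000
digest = md5_stream(io.BytesIO(data), chunk_size=4096)
```

The chunked digest matches what you get from hashing the same bytes in
one call, which is what makes the pipe approach work.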

Looking at the APIs of the Perl and Python MD5 libraries (which I'd
expect to be the simplest), they have standard functionality for adding
data to a hash object, and I don't expect the object keeps that data in
memory either.  This would mean you should be able to make a linear
scan through the disk and, as you read the blocks associated with a
file, append them to the MD5 object for that file and move on.  You'd
have a lot of MD5 objects in memory, but their total size shouldn't be
anywhere near that of the entire [used] disk.
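
A rough Python sketch of that linear scan, under some loud assumptions:
block_map is a hypothetical lookup (block number -> owning file and
position within that file) that in practice would have to come from the
file system metadata, and every block is treated as full, so slack in a
file's last block is ignored. Since update() is order-sensitive, the
sketch holds back any block that shows up before its predecessors, which
is the buffering Simson describes; for files laid out in order, nothing
is held back.

```python
import hashlib


def hash_files_in_one_pass(blocks, block_map):
    """Hash every file during a single linear pass over the disk.

    blocks:    iterable of (block_number, data) in on-disk order.
    block_map: hypothetical dict mapping block_number to
               (file_id, index_within_file).
    Returns a dict of file_id -> hex MD5 digest.
    """
    hashers = {}      # file_id -> running MD5 object
    next_index = {}   # file_id -> next block index the hash expects
    pending = {}      # file_id -> {index: data} for early arrivals

    for block_no, data in blocks:
        owner = block_map.get(block_no)
        if owner is None:
            continue  # unallocated block, skip it
        file_id, index = owner
        h = hashers.setdefault(file_id, hashlib.md5())
        next_index.setdefault(file_id, 0)
        waiting = pending.setdefault(file_id, {})
        waiting[index] = data
        # Feed the hash every block that is now in sequence; anything
        # still out of order stays buffered until its turn comes.
        while next_index[file_id] in waiting:
            h.update(waiting.pop(next_index[file_id]))
            next_index[file_id] += 1

    return {file_id: h.hexdigest() for file_id, h in hashers.items()}


# Toy disk: file "a" is laid out in order, file "b" is fragmented so
# that its second block appears on disk before its first.
disk = [(0, b"A1"), (1, b"B2"), (2, b"B1"), (3, b"A2")]
block_map = {0: ("a", 0), 1: ("b", 1), 2: ("b", 0), 3: ("a", 1)}
digests = hash_files_in_one_pass(disk, block_map)
```

How much ends up in the pending buffers depends entirely on how badly
fragmented the files are.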