Re: [sleuthkit-users] hashing a file system

Hi Simson,

I have thought about implementing this "sort by sector number of first
run" approach in a forensic tool based on TskJavaBindings, but I did not
see how to get a file's first sector number through the API. Do you know if
it is possible with the TSK Java bindings?
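
Assuming the first-run sector were available for each file, the ordering
step itself would be simple. A minimal sketch in Python, not tied to TSK:
`FileEntry`, `first_sector` and `hash_in_disk_order` are hypothetical names,
and ordinary file paths stand in for reads through a disk image.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical record: a file path plus the sector where its first run starts."""
    path: str
    first_sector: int


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash one file from beginning to end in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_in_disk_order(entries: Iterable[FileEntry]) -> Dict[str, str]:
    """Visit files sorted by first-run sector and return a path -> md5 map."""
    ordered = sorted(entries, key=lambda entry: entry.first_sector)
    return {entry.path: md5_of_file(entry.path) for entry in ordered}
```

Each file is still hashed start to end; only the order in which files are
visited changes, so fragmented files will still cause seeks.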

Regards,
Luis Nassif

2014-09-04 20:13 GMT-03:00 Simson Garfinkel <si...@ac...>:

> Hi Stuart.
>
> You are correct — I put this in numerous presentations but never published
> it.
>
> The MD5 algorithm won't let you combine a partial hash from the middle of
> the file with one from the beginning. You need to start at the beginning
> and hash through the end. (That's one of the many problems with MD5 for
> forensics, BTW.) So I believe that the only approach is sorting the files
> by the sector number of the first run, and just leaving it at that.
>
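
To illustrate the point about MD5 being sequential: the digest depends on
the order in which bytes are fed, so runs read out of file order cannot
simply be hashed as they arrive. A small sketch using Python's `hashlib`;
the run contents are made up for the example.

```python
import hashlib

first_run = b"bytes from the start of the file"
second_run = b"bytes from a later run"

whole = hashlib.md5(first_run + second_run).hexdigest()

# Feeding the runs in file order reproduces the whole-file digest.
in_order = hashlib.md5()
in_order.update(first_run)
in_order.update(second_run)

# Feeding them in the order they happen to sit on disk does not.
out_of_order = hashlib.md5()
out_of_order.update(second_run)
out_of_order.update(first_run)

print(in_order.hexdigest() == whole)      # True
print(out_of_order.hexdigest() == whole)  # False
```
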
> I saw speedup with both HDs and SSDs, strangely enough, but not as much
> with SSDs. There may be a prefetch thing going on here.
>
> I think that the Autopsy framework should hash this way, but currently it
> doesn't. On the other hand, it may be more useful to hash based on the
> "importance" of the files.
>
> Simson
>
>
>
>
> On Sep 4, 2014, at 7:04 PM, Stuart Maclean <st...@ap...>
> wrote:
>
> > I am tracking recent efforts in STIX and Cybox and all things Mitre.
> > One indicator of compromise is an md5 hash of some file. Presumably you
> > compare the hash with all files on some file system to see if there is a
> > match.  Obviously this requires a walk of the host fs, using e.g. fls or
> > fiwalk or the tsk library in general.
> >
> > Is hashing a complete filesystem a common activity? If so, some
> > experiments I have done show that the following is a worthwhile
> > optimization: order the Runs to minimise total disk seek time, read
> > content from the ordered Runs, and piece each file's hash back
> > together. This can considerably decrease the time spent deriving the
> > full hash table.
> >
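
One way the run-ordered approach described above could be sketched: read
runs in ascending sector order, hold any run that arrives before its
predecessors, and feed each file's hash only when the next run in file
order is available. `Run` and `hash_by_run_order` are hypothetical names,
and run data is passed in directly where a real tool would read it from
the image.

```python
import hashlib
from typing import Dict, Iterable, NamedTuple


class Run(NamedTuple):
    """Hypothetical run record: where it sits on disk and where it belongs in its file."""
    sector: int   # starting sector on disk
    path: str     # file the run belongs to
    index: int    # position of the run within the file (0, 1, 2, ...)
    data: bytes   # run content


def hash_by_run_order(runs: Iterable[Run]) -> Dict[str, str]:
    """Consume runs in ascending sector order, hashing each file in file order."""
    digests = {}                          # path -> running md5 object
    next_index: Dict[str, int] = {}       # path -> next run index to hash
    pending: Dict[str, Dict[int, bytes]] = {}  # path -> runs read too early

    for run in sorted(runs, key=lambda r: r.sector):
        if run.path not in digests:
            digests[run.path] = hashlib.md5()
            next_index[run.path] = 0
            pending[run.path] = {}
        waiting = pending[run.path]
        waiting[run.index] = run.data
        # Feed every run that is now contiguous with what was already hashed.
        while next_index[run.path] in waiting:
            digests[run.path].update(waiting.pop(next_index[run.path]))
            next_index[run.path] += 1

    # Leftover runs mean some earlier run of that file was never supplied.
    if any(pending.values()):
        raise ValueError("missing runs: some files could not be hashed completely")
    return {path: digest.hexdigest() for path, digest in digests.items()}
```

The cost is memory: a run that sits on disk before earlier parts of its
file has to be buffered until those parts have been read.
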
> > I did see a slide deck by Simson G where he alluded to a similar win
> > when disk reads are ordered so as to minimise seek time, but I
> > wonder if much has been published on the topic, specifically relating to
> > the digital forensics arena, i.e. when an entire file system's contents
> > are to be read in a single pass, for the purposes of producing an
> > 'md5 -> file path' map.
> >
> > Opinions and comments welcomed.
> >
> > Stuart
> >
> > _______________________________________________
> > sleuthkit-users mailing list
> > https://lists.sourceforge.net/lists/listinfo/sleuthkit-users
> > http://www.sleuthkit.org