Re: [sleuthkit-users] Good vs. Bad Hashes


> However, I am beginning to wonder how effective hash sets of "known-bad"
> are going to be moving into the future. I think they have shown some
> benefit to LEA and others investigating child porn, malware, etc., but
> as the perps get wise to this technique, you'll probably start seeing
> more things like polymorphic archives, encrypted executables, and other
> file types that may change based on context or just randomly when
> accessed.  Manually modifying files with a hex editor would be a simple
> way to change the hash of any file, which is much more of a current
> reality.  We've seen this somewhat in the anti-virus industry, which
> makes me wonder whether some sort of heuristics system may be more
> effective for this area.

That is a good point.  And one trojan source file can generate many
different executables with different hashes, depending on which compiler
flags were used.
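To make the hex-editor point concrete, here is a minimal Python sketch showing that overwriting a single byte gives a completely different MD5 and SHA-1, so an exact-match lookup against a known-bad set misses the modified copy. The file contents and the `known_bad_md5` set are made up for illustration; they are not real malware or a real hash set.

```python
import hashlib


def digests(data: bytes) -> dict:
    """Return the MD5 and SHA-1 hex digests of a byte string."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
    }


def patch_byte(data: bytes, offset: int, value: int) -> bytes:
    """Simulate a hex-editor edit: overwrite one byte at the given offset."""
    buf = bytearray(data)
    buf[offset] = value
    return bytes(buf)


# Made-up "known-bad" file contents and a one-entry hash set built from them.
original = b"MZ\x90\x00 pretend this is a malicious executable"
known_bad_md5 = {digests(original)["md5"]}

# Change only the last byte ('e' -> 'E'); the file length stays the same.
modified = patch_byte(original, len(original) - 1, ord("E"))

print(digests(original)["md5"] in known_bad_md5)  # True: exact match
print(digests(modified)["md5"] in known_bad_md5)  # False: one byte defeats the lookup
```

The same thing happens with any cryptographic hash, which is exactly what makes these sets good for identifying unmodified files and weak against deliberate tampering.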

That still leaves the problem of organizing what is "good," though.  Is
pcAnywhere a good or bad hash?  It depends on the investigation.
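One way to handle the "depends on the investigation" problem, sketched here under loudly stated assumptions, is to keep the reference data neutral (hash to software category) and let each case decide which categories to flag. `CasePolicy`, `classify`, the category names, and the placeholder hash strings are all hypothetical and are not part of TSK or the NSRL:

```python
from dataclasses import dataclass, field

# Hypothetical reference data: hash -> neutral software category.
# The keys are placeholders, not real pcAnywhere or Notepad hashes.
REFERENCE = {
    "placeholder-pcanywhere-hash": "remote-access",
    "placeholder-notepad-hash": "text-editor",
}


@dataclass
class CasePolicy:
    """Hypothetical per-investigation policy: which categories count as 'bad'."""
    name: str
    flag_categories: set[str] = field(default_factory=set)


def classify(file_hash: str, policy: CasePolicy) -> str:
    """Return 'alert', 'ignore', or 'unknown' for a hash under a given case policy."""
    category = REFERENCE.get(file_hash)
    if category is None:
        return "unknown"  # not in the reference set at all
    return "alert" if category in policy.flag_categories else "ignore"


intrusion = CasePolicy("intrusion", {"remote-access"})
it_audit = CasePolicy("sanctioned-tools audit")

print(classify("placeholder-pcanywhere-hash", intrusion))  # alert
print(classify("placeholder-pcanywhere-hash", it_audit))   # ignore
```

The same pcAnywhere entry raises an alert in an intrusion case but is ignored in an audit where remote-access tools are sanctioned; only the policy changes, not the hash set.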

> The other big issue is categorizing the large number of hashes. I think
> the NSRL reference data set is 17.9 million hashes.  Manually
> categorizing them would not be possible; we would have to look closer at
> the NSRL "schema" to see if an automated process could be developed once
> categories were determined.

There are Application types in the schema, but I'm not sure how they 
were chosen or how many there are.  You can see a list here:

http://www.nsrl.nist.gov/index/apptype.index.txt
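If the application types turn out to be usable, a first pass at automated categorization could be a join between the file and product tables on the product code. The sketch below assumes the RDS text files are comma-separated with quoted headers, that `NSRLFile.txt` carries `SHA-1` and `ProductCode` columns, and that `NSRLProd.txt` carries `ProductCode` and `ApplicationType`; check those names against the release you download. The sample rows, hash values, application-type strings, and the `APPTYPE_TO_CATEGORY` mapping are invented for illustration:

```python
import csv
import io
from collections import defaultdict

# Invented sample rows in the assumed RDS layout (not real NSRL data).
PROD_SAMPLE = '''"ProductCode","ProductName","ProductVersion","OpSystemCode","MfgCode","Language","ApplicationType"
1,"ExampleRemote","1.0","189","1","English","Remote Control"
2,"ExampleEditor","2.0","189","2","English","Text Editor"
'''

FILE_SAMPLE = '''"SHA-1","MD5","CRC32","FileName","FileSize","ProductCode","OpSystemCode","SpecialCode"
"AAAA","BBBB","CCCC","remote.exe",1024,1,"189",""
"DDDD","EEEE","FFFF","editor.exe",2048,2,"189",""
'''

# Hypothetical mapping from application type to an investigator-defined category.
APPTYPE_TO_CATEGORY = {
    "Remote Control": "dual-use",
    "Text Editor": "benign",
}


def app_types_by_hash(prod_file, file_file):
    """Join file records to product records on ProductCode.

    Returns a dict mapping each SHA-1 to the set of application types
    of the products it appears in.
    """
    # ProductCode -> ApplicationType lookup built from the product table.
    prod_type = {
        row["ProductCode"]: row["ApplicationType"]
        for row in csv.DictReader(prod_file)
    }
    types = defaultdict(set)
    # Stream the (potentially very large) file table one row at a time.
    for row in csv.DictReader(file_file):
        app_type = prod_type.get(row["ProductCode"])
        if app_type:
            types[row["SHA-1"]].add(app_type)
    return types


def categorize(app_types, mapping):
    """Map application types to categories; unmapped types become 'uncategorized'."""
    return {mapping.get(t, "uncategorized") for t in app_types}


if __name__ == "__main__":
    types = app_types_by_hash(io.StringIO(PROD_SAMPLE), io.StringIO(FILE_SAMPLE))
    for sha1, app_types in sorted(types.items()):
        print(sha1, sorted(app_types), sorted(categorize(app_types, APPTYPE_TO_CATEGORY)))
```

A set is kept per hash because the same file can ship with several products, so one hash may end up with more than one application type and, after mapping, more than one category; deciding what to do with those conflicts is the part that still needs a human.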

The reason I am asking is that this is an important issue, but I
already have too many things on my plate.  So, if people are interested
in finding a solution to this, then please do.  I won't get to it for
several months.

thanks,
brian