Re: [sleuthkit-users] Good vs. Bad Hashes
From: Matthias H. <mat...@mh...> - 2004-01-22 23:26:20
Brian Carrier said:

> [I was hoping you would be interested in this topic in light of your
> new database :)]

Yup, I am interested ;-)

[...]

> I'm assuming that you are referring to a global database in the local
> sense. That each person has their own "global" database that they
> create and can add and remove hashes from. Not a global database in
> the Solaris Fingerprint DB sense.

Yes, I meant the "local sense". Every user has different needs and
requirements, so a global database for everybody is out of the question.

>> The interface to autopsy and sleuthkit should allow querying only
>> certain categories, only known bads, or a certain category as known
>> bad or not (e.g. remote management tools). The biggest problem here
>> is to manage the category mapping table for all the different tools.
>
> I agree. Especially when you start merging the home-made hashes with
> those from the NSRL and hashkeeper. I guess we could have a generic
> category of 'Always Good' or 'Always Bad'.
>
>> The technical problem is to manage such a huge amount of raw data.
>> With the NSRL alone, we have millions of hash sets. This requires a
>> new query mechanism. With an RDBMS, we need persistent connections
>> and the ability to bulk query large data sets very fast. With the
>> current sorter|hfind design, sorter calls hfind once per hash
>> analyzed. This is definitely a big bottleneck.
>
> Yea, I have no problem if the end solution requires a redesign of hfind
> and sorter.
>
> I'm just not sure what the end solution should be. Some open questions:
> - what application categories are needed? Are the NSRL ones sufficient
>   or are there too many / too few of them?
> - How do you specify in the query which cat are bad and which are good?
> - How do you specify to 'sorter' which cat are bad and which are good?
> - Do we want to require a real database (i.e. SQL) or should there also
>   be an ASCII file version?

I think the NSRL segmentation into products/operating
systems/manufacturers is a good idea. However, the categories the NSRL
provides are partly duplicated and partly too finely segmented. There is
no simple way to run a query like "check only against Linux system
hashes". I think we should define a basic set of operating systems and
other classification data, and maintain a mapping table for importing
the NSRL and other hash sets.

In my opinion, an SQL database should be the base (simpler structure,
multiple indexes, ...). On the other hand, there is no reason not to
provide an export utility that writes ASCII files in a defined format.
That should satisfy both requirements.

To make this a bit more concrete, I have appended some rough sketches
below.

Matthias
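
First, a minimal sketch of what the schema and the category mapping
table could look like. I use SQLite from Python here only to keep the
example self-contained; all table and column names are made up for
illustration:

import sqlite3

con = sqlite3.connect("hashdb.sqlite")
cur = con.cursor()

# One row per known file hash; set_id points to the imported hash set
# (NSRL, hashkeeper, home-made, ...).
cur.execute("""CREATE TABLE IF NOT EXISTS hashes (
                   md5    TEXT NOT NULL,
                   name   TEXT,
                   set_id INTEGER NOT NULL)""")
cur.execute("CREATE INDEX IF NOT EXISTS idx_md5 ON hashes(md5)")

# Our own basic classification: a small fixed set of categories plus
# the generic good/bad status Brian suggested.
cur.execute("""CREATE TABLE IF NOT EXISTS categories (
                   cat_id INTEGER PRIMARY KEY,
                   name   TEXT,   -- e.g. 'linux-system', 'remote-admin'
                   status TEXT    -- 'good' or 'bad'
               )""")

# The mapping table: translates the (duplicated, over-segmented)
# product categories of each imported hash set to our basic set.
cur.execute("""CREATE TABLE IF NOT EXISTS set_category_map (
                   set_id INTEGER,
                   cat_id INTEGER)""")
con.commit()

With such a mapping table, "check only against Linux system hashes"
becomes a simple join, no matter which NSRL products feed into it.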
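
Second, the bulk query idea: instead of sorter forking hfind once per
hash, it could keep one persistent connection and resolve a whole batch
of hashes per round trip. Again only a sketch against the made-up
schema above:

def lookup_batch(cur, md5_list, wanted_cats):
    """Resolve a whole batch of MD5s in a single query, restricted
    to the given basic categories (e.g. ["linux-system"])."""
    md5_marks = ",".join("?" * len(md5_list))
    cat_marks = ",".join("?" * len(wanted_cats))
    cur.execute(f"""SELECT DISTINCT h.md5, c.status
                    FROM hashes h
                    JOIN set_category_map m ON m.set_id = h.set_id
                    JOIN categories c ON c.cat_id = m.cat_id
                    WHERE h.md5 IN ({md5_marks})
                      AND c.name IN ({cat_marks})""",
                md5_list + wanted_cats)
    # Maps md5 -> 'good'/'bad'; hashes without a row stay unknown.
    return dict(cur.fetchall())

# sorter would then do something like:
#   verdicts = lookup_batch(cur, batch_of_md5s, ["linux-system"])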
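
And third, the ASCII side: the export utility could simply dump
selected categories as flat "md5  filename" lines (similar to md5sum
output, which the existing flat-file tools should be able to index), so
nobody is forced to run a full RDBMS:

def export_ascii(cur, outfile, cat_name):
    """Dump one basic category as 'md5  filename' lines."""
    cur.execute("""SELECT h.md5, h.name
                   FROM hashes h
                   JOIN set_category_map m ON m.set_id = h.set_id
                   JOIN categories c ON c.cat_id = m.cat_id
                   WHERE c.name = ?""", (cat_name,))
    with open(outfile, "w") as f:
        for md5, name in cur:
            f.write("%s  %s\n" % (md5, name or ""))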