Re: [sleuthkit-users] Good vs. Bad Hashes
From: Matthias H. <mat...@mh...> - 2004-01-22 23:26:20
Brian Carrier said:

> [I was hoping you would be interested in this topic in light of your
> new database :)]

Yup, I am interested ;-)

[...]

> I'm assuming that you are referring to a global database in the local
> sense. That each person has their own "global" database that they
> create and can add and remove hashes from. Not a global database in
> the Solaris Fingerprint DB sense.

Yes, I meant the "local sense". Every user has different needs and
requirements, so a global database for everybody is out of the question.

>> The interface to autopsy and sleuthkit should allow querying only
>> certain categories, only known bads, or a certain category as known
>> bad or not (e.g. remote management tools). The biggest problem here
>> is to manage the category mapping table for all the different tools.
>
> I agree. Especially when you start merging the home-made hashes with
> those from the NSRL and hashkeeper. I guess we could have a generic
> category of 'Always Good' or 'Always Bad'.
>
>> The technical problem is to manage such a huge amount of raw data.
>> With the NSRL alone, we have millions of hash sets. This requires a
>> new query mechanism. With an RDBMS, we need persistent connections
>> and the ability to bulk query large data sets very fast. With the
>> current sorter|hfind design, sorter calls hfind once per hash
>> analyzed. This is definitely a big bottleneck.
>
> Yea, I have no problem if the end solution requires a redesign of hfind
> and sorter.
>
> I'm just not sure what the end solution should be. Some open questions:
> - what application categories are needed? Are the NSRL ones sufficient
>   or are there too many / too few of them?
> - How do you specify in the query which cat are bad and which are good?
> - How do you specify to 'sorter' which cat are bad and which are good?
> - Do we want to require a real database (i.e. SQL) or should there also
>   be an ASCII file version?

I think the NSRL segmentation into products/operating
systems/manufacturers is a good idea. However, the categories the NSRL
provides are partly duplicated and partly too finely segmented. There is
no simple way to run a query like "check only against Linux system
hashes". I think we should define a basic set of operating systems and
other classification data, and maintain a mapping table for importing
the NSRL and other hash sets.

In my opinion, an SQL database should be the base (simpler structure,
multiple indexes, ...). On the other hand, there is no reason not to
provide an export utility that writes ASCII files in a defined format.
That should satisfy both requirements.

To make this a bit more concrete, I have appended some rough sketches
below.

Matthias
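
First, a minimal sketch of what the schema and the category mapping
table could look like. I use SQLite from Python here only to keep the
example self-contained; all table and column names are made up for
illustration:

import sqlite3

con = sqlite3.connect("hashdb.sqlite")
cur = con.cursor()

# One row per known file hash; set_id points to the imported hash set
# (NSRL, hashkeeper, home-made, ...).
cur.execute("""CREATE TABLE IF NOT EXISTS hashes (
                   md5    TEXT NOT NULL,
                   name   TEXT,
                   set_id INTEGER NOT NULL)""")
cur.execute("CREATE INDEX IF NOT EXISTS idx_md5 ON hashes(md5)")

# Our own basic classification: a small fixed set of categories plus
# the generic good/bad status Brian suggested.
cur.execute("""CREATE TABLE IF NOT EXISTS categories (
                   cat_id INTEGER PRIMARY KEY,
                   name   TEXT,   -- e.g. 'linux-system', 'remote-admin'
                   status TEXT    -- 'good' or 'bad'
               )""")

# The mapping table: translates the (duplicated, over-segmented)
# product categories of each imported hash set to our basic set.
cur.execute("""CREATE TABLE IF NOT EXISTS set_category_map (
                   set_id INTEGER,
                   cat_id INTEGER)""")
con.commit()

With such a mapping table, "check only against Linux system hashes"
becomes a simple join, no matter which NSRL products feed into it.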
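
Second, the bulk query idea: instead of sorter forking hfind once per
hash, it could keep one persistent connection and resolve a whole batch
of hashes per round trip. Again only a sketch against the made-up
schema above:

def lookup_batch(cur, md5_list, wanted_cats):
    """Resolve a whole batch of MD5s in a single query, restricted
    to the given basic categories (e.g. ["linux-system"])."""
    md5_marks = ",".join("?" * len(md5_list))
    cat_marks = ",".join("?" * len(wanted_cats))
    cur.execute(f"""SELECT DISTINCT h.md5, c.status
                    FROM hashes h
                    JOIN set_category_map m ON m.set_id = h.set_id
                    JOIN categories c ON c.cat_id = m.cat_id
                    WHERE h.md5 IN ({md5_marks})
                      AND c.name IN ({cat_marks})""",
                md5_list + wanted_cats)
    # Maps md5 -> 'good'/'bad'; hashes without a row stay unknown.
    return dict(cur.fetchall())

# sorter would then do something like:
#   verdicts = lookup_batch(cur, batch_of_md5s, ["linux-system"])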
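
And third, the ASCII side: the export utility could simply dump
selected categories as flat "md5  filename" lines (similar to md5sum
output, which the existing flat-file tools should be able to index), so
nobody is forced to run a full RDBMS:

def export_ascii(cur, outfile, cat_name):
    """Dump one basic category as 'md5  filename' lines."""
    cur.execute("""SELECT h.md5, h.name
                   FROM hashes h
                   JOIN set_category_map m ON m.set_id = h.set_id
                   JOIN categories c ON c.cat_id = m.cat_id
                   WHERE c.name = ?""", (cat_name,))
    with open(outfile, "w") as f:
        for md5, name in cur:
            f.write("%s  %s\n" % (md5, name or ""))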