Re: [sleuthkit-users] Good vs. Bad Hashes
From: Brian C. <ca...@sl...> - 2004-01-22 22:38:36
[I was hoping you would be interested in this topic in light of your new
database :)]

On Thursday, January 22, 2004, at 01:33 PM, Matthias Hofherr wrote:

> Logical we need
> to maintain a potential huge amount of data and categorize every single
> hash entry. Furthermore, we have to decide for each entry if it is a
> known-bad or a known-good. I think a useful solution is to maintain a
> global database with both freely available hashsums like
> NSRL,KnownGoods combined
> with selfmade hash set (md5sum/graverobber ...).

I'm assuming that you are referring to a global database in the local
sense: each person has their own "global" database that they create
and can add hashes to and remove hashes from. Not a global database in
the Solaris Fingerprint DB sense.

> The interface to
> autopsy and sleuthkit should allow to query only certain categories,
> only known bads, a certain category as known bad or not(-> e. g. remote
> management tools). The biggest problem here is to manage the category
> mapping table for all the different tools.

I agree, especially when you start merging the homemade hashes with
those from the NSRL and hashkeeper. I guess we could have a generic
category of 'Always Good' or 'Always Bad'.

> The technical problem is to manage such a huge amount of raw data. With
> NSRL alone, we have millions of hash sets. This requires a new query
> mechanism. With a RDBMS, we need persistent connections and the
> possibility
> to bulk query large data sets very fast. With the current sorter|hfind
> design, sorter calls hfind one time per hash analyzed. This is
> definitely
> a big bottleneck.

Yeah, I have no problem if the end solution requires a redesign of
hfind and sorter. I'm just not sure what the end solution should be.

Some open questions:
- What application categories are needed? Are the NSRL ones sufficient,
  or are there too many / too few of them?
- How do you specify in the query which categories are bad and which
  are good?
- How do you specify to 'sorter' which categories are bad and which
  are good?
- Do we want to require a real database (i.e. SQL), or should there
  also be an ASCII file version?

thanks,
brian
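
As a rough illustration of two of the points above (looking up many hashes in one call instead of one call per hash, and letting the caller say which categories count as bad or good), here is a minimal sketch. Everything in it is an assumption made for illustration: the tab-separated ASCII layout, the category names, and the functions `load_hash_db` and `classify` are hypothetical and are not part of hfind, sorter, or the NSRL format. It also assumes the hash set fits in memory, which may not hold for millions of entries.

```python
import io

# Hypothetical ASCII format: one "md5<TAB>category" entry per line.
SAMPLE_DB = io.StringIO(
    "d41d8cd98f00b204e9800998ecf8427e\toperating system\n"
    "0123456789abcdef0123456789abcdef\tremote management\n"
    "ffffffffffffffffffffffffffffffff\trootkit\n"
)


def load_hash_db(handle):
    """Read the whole ASCII database once into a dict: md5 -> category."""
    db = {}
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        md5, category = line.split("\t", 1)
        db[md5.lower()] = category
    return db


def classify(db, hashes, bad_categories, good_categories):
    """Look up many hashes in one call and label each one.

    The caller decides, per investigation, which categories count as
    bad or good; anything else is reported as 'unknown'.
    """
    results = {}
    for md5 in hashes:
        category = db.get(md5.lower())
        if category in bad_categories:
            results[md5] = "bad"
        elif category in good_categories:
            results[md5] = "good"
        else:
            results[md5] = "unknown"
    return results


db = load_hash_db(SAMPLE_DB)
verdicts = classify(
    db,
    [
        "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF",
        "0123456789abcdef0123456789abcdef",
        "00000000000000000000000000000000",
    ],
    # Here remote management tools are treated as bad; another case
    # could just as well place them in good_categories.
    bad_categories={"rootkit", "remote management"},
    good_categories={"operating system"},
)
```

The category sets are passed in per call rather than stored with the hashes, so the same database can treat a category such as remote management tools as bad in one investigation and as good in another.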