Thread: [sleuthkit-users] Good vs. Bad Hashes
From: Brian C. <ca...@sl...> - 2004-01-21 18:15:15
Is anyone interested in looking into the best way to manage hashes? The
definition of "good" versus "bad" is relative to the current investigation
and I don't know the best way to handle this in The Sleuth Kit and Autopsy.
There could be a single database with categories of hashes and you choose
which are good and which are bad for that investigation (similar to the new
Forensic Hash Database that was announced, and to the NSRL). Or, you could
import tens of hash databases and identify each one as bad or good (like
hashkeeper).

I think hashkeeper is LE-only, so I would rather focus on using the NSRL
and custom hashes made by md5sum. If anyone is interested in working on a
workable solution to this, let me know.

brian
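For reference, a minimal sketch of the two hash sources mentioned above,
assuming hfind's md5sum and nsrl-md5 index types; the file paths and the
sample hash are placeholders only:

  # Build a custom hash set with md5sum and index it for hfind
  find /usr/local/bin -type f | xargs md5sum > custom.db
  hfind -i md5sum custom.db

  # Index the NSRL file the same way, then look up a single hash
  hfind -i nsrl-md5 NSRLFile.txt
  hfind NSRLFile.txt 392126e756571ebf112cb1c1cdedf926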
From: Matthias H. <mat...@mh...> - 2004-01-22 18:33:08
Brian,

I think we have technical and logical issues here. Logically, we need to
maintain a potentially huge amount of data and categorize every single hash
entry. Furthermore, we have to decide for each entry whether it is
known-bad or known-good. I think a useful solution is to maintain a global
database with both the freely available hash sets (NSRL, KnownGoods, ...)
and self-made hash sets (md5sum/grave-robber ...). The interface to Autopsy
and The Sleuth Kit should allow querying only certain categories, only
known bads, or a certain category as known-bad or not (e.g. remote
management tools). The biggest problem here is managing the category
mapping table for all the different tools.

The technical problem is managing such a huge amount of raw data. With the
NSRL alone, we have millions of hash entries. This requires a new query
mechanism. With an RDBMS, we need persistent connections and the ability to
bulk query large data sets very fast. With the current sorter|hfind design,
sorter calls hfind once per hash analyzed. This is definitely a big
bottleneck.

Best regards,
Matthias

--
Matthias Hofherr
mail: mat...@mh...
web: http://www.forinsect.de
gpg: http://www.forinsect.de/pubkey.asc

Brian Carrier said:
> Is anyone interested in looking into the best way to manage hashes? The
> definition of "good" versus "bad" is relative to the current
> investigation and I don't know the best way to handle this in The
> Sleuth Kit and Autopsy. There could be a single database with
> categories of hashes and you choose which are good and which are bad
> for that investigation (similar to the new Forensic Hash Database that
> was announced, and to the NSRL). Or, you could import tens of hash
> databases and identify each one as bad or good (like hashkeeper).
>
> I think hashkeeper is LE-only, so I would rather focus on using the NSRL
> and custom hashes made by md5sum. If anyone is interested in working on
> a workable solution to this, let me know.
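To illustrate the sorter|hfind bottleneck described above and the kind of
bulk query an RDBMS would allow, a sketch only: the hashes.db database, its
hash table, and the column names are assumptions for illustration, not an
existing TSK component, and sqlite3 merely stands in for "an RDBMS":

  # Roughly what the current design amounts to: one hfind call per hash
  while read md5; do
      hfind NSRLFile.txt "$md5"
  done < hashes.txt

  # A bulk lookup against an imported database, one process/connection
  # for the whole set (table and column names are invented)
  sqlite3 hashes.db \
    "SELECT md5, category FROM hash
      WHERE md5 IN ('392126e756571ebf112cb1c1cdedf926',
                    'd41d8cd98f00b204e9800998ecf8427e');"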
From: Brian C. <ca...@sl...> - 2004-01-22 22:38:36
[I was hoping you would be interested in this topic in light of your new
database :)]

On Thursday, January 22, 2004, at 01:33 PM, Matthias Hofherr wrote:
> Logically, we need to maintain a potentially huge amount of data and
> categorize every single hash entry. Furthermore, we have to decide for
> each entry whether it is known-bad or known-good. I think a useful
> solution is to maintain a global database with both the freely available
> hash sets (NSRL, KnownGoods, ...) and self-made hash sets
> (md5sum/grave-robber ...).

I'm assuming that you are referring to a global database in the local
sense. That each person has their own "global" database that they create
and can add and remove hashes from. Not a global database in the Solaris
Fingerprint DB sense.

> The interface to Autopsy and The Sleuth Kit should allow querying only
> certain categories, only known bads, or a certain category as known-bad
> or not (e.g. remote management tools). The biggest problem here is
> managing the category mapping table for all the different tools.

I agree. Especially when you start merging the home-made hashes with those
from the NSRL and hashkeeper. I guess we could have a generic category of
'Always Good' or 'Always Bad'.

> The technical problem is managing such a huge amount of raw data. With
> the NSRL alone, we have millions of hash entries. This requires a new
> query mechanism. With an RDBMS, we need persistent connections and the
> ability to bulk query large data sets very fast. With the current
> sorter|hfind design, sorter calls hfind once per hash analyzed. This is
> definitely a big bottleneck.

Yeah, I have no problem if the end solution requires a redesign of hfind
and sorter.

I'm just not sure what the end solution should be. Some open questions:
- What application categories are needed? Are the NSRL ones sufficient, or
  are there too many / too few of them?
- How do you specify in the query which categories are bad and which are
  good?
- How do you specify to 'sorter' which categories are bad and which are
  good?
- Do we want to require a real database (i.e. SQL), or should there also be
  an ASCII file version?

thanks,
brian
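One possible shape for an answer to the second and third questions,
sketched as a plain ASCII map that an investigator edits per case. This is
an invented format, not something sorter or hfind currently reads, and the
category names are examples only:

  # Hypothetical per-investigation category map
  cat > category.map <<'EOF'
  # category              verdict
  Operating System        good
  Office Application      good
  Remote Admin Tool       bad
  Hacker Tool             bad
  EOF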
From: Matthias H. <mat...@mh...> - 2004-01-22 23:26:20
Brian Carrier said:
> [I was hoping you would be interested in this topic in light of your
> new database :)]

Yup, I am interested ;-)

[...]

> I'm assuming that you are referring to a global database in the local
> sense. That each person has their own "global" database that they
> create and can add and remove hashes from. Not a global database in
> the Solaris Fingerprint DB sense.

Yes, I meant the "local sense". Each user has different needs and
requirements, so a global database for everybody should be out of the
question.

>> The interface to Autopsy and The Sleuth Kit should allow querying only
>> certain categories, only known bads, or a certain category as known-bad
>> or not (e.g. remote management tools). The biggest problem here is
>> managing the category mapping table for all the different tools.
>
> I agree. Especially when you start merging the home-made hashes with
> those from the NSRL and hashkeeper. I guess we could have a generic
> category of 'Always Good' or 'Always Bad'.
>
>> The technical problem is managing such a huge amount of raw data. With
>> the NSRL alone, we have millions of hash entries. This requires a new
>> query mechanism. With an RDBMS, we need persistent connections and the
>> ability to bulk query large data sets very fast. With the current
>> sorter|hfind design, sorter calls hfind once per hash analyzed. This is
>> definitely a big bottleneck.
>
> Yeah, I have no problem if the end solution requires a redesign of hfind
> and sorter.
>
> I'm just not sure what the end solution should be. Some open questions:
> - What application categories are needed? Are the NSRL ones sufficient,
>   or are there too many / too few of them?
> - How do you specify in the query which categories are bad and which are
>   good?
> - How do you specify to 'sorter' which categories are bad and which are
>   good?
> - Do we want to require a real database (i.e. SQL), or should there also
>   be an ASCII file version?

I think the NSRL segmentation into products/operating systems/manufacturers
is a good idea. Yet the categories the NSRL provides are partially
duplicated and partially segmented too finely. There is no simple solution
for a query like "check only against Linux system hashes". I think we
should define a basic set of operating systems and other classification
data and maintain a mapping table for imports of the NSRL and other hash
sets.

In my opinion an SQL database should be the basis (simpler structure,
multiple indexes, ...). On the other hand, there is no reason not to
provide an export utility for ASCII exports in a defined format. This
should handle both requirements.

Matthias
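A sketch of the kind of schema and ASCII export this suggests. It is
illustrative only: every table and column name is invented, and sqlite3
merely stands in for "an SQL database":

  sqlite3 hashes.db <<'EOF'
  -- hash sets as imported (NSRL, md5sum output, hashkeeper, ...)
  CREATE TABLE hashset  (set_id INTEGER PRIMARY KEY, source TEXT);
  -- one row per hash; src_code is the source's own category/product code
  CREATE TABLE hash     (md5 TEXT, name TEXT, set_id INTEGER, src_code TEXT);
  CREATE INDEX hash_md5 ON hash(md5);
  -- the basic, tool-independent classification
  CREATE TABLE category (cat_id INTEGER PRIMARY KEY, label TEXT);
  -- mapping table: source-specific codes -> basic categories
  CREATE TABLE catmap   (set_id INTEGER, src_code TEXT, cat_id INTEGER);
  EOF

  # ASCII export in a defined format, as suggested above
  sqlite3 -separator '|' hashes.db \
    "SELECT h.md5, h.name, c.label
       FROM hash h
       JOIN catmap m   ON m.set_id = h.set_id AND m.src_code = h.src_code
       JOIN category c ON c.cat_id = m.cat_id;" > export.txt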