Thread: [sleuthkit-developers] Good vs. Bad Hashes
From: Brian C. <ca...@sl...> - 2004-01-21 18:15:15
Is anyone interested in looking into the best way to manage hashes? The definition of "good" versus "bad" is relative to the current investigation and I don't know the best way to handle this in The Sleuth Kit and Autopsy.

There could be a single database with categories of hashes and you choose which are good and which are bad for that investigation (similar to the new Forensic Hash Database that was announced and NSRL). Or, you could import tens of hash databases and identify them as bad or good (like hashkeeper).

I think hashkeeper is LE-only, so I would rather focus on using NSRL and custom hashes made by md5sum. If anyone is interested in working on a workable solution to this, let me know.

brian
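A minimal sketch of the single-database-with-categories idea, assuming a hypothetical flat file of "<md5>  <category>  <name>" lines and a per-investigation mapping of categories to good/bad. The file format, category names, and function are illustrative assumptions only, not part of The Sleuth Kit:

    # Hypothetical flat-file hash database: one "<md5>  <category>  <name>" line per entry.
    # The category names below are examples, not NSRL or TSK categories.
    PER_INVESTIGATION = {
        "os": "good",            # known operating system files
        "remote_admin": "bad",   # remote management tools treated as suspect in this case
    }

    def classify(hash_db_path, md5):
        """Return 'good', 'bad', or 'unknown' for one MD5 under the current mapping."""
        with open(hash_db_path) as db:
            for line in db:
                h, category, _name = line.split(None, 2)
                if h == md5.lower():
                    return PER_INVESTIGATION.get(category, "unknown")
        return "unknown"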
From: Matthias H. <mat...@mh...> - 2004-01-22 18:33:08
Brian,

I think we have technical and logical issues here.

Logically, we need to maintain a potentially huge amount of data and categorize every single hash entry. Furthermore, we have to decide for each entry whether it is a known-bad or a known-good. I think a useful solution is to maintain a global database with freely available hash sets like NSRL and KnownGoods, combined with self-made hash sets (md5sum/graverobber ...). The interface to autopsy and sleuthkit should allow querying only certain categories, only known-bads, or a certain category as known-bad or not (e.g. remote management tools). The biggest problem here is to manage the category mapping table for all the different tools.

The technical problem is to manage such a huge amount of raw data. With NSRL alone, we have millions of hash entries. This requires a new query mechanism. With an RDBMS, we need persistent connections and the ability to bulk-query large data sets very fast. With the current sorter|hfind design, sorter calls hfind one time per hash analyzed. This is definitely a big bottleneck.

Best regards,
Matthias

--
Matthias Hofherr
mail: mat...@mh...
web: http://www.forinsect.de
gpg: http://www.forinsect.de/pubkey.asc

Brian Carrier said:
> Is anyone interested in looking into the best way to manage hashes? The
> definition of "good" versus "bad" is relative to the current
> investigation and I don't know the best way to handle this in The
> Sleuth Kit and Autopsy. There could be a single database with
> categories of hashes and you choose which are good and which are bad
> for that investigation (similar to the new Forensic Hash Database that
> was announced and NSRL). Or, you could import tens of hash databases
> and identify them as bad or good (like hashkeeper).
>
> I think hashkeeper is LE-only, so I would rather focus on using NSRL and
> custom hashes made by md5sum. If anyone is interested in working on a
> workable solution to this, let me know.
From: Brian C. <ca...@sl...> - 2004-01-22 22:38:36
[I was hoping you would be interested in this topic in light of your new database :)]

On Thursday, January 22, 2004, at 01:33 PM, Matthias Hofherr wrote:
> Logically, we need to maintain a potentially huge amount of data and
> categorize every single hash entry. Furthermore, we have to decide for
> each entry whether it is a known-bad or a known-good. I think a useful
> solution is to maintain a global database with freely available hash
> sets like NSRL and KnownGoods, combined with self-made hash sets
> (md5sum/graverobber ...).

I'm assuming that you are referring to a global database in the local sense. That each person has their own "global" database that they create and can add and remove hashes from. Not a global database in the Solaris Fingerprint DB sense.

> The interface to autopsy and sleuthkit should allow querying only
> certain categories, only known-bads, or a certain category as known-bad
> or not (e.g. remote management tools). The biggest problem here is to
> manage the category mapping table for all the different tools.

I agree. Especially when you start merging the home-made hashes with those from the NSRL and hashkeeper. I guess we could have a generic category of 'Always Good' or 'Always Bad'.

> The technical problem is to manage such a huge amount of raw data. With
> NSRL alone, we have millions of hash entries. This requires a new query
> mechanism. With an RDBMS, we need persistent connections and the
> ability to bulk-query large data sets very fast. With the current
> sorter|hfind design, sorter calls hfind one time per hash analyzed.
> This is definitely a big bottleneck.

Yea, I have no problem if the end solution requires a redesign of hfind and sorter. I'm just not sure what the end solution should be. Some open questions:

- What application categories are needed? Are the NSRL ones sufficient, or are there too many / too few of them?
- How do you specify in the query which categories are bad and which are good?
- How do you specify to 'sorter' which categories are bad and which are good?
- Do we want to require a real database (i.e. SQL) or should there also be an ASCII file version?

thanks,
brian
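One conceivable answer to the "which categories are bad and which are good" questions is a small per-investigation policy file read by a redesigned sorter. Both the file format and the parser below are purely hypothetical, sketched only to make the question concrete:

    # Hypothetical policy file, e.g. hash-policy.txt:
    #   good = os, office_apps
    #   bad  = remote_admin, steganography
    def load_category_policy(path):
        """Parse the hypothetical policy file into {'good': set(...), 'bad': set(...)}."""
        policy = {"good": set(), "bad": set()}
        with open(path) as f:
            for line in f:
                line = line.split("#", 1)[0].strip()   # drop comments and blank lines
                if not line:
                    continue
                key, _, values = line.partition("=")
                key = key.strip()
                if key in policy:
                    policy[key] = {v.strip() for v in values.split(",") if v.strip()}
        return policy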
From: Matthias H. <mat...@mh...> - 2004-01-22 23:26:20
Brian Carrier said:
> [I was hoping you would be interested in this topic in light of your
> new database :)]

Yup, I am interested ;-)

[...]

> I'm assuming that you are referring to a global database in the local
> sense. That each person has their own "global" database that they
> create and can add and remove hashes from. Not a global database in
> the Solaris Fingerprint DB sense.

Yes, I meant "local sense". Each user has different needs/requirements, so a global database for everybody should be out of the question.

>> The interface to autopsy and sleuthkit should allow querying only
>> certain categories, only known-bads, or a certain category as known-bad
>> or not (e.g. remote management tools). The biggest problem here is to
>> manage the category mapping table for all the different tools.
>
> I agree. Especially when you start merging the home-made hashes with
> those from the NSRL and hashkeeper. I guess we could have a generic
> category of 'Always Good' or 'Always Bad'.
>
>> The technical problem is to manage such a huge amount of raw data. With
>> NSRL alone, we have millions of hash entries. This requires a new query
>> mechanism. With an RDBMS, we need persistent connections and the
>> ability to bulk-query large data sets very fast. With the current
>> sorter|hfind design, sorter calls hfind one time per hash analyzed.
>> This is definitely a big bottleneck.
>
> Yea, I have no problem if the end solution requires a redesign of hfind
> and sorter.
>
> I'm just not sure what the end solution should be. Some open questions:
> - What application categories are needed? Are the NSRL ones sufficient,
>   or are there too many / too few of them?
> - How do you specify in the query which categories are bad and which are good?
> - How do you specify to 'sorter' which categories are bad and which are good?
> - Do we want to require a real database (i.e. SQL) or should there also
>   be an ASCII file version?

I think the NSRL segmentation into products/operating systems/manufacturers is a good idea. Yet the NSRL-provided categories are partially duplicated and partially too finely segmented. There is no simple solution for a query like "check only against Linux system hashes".

I think we should define a basic set of operating systems and other classification data and maintain a mapping table for imports of NSRL and other hash sets.

In my opinion, an SQL database should be the base (easier structure, multi-index ...). On the other hand, there is no reason not to provide an export utility for ASCII exports in a defined format. This should handle both requirements.

Matthias
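A minimal sketch of what such an SQL base, import mapping table, and ASCII export could look like, using SQLite for brevity. All table and column names here are assumptions for illustration, not an agreed Sleuth Kit format:

    import sqlite3

    SCHEMA = """
    CREATE TABLE IF NOT EXISTS hashes (
        md5      TEXT PRIMARY KEY,
        name     TEXT,
        source   TEXT,            -- e.g. 'NSRL', 'md5sum import'
        category TEXT             -- local category after mapping
    );
    CREATE TABLE IF NOT EXISTS category_map (
        source          TEXT,     -- which hash set the raw category comes from
        source_category TEXT,     -- e.g. an NSRL product/OS code
        local_category  TEXT,     -- one of the small set of local categories
        PRIMARY KEY (source, source_category)
    );
    """

    def export_ascii(db_path, out_path, categories):
        """Write an md5sum-style ASCII view ("<md5>  <name>") of selected categories."""
        conn = sqlite3.connect(db_path)
        conn.executescript(SCHEMA)
        marks = ",".join("?" for _ in categories)
        with open(out_path, "w") as out:
            for md5, name in conn.execute(
                f"SELECT md5, name FROM hashes WHERE category IN ({marks})",
                list(categories),
            ):
                out.write(f"{md5}  {name}\n")
        conn.close()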
From: Brian C. <ca...@sl...> - 2004-01-23 05:39:32
I'll just keep this on the developers list.

On Thursday, January 22, 2004, at 06:26 PM, Matthias Hofherr wrote:
> Brian Carrier said:
>> [I was hoping you would be interested in this topic in light of your
>> new database :)]
>
> Yup, I am interested ;-)

Excellent.

> I think the NSRL segmentation into products/operating systems/
> manufacturers is a good idea. Yet the NSRL-provided categories are
> partially duplicated and partially too finely segmented. There is no
> simple solution for a query like "check only against Linux system
> hashes".
> I think we should define a basic set of operating systems and other
> classification data and maintain a mapping table for imports of NSRL
> and other hash sets.

Can you lead the effort on making such a list then? I can't imagine having more than 15 categories. Otherwise it gets too messy and would be too difficult to look at in the configuration window.

If we can make a comprehensive list of categories that scales for types of applications and/or types of platforms (although app type seems to be more important), then I would like to get it published in the IJDE (or similar) and see if we can make an argument for it to be a "standard" and have it adopted in the NSRL and others.

thanks,
brian
From: Matthias H. <mat...@mh...> - 2004-01-23 18:10:39
Brian Carrier said:
> I'll just keep this on the developers list.
>
> On Thursday, January 22, 2004, at 06:26 PM, Matthias Hofherr wrote:
[...]
>
> Can you lead the effort on making such a list then?

I'll give it a try.

> I can't imagine having more than 15 categories. Otherwise it gets too
> messy and would be too difficult to look at in the configuration window.

I, too, think fewer categories are better. Yet we have two major kinds of categories:

- operating systems: always known-goods, not so many categories, many hash sets
- applications: many categories, which have to be compressed to, let's say, 15 categories; may be either known-good or known-bad

I will give the matter some thought, talk with some people, and compile a first proposal for the list. If anyone on this list is also interested in this matter, drop me a mail off-list.

Matthias
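To make the "compress many application categories into roughly 15" idea concrete, an entirely hypothetical mapping table is sketched below; none of these names come from NSRL or were agreed on in this thread:

    # Illustrative compression of fine-grained source categories into a small local set.
    COMPRESSED_CATEGORIES = {
        # source category           -> local category
        "Operating System - Linux":   "os",
        "Operating System - Windows": "os",
        "Word Processor":             "office_apps",
        "Spreadsheet":                "office_apps",
        "Remote Control Software":    "remote_admin",   # good or bad depending on the investigation
        "Steganography Tool":         "steganography",
    }

    def to_local_category(source_category):
        """Map a vendor-specific category onto the small local category set."""
        return COMPRESSED_CATEGORIES.get(source_category, "other")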