Re: [sleuthkit-developers] First Draft - Layout Hash Database
From: Matthias H. <mat...@mh...> - 2004-01-28 18:12:03
Michael Cohen said:
[...]
> I find this is an important requirement, particularly for sql
> databases. The os and applications should be short ints so that an
> index may be built on them making it faster to search. Also I found
> that building a partial index on the md5 column itself speeds things
> up several orders of magnitude, but still keeps the index size
> reasonable so it fits well in ram.

Performance will not be one of our bigger problems. Even with, say,
20 million entries (NSRL alone has nearly 18 million), we should get
reasonable search times, provided we use some clever indexing.
Sure, one problem will be importing 20 million entries. But by dropping
the index before the import and re-creating it afterwards we will save
a lot of time.
The performance question is not important as long as we do not have a
good data model. Adding performance features is simple textbook work.

>> > Application entry:
> Are you suggesting to not name the application product at all? but
> rather only contain information on the category of the application?
> So for example in the table "msword.exe" will have office tools as
> application, but not refer to microsoft word as a product? I really
> think that you still need to classify the hash set with the
> commercial name of the application, otherwise you would not know
> which specific application xyz.dll belongs to.

I think we have to decide whether we want a kind of full management
database with all possible kinds of information for a hash set, or
whether we need a database with a relatively small number of categories
for excluding known goods and alerting on known bads.
For the latter, we do not need to know whether "msword.exe" is from the
package "Microsoft Office 2000 SP 3 Hotfix 2a". For the former, we need
the detailed information.

Which brings us to another problem: do we allow duplicate entries for
hash sums in the database? The former solution will allow this; the
latter probably doesn't require it.
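To make the import idea above concrete, here is a minimal sketch, not a proposal for the final layout. It uses Python's sqlite3 module only because it needs no server. The table name, column names and category codes are all hypothetical, and the three rows are toy sample data. It drops the md5 index, loads everything in one transaction, rebuilds the index, and keeps os/application as small integer codes. The index is deliberately not unique, so duplicate hash sums are allowed here.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE hashes (
    md5    TEXT    NOT NULL,  -- hex digest
    os_id  INTEGER NOT NULL,  -- small integer category code
    app_id INTEGER NOT NULL,  -- small integer category code
    known  INTEGER NOT NULL   -- 0 = known good, 1 = known bad
)
"""
INDEX = "CREATE INDEX idx_hashes_md5 ON hashes (md5)"


def bulk_import(conn, rows):
    """Drop the index, load all rows in one transaction, then rebuild it."""
    conn.execute("DROP INDEX IF EXISTS idx_hashes_md5")
    with conn:  # commits once after the whole load
        conn.executemany("INSERT INTO hashes VALUES (?, ?, ?, ?)", rows)
    conn.execute(INDEX)


def lookup(conn, md5):
    """Return every (os_id, app_id, known) row recorded for a digest."""
    cur = conn.execute(
        "SELECT os_id, app_id, known FROM hashes "
        "WHERE md5 = ? ORDER BY os_id, app_id",
        (md5,),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.execute(INDEX)
bulk_import(conn, [
    ("d41d8cd98f00b204e9800998ecf8427e", 1, 10, 0),
    ("d41d8cd98f00b204e9800998ecf8427e", 2, 10, 0),  # same digest, other OS
    ("900150983cd24fb0d6963f7d28e17f72", 1, 20, 1),
])
print(lookup(conn, "d41d8cd98f00b204e9800998ecf8427e"))
```

If we decide against duplicates, the only change in this sketch would be making the index UNIQUE, at which point the second sample row would be rejected on import.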

> In general I think the approach taken by NSRL is not a bad one.
[...]
> This is much more effective than having to redo the entire nsrl.

The thing is, making a database structure for NSRL is no problem at
all. In fact, NSRL already has a full generic database structure which
could easily be adapted. But so far this was not my intention (see
above).

Still, we do not have to redo the NSRL database. We only have to define
a mapping (once) for the NSRL categories. Automatic import with a parser
script is no problem. Since NSRL categories do not change much,
maintenance should be no problem either.

>> MacOS probably shouldn't get a separate category from OSX unless Win
>> '98 is also separated from Win XP. The specific types in BSD should
>> be defined (since OS X is actually a variant of BSD). The Solaris
>> category should also include SunOS.
> I think that OSs should be granulated down as much as practically
> possible. So I would give win98 a different category than winXP.
> Maybe not so much as to separate the different service packs, but its
> often very evident what kind of os you are working on, and it would
> speed things up considerably if the database could be split into
> different tables, depending on the OS. This effect can be achieved by
> building an index on the OS column, this severely lightens the load
> on the query if we restrict our searches to particular os's.

Same problem as above: either we use a small set of categories with a
usable interface, or we define a huge set of categories with a VERY
large interface.
Agreed, the latter will give faster performance because the query can
use more detailed constraints. But with good indexing and persistent
database connections, speed should be reasonable with a small set of
categories as well.

Regards,
Matthias