Re: [sleuthkit-developers] First Draft - Layout Hash Database

Brian Carrier said:
[...]
> So, after thinking about this thread some more, there are two problems
> that are being addressed at the same time and I think they can be more
> independent and I think the merging has caused some confusion.
>
> 1.  A small set of application categories for any hash database.
>
> 2.  An implementation of a database that can import hashes from
> multiple sources.
>
> As I mentioned before, the categories are a problem with all databases
> and I think it would be useful if we could publish a list with
> requirements for each category.  From Doug's email, it sounds like NIST
> would be interested in such categories (assuming that they are
> comprehensive and make sense).

OK, then let's treat the list of application categories separately. We can later
decide if/how we want to implement this in our database. I'll compile a
list with examples out of our recent discussion and post it this weekend
for further discussion.

> For the implementation, it seems that we need to have a clear goal for
> the DB.  Is it for a comprehensive DB or is it just for quick good vs
> bad lookups?  Both are needed, but can we satisfy both goals with one
> DB?  Or could that be an option at install time?  They can choose the
> quick / dirty / less data version or the full version.  I'm not a DB
> guy, so I have no clue what the answers for this are.

After thinking about the recent discussion and your comments, I would
prefer not to separate the database but instead the interface:

- we use a comprehensive database with a large set of information for each
hash set
- upon import, everyone can decide for themselves how much data to
include in the database
- we provide a mapping table in order to map the very detailed categories
to a small set of super-categories
- we provide 2 interfaces: "quick&dirty" (->super-categories) and
"long&detailed"

The biggest part of the database is the hash sets themselves. Organizing
the comprehensive add-on information doesn't use many resources; it only
requires a good data model. So we would not gain much by using two
different database models.
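
To make the idea a bit more concrete, here is a rough sketch of one
database with a mapping table and two lookup interfaces. It is only an
illustration using Python and SQLite. The table names, column names,
category names, and sample hashes are all invented for this example and
are not part of any agreed design.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical tables and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Maps each detailed category to one of a few super-categories.
        CREATE TABLE category_map (
            category       TEXT PRIMARY KEY,
            super_category TEXT NOT NULL
        );
        -- One row per known file hash, with the detailed add-on information.
        CREATE TABLE hashes (
            md5       TEXT NOT NULL,
            file_name TEXT,
            product   TEXT,
            category  TEXT NOT NULL REFERENCES category_map(category)
        );
        CREATE INDEX idx_hashes_md5 ON hashes(md5);
        """
    )
    # Invented categories, for illustration only.
    conn.executemany(
        "INSERT INTO category_map VALUES (?, ?)",
        [
            ("operating system", "known-good"),
            ("office application", "known-good"),
            ("steganography tool", "notable"),
        ],
    )
    # Placeholder hashes ("aaaa...", "bbbb..."), not real file hashes.
    conn.executemany(
        "INSERT INTO hashes VALUES (?, ?, ?, ?)",
        [
            ("a" * 32, "kernel32.dll", "ExampleOS 1.0", "operating system"),
            ("b" * 32, "hide.exe", "ExampleStego 2.1", "steganography tool"),
        ],
    )
    conn.commit()
    return conn


def quick_lookup(conn, md5):
    """'quick&dirty' interface: return only the super-categories for a hash."""
    rows = conn.execute(
        "SELECT DISTINCT m.super_category "
        "FROM hashes h JOIN category_map m ON m.category = h.category "
        "WHERE h.md5 = ?",
        (md5,),
    ).fetchall()
    return sorted(row[0] for row in rows)


def detailed_lookup(conn, md5):
    """'long&detailed' interface: return all stored details for a hash."""
    return conn.execute(
        "SELECT file_name, product, category FROM hashes "
        "WHERE md5 = ? ORDER BY file_name",
        (md5,),
    ).fetchall()
```

Both interfaces read the same tables, so the only extra cost of the
"quick" view is the small mapping table and one join.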

> It has occurred to me that there should be a 'source' column in the
> database, so that the entry can be attributed to the NSRL, hashkeeper,
> custom etc.  A version may also be useful.  This is also useful so that
> you can remove the hashes from the DB at a later point.

Good idea; I already use this (without a version) in my forensic hash
database.
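
As a rough illustration of how a source column (plus an optional version)
makes later removal easy, here is a small Python/SQLite sketch. The
table layout, the function names, and the version strings are assumptions
made up for this example. This is not the schema of my existing database.

```python
import sqlite3


def open_db():
    """Create an in-memory database with a hypothetical 'hashes' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hashes ("
        " md5 TEXT NOT NULL,"
        " source TEXT NOT NULL,"      # e.g. 'NSRL', 'hashkeeper', 'custom'
        " source_version TEXT"        # optional; NULL if the source has none
        ")"
    )
    return conn


def import_hashes(conn, source, version, md5_list):
    """Insert hashes and record which source (and version) they came from."""
    conn.executemany(
        "INSERT INTO hashes (md5, source, source_version) VALUES (?, ?, ?)",
        [(md5, source, version) for md5 in md5_list],
    )
    conn.commit()


def remove_source(conn, source, version=None):
    """Delete all hashes from one source, optionally limited to one version.

    Returns the number of rows removed.
    """
    if version is None:
        cur = conn.execute("DELETE FROM hashes WHERE source = ?", (source,))
    else:
        cur = conn.execute(
            "DELETE FROM hashes WHERE source = ? AND source_version = ?",
            (source, version),
        )
    conn.commit()
    return cur.rowcount
```

For example, after `import_hashes(conn, "NSRL", "2.0", [...])` (the version
string is a placeholder), calling `remove_source(conn, "NSRL")` removes
exactly the rows that came from that import and leaves custom entries
untouched.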

Regards,

Matthias