sleuthkit-developers Mailing List for The Sleuth Kit (Page 41)
From: Michael C. <mic...@ne...> - 2004-02-03 22:10:53
On Wed, 4 Feb 2004 03:34 am, Brian Carrier wrote:
> Wow! This looks great!

Thanks, Brian.

> My original plan was to use the '-o' flag to specify the sector offset
> for the file system. I figured sectors would be easier than bytes
> because mmls and fdisk give you the values in sectors and almost every
> disk uses a 512-byte sector. This also allows people to use the offset
> value without the '-i' setting.

Great idea. Sectors would be much more useful than straight bytes.

The idea is that each subsystem may choose to implement its logical-physical mapping in whatever way makes sense for it, and therefore would need different parameters, most conveniently denoted by name. So rather than waste a whole option -o on just an offset, maybe we could use -o to specify a number of subsystem-dependent options. If we want people to be able to use -o without needing to worry about using -i, that's easily solved: if you don't use -i, the default sk subsystem is used, and it can simply take a single option, being offset. So users can just use -o to implement a simple offset.

> I like the idea of the '-i' because it is like specifying the image
> type, whereas -f is specifying the file system type. I hadn't thought
> about getting this advanced, but it looks good.
>
> I would actually say that '-i' should only have the type and no other
> options. If multiple files are needed (splitting and RAID), then they
> should be appended to the end of the command. For example, to look at
> the file system at offset sector 12345, the following could be used
> (names are made up):
>
> Normal full image:
> fls -f linux-ext3 -o 12345 file1.dd
> or
> fls -f linux-ext3 -i single -o 12345 file1.dd
>
> Split Image:
> fls -f linux-ext3 -i split -o 12345 file1.dd file2.dd
>
> LVM RAID Image:
> fls -f linux-ext3 -i lvm -o 12345 lvm-config.dat
>
> MS LDM Spanning Image:
> fls -f ntfs -i ldm-span -o 12345 ldm-config.dat

That is indeed a good suggestion. It needs more careful manipulation of the getopts in the client program, but it should work. The only trouble is that the parameters to the subsystem can be arbitrary, subsystem-specific ones, so for example maybe for a split image:

fls -f linux-ext3 -i split -o offset=12345 blocksize=512 file1.dd file2.dd

and just in case you wanted to have a file called offset or blocksize, you could use a qualifier called file= in front of it, like:

fls -f linux-ext3 -i split -o offset=12345 blocksize=512 file1.dd file=offset

but without a qualifier, it's just interpreted as a filename. Similarly, for the truly lazy user, if the subsystem-specific option parser sees an option consisting of just a number, it takes that as the offset, so you don't need to qualify offset by using a keyword.

> It would also be useful if the config file format that you are
> developing for the RAID images could be used for the split images.

It can, but the algorithm for the RAID reconstruction is more complex, and performance would suffer if the same subsystem was used all around. The format (not finalised yet...) is something like:

parameter=...
parameter=...
slot number,disk number
slot number,disk number

one per line. A slot is the logical position within the RAID period where the block should be taken from. Example:

1,1
2,1
1,2
2,2

specifies that the first block is taken from slot 1, disk 1, the next from slot 2, disk 1, the next from slot 1, disk 2, and then slot 2, disk 2. So if we start the RAID period at block 0, slot 1 corresponds to block 0, and slot 2 to block 1. The next blocks requested start a whole new period, which maps the slots into a new set of absolute offsets, namely slot 1 is now block 2 and slot 2 is block 3... etc. So this scheme does use offsets to start reading the disks, and block sizes, so I guess if you really wanted, you could make a RAID map correspond to a number of split disks, but not easily, especially if the disks have different sizes. I guess the file may not be that human-readable, because we use FLAG to generate it automatically. I really didn't want to have to use more advanced lex/yacc for this. What do you think?

> To keep the subsystem design similar to what currently exists, have you
> thought about the following:
>
> A new data structure IO_INFO and before fs_open is run, the io_open()
> function is run with either the image lists or the config file etc and
> the offset. There would probably have to be one for io_open_files(char **)
> and io_open_config(char *).
>
> The IO_INFO structure is filled in with io_open and the needed read
> functions are mapped (like file_walk etc are now in FS_INFO).
>
> The fs_open() function gets the IO_INFO structure passed to it and the
> fs_open() no longer needs to do the open() system call on the images.
> It just checks the magic value and fills in FS_INFO. Any
> read_random() function in the file system code turns into
> fs_info->io->read_random(...).

This is an alternative design - the advantage with your method is that you could potentially have a number of different subsystems in use at the same time in the same program, while my subsystem design keeps subsystem data as static, so it's program-wide. I just didn't really want to change all the read_random functions throughout the code (it would mean bigger changes in the architecture, because almost every file would be touched many times).

I still think that it would be more useful to allow each subsystem to manage its own options, rather than trying to second-guess all the options in advance and stick them into the io_info struct. So for example, rather than have the io_info struct have one entry for io_open_files(char **) and io_open_config(char *), maybe we can just have an entry for void *data, and a single io_open(void *data), and allow the subsystem to set that to whatever configuration parameters make sense for it - the single file option might attach a char * in the data pointer, while the multifile stuff might attach a char **. The raid subsystem might attach a pre-parsed linked list of its raid map so it can work off that. Whatever makes sense.

A couple more types of IO subsystem I just thought of are an Encase file format subsystem (allows you to read standard Encase files with sk) and a compressed file subsystem (allows you to work directly off compressed files). I have no idea how difficult it would be to actually implement those, but they look promising.

Michael.
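For readers following the design debate above, here is a minimal C sketch of the void *data / function-pointer style of IO subsystem interface being discussed. All names and signatures are hypothetical illustrations, not the actual patch code or the final Sleuth Kit API:

    /*
     * Sketch of an IO_INFO that carries opaque subsystem state in a
     * void pointer; each subsystem supplies its own open/read/close
     * functions.  Hypothetical names, for illustration only.
     */
    #include <stdio.h>
    #include <sys/types.h>

    typedef struct IO_INFO {
        const char *name;   /* subsystem name, e.g. "standard", "advanced" */
        void *data;         /* subsystem state: file list, RAID map, ...   */
        int  (*open)(struct IO_INFO *io, void *options);
        int  (*read_random)(struct IO_INFO *io, char *buf, size_t len,
                            off_t offset);
        void (*close)(struct IO_INFO *io);
    } IO_INFO;

    /* The simplest subsystem: one image file plus a byte offset. */
    typedef struct {
        FILE *fp;
        off_t start;        /* where the file system begins in the image */
    } standard_data;

    static int standard_read_random(IO_INFO *io, char *buf, size_t len,
                                    off_t offset)
    {
        standard_data *d = (standard_data *) io->data;

        /* seek relative to the configured start, so fstools can keep
         * asking for file-system-relative offsets */
        if (fseeko(d->fp, d->start + offset, SEEK_SET) != 0)
            return -1;
        return (fread(buf, 1, len, d->fp) == len) ? 0 : -1;
    }

A split or RAID subsystem would keep a file list or a parsed map in the data pointer instead, which is exactly the flexibility argued for above.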
From: Brian C. <ca...@sl...> - 2004-02-03 17:11:35
Thanks Matthias. Some of these still have a lot of overlap. For example, acroread is very similar to any of the office tools since you can read and now apply edits to the document. In general, the desktop category seems to be a different form of "other". Similarly, the web category has a lot of overlap with server daemons.

Here is a quick guess at how I would organize some of this. It is not done and may be completely off. I realized as I was doing it that it could be useful to distinguish between tools in the category and files in the category:

System Tools and Files
- Files that are required for the kernel and operating system to run
  - the kernel executable and other required executables
- Files that are used to administer the kernel and operating system
  - config files, registry ...
- General files that are used by applications
  - system libraries, dll
- Files that are needed to develop tools for the system
  - system header files
- Drivers?

Communication Tools
- User-level tools that send or receive files from other network hosts
  - email client, web browser, ftp client, peer-to-peer client
  - email server, HTTP server, FTP server ...
- User-level tools that allow interactive communication between two people
  - im, irc

Communication Files
- Files used to communicate data or other files:
  - email with headers, HTML pages WITH HTTP data, any 'encoded' file
  - im logs

Document Tools
- Tools that create or view human-readable documents that are used to store and organize data
  - office, openoffice, excel, acrobat reader
  - text editor
  - HTML development tools
  - cgi, php ...

Document Files
- Files that can be interpreted to show human-readable data
  - Word Documents, HTML files, pdf files, xls files
  - text files

Multimedia Tools
- Tools that play or record audio
  - iTunes, ...
- Tools that play or record video
  - Real, quicktime, Windows Media ...
- Tools that play or record still photographs and graphics
  - Photoshop, Illustrator

Multimedia Files
- Audio Files
  - mp3, wav
- Video Files
  - avi
- Graphic Files
  - jpg, gif ...

----------------------------------------------------

I'm less confident about these:

Personal Organization Tools and Files
- Files used to organize a user's time and tasks
  - calendar, address book, todo list?
  - PDA sync tools

Database
- Tools and files that are used to store and retrieve data from a database
  - oracle, access, SQL
  - files for the above databases

Security - Prevention
- Tools and files that are used to secure a system from attack
  - anti-virus
  - personal firewalls
  - IDS

Security - Attack
- Tools and files that are used to cause a security incident
  - exploits
  - attack tools
  - DDoS tools
  - viruses
- Tools and files that are used to remove evidence of an incident
  - log cleaner
  - evidence eliminator
- Tools and files that are used to allow access to a compromised system
  - rootkits

Games
- Tools and files that are games
  - solitaire ...

----------------------

I haven't thought enough about these tools yet:
- network port scanners
- network IP scanners (ping)
- network sniffers
- remote management
- hex editor
- calculator
- winzip, tar.gz
- encryption tools
- development tools

On Feb 1, 2004, at 6:27 AM, Matthias Hofherr wrote:

> Hi all,
>
> here is a short writeup from our last discussion about application
> categories. (kg) means the default for this category is known-good and
> (kb) is known-bad.
>
> Application entry:
>
> - remote management (kg)
>   Examples: vnc, PC Anywhere, BO/BO2K, SubSeven (???) ...
>
> - office tools (kg)
>   Examples: the different office suites - MS Office, OpenOffice,
>   StarOffice, Adobe Acrobat ...
>
> - database (kg)
>   Examples: the database server and clients, database content files
>
> - desktop (kg)
>   Examples: desktop programs like kde tools, acrobat reader, winzip,
>   games, screensavers, web browser, email clients
>
> - security (kg)
>   Examples: nmap, hping2, virus scanners and signatures, content filter
>   software, tripwire/aide/samhain, IDS tools
>
> - sysutils (kg)
>   Examples: every day sysadmin utilities (*nix: /sbin/*, /usr/bin ...)
>
> - server daemons (kg)
>   Examples: sendmail, postfix, pop3d, imapd, apache, ...
>
> - web/network (?) (kg)
>   Examples: cgi scripts, php files, ...
>
> - multimedia (kg)
>   Examples: sound, picture and video files
>
> - drivers (kg)
>   Examples: driver software (sic!)
>
> - (child-)porn (kb)
>   Examples: the name says it all
>
> - malware (kb)
>   Examples: rootkits, malicious code, worms, viruses, trojans, backdoors ...
>
> - other (kg)
>   Examples: everything which doesn't fit in the other categories
>
> Is "malware" an appropriate name? Shall we further divide this category?
>
> How about the separate "child-porn" section? There are other kinds of
> illegal porn which do not fit in this category.
>
> Is "web" or "network" a better name? What more content would network
> include which doesn't fit in the other categories?
>
> It seems that our "remote management" category includes potentially more
> known-bad (subseven/BO(2K) ...) than known-good tools. Should we disband
> this category and absorb it in the other categories?
>
> Has anyone good ideas for other groups or better group names?
>
> Regards,
>
> Matthias
>
> --
> Matthias Hofherr
> mail: mat...@mh...
> web: http://www.forinsect.de
> gpg: http://www.forinsect.de/pubkey.asc
From: Michael C. <mic...@ne...> - 2004-02-03 13:48:13
Dear List,

Please accept this patch to the sleuthkit to implement a pluggable IO subsystem for the fstools (patch against 1.67, fstools directory).

Background:

Quite often users are supplied with dd images that do not immediately work with the sleuthkit. Two notable examples are:

- When a dd image was taken of the whole hdd. In this case users have to use sfdisk to work out the partition offsets and then use dd with appropriate skip parameters to extract each partition before being able to use the sleuthkit. This is because the sk expects to have a dd image of a partition (i.e., the filesystem starts at offset 0 in the image file, which is not always the case).

- Sometimes images are split into smaller sizes, for example in order to burn to cd/dvd etc. This means that images need to be stuck together before analysis, potentially wasting time and space.

It would be nice if one could use the images directly - without needing to do creative dd manipulations.

Solution:

This patch implements a modular io subsystem approach - all filesystem operations within the sk are made to use this subsystem, and the user can choose the subsystem they want. The subsystem is responsible for seeking into the file and extracting data out of the dd image - how that is implemented is completely abstracted from the point of view of the fstools.

The user can choose the subsystem to be used with the -i (io subsystem) command line switch. Then a list of arguments can be passed to the subsystem to initialise it correctly. Once that is done, the regular sk calls can be made (e.g. fs_open etc). The io subsystem will take care of the specifics of the implementation.

This patch includes 2 subsystem modules: standard and advanced. The standard module is exactly the same as the old sk, while the advanced module allows for specifying offsets into the dd file, as well as multiple dd files in sequence.

Example:

As an example, the fls and icat tools were modified to support the new subsystem; more tools will be converted tomorrow once I get some sleep.

Example of how to seek into a partition within a disk dd:

fls -i advanced -o offset=524288 -f linux-ext2 test.dd

This selects the advanced io subsystem and passes it the offset option specifying 1024 blocks of 512 bytes. Now we can split the dd image across multiple files (maybe using the split utility), and still analyse them at once:

fls -i advanced -o offset=524288,file=xaa,file=xab,file=xac,file=xad -f linux-ext2 xae

Note that xae (the last part of the image) will be appended to the list of parts automatically. Also note that all the options in -o are passed as one parameter to the subsystem, which then parses them into the relevant arguments.

If the subsystem's name is not found, the subsystem will list all known subsystems:

bash# fls -i help -f linux-ext2 test.dd
Available Subsystems:
standard - Standard Sleuthkit IO Subsystem
advanced - Advanced Sleuthkit IO Subsystem
fls: Could not set io subsystem help

To get more help about the options available, try setting an option which is not supported:

bash# fls -i advanced -o help -f linux-ext2 test.dd
option help not recognised
Advanced io subsystem options

offset=bytes    Number of bytes to seek to in the image file. Useful if
                there is some extra data at the start of the dd image
                (e.g. partition table/other partitions)
file=filename   Filename to use for split files. If your dd image is
                split across many files, specify this parameter in the
                order required, as many times as needed, for seamless
                integration.

Future work:

I am in the process of implementing raid reassembly functionality, i.e. given a raid reconstruction map (a file telling sk the order in which raid blocks go together) and a list of dd images of individual drives, the io subsystem will transparently reassemble the logical data. I have a working prototype, so I know it's possible. The abstracted io subsystem concept will be very handy for that.
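A rough idea of how the comma-separated "-o" option string above (e.g. "offset=524288,file=xaa,file=xab") could be parsed. This is a hypothetical reimplementation for illustration, not the code from the patch:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define MAX_FILES 64

    struct adv_opts {
        long long offset;           /* bytes to skip in the image  */
        char *files[MAX_FILES];     /* split-image parts, in order */
        int nfiles;
    };

    /* Parse "key=value" tokens separated by commas; optstr is
     * modified in place, as with getopt-style parsing. */
    static int parse_adv_opts(char *optstr, struct adv_opts *o)
    {
        char *tok, *save = NULL;

        memset(o, 0, sizeof(*o));
        for (tok = strtok_r(optstr, ",", &save); tok != NULL;
             tok = strtok_r(NULL, ",", &save)) {
            if (strncmp(tok, "offset=", 7) == 0) {
                o->offset = atoll(tok + 7);
            } else if (strncmp(tok, "file=", 5) == 0 &&
                       o->nfiles < MAX_FILES) {
                o->files[o->nfiles++] = tok + 5;
            } else {
                fprintf(stderr, "option %s not recognised\n", tok);
                return -1;
            }
        }
        return 0;
    }

Because the whole -o argument reaches the subsystem as one string, each subsystem can define its own keys (offset, file, blocksize, ...) without the client tools knowing anything about them.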
From: Márcio C. <ma...@di...> - 2004-02-01 13:37:06
On Sat, 31 Jan 2004 12:48:34 -0500, Brian Carrier <ca...@sl...> wrote:
> For the record, I've discussed this with Márcio offline a little while
> back, so please don't think I'm blowing off his request.

I forgot to mention that! Sorry...

> I too think that a non-HTML interface would be useful, but that is a
> lot of work.

Yes, it is. In fact, I think it should be a new project, related to Autopsy and Sleuth.

> I was recently reminded though that another need in the area is a GUI
> for disk acquisition. A window that allows you to choose the source
> and the destination and it figures out the needed flags for 'dd' and
> runs it. I'm sure the people on this list have figured out 'dd' fairly
> well, but it is a bit awkward for people who are moving from
> Windows-based acquisitions to Linux-based ones.

I agree. I hadn't thought about this, as it is a simple task (but for *nix people... :-).

> The bootable CDs would be that much better if they came up with a window
> that allowed people to do acquisitions. In light of NTI not making the
> new format of Safeback public, I would think that LE is going to be
> looking for a new solution and the command line aspects of 'dd' are
> likely a deterrent.
>
> I just thought I would throw it out there in case there were GUI people
> who were interested. It could even be HTML-based for portability.

OK, that could be a new project! A little one, but handy.

Márcio.
From: Matthias H. <mat...@mh...> - 2004-02-01 11:46:53
Hi all,

here is a short writeup from our last discussion about application categories. (kg) means the default for this category is known-good and (kb) is known-bad.

Application entry:

- remote management (kg)
  Examples: vnc, PC Anywhere, BO/BO2K, SubSeven (???) ...

- office tools (kg)
  Examples: the different office suites - MS Office, OpenOffice, StarOffice,
  Adobe Acrobat ...

- database (kg)
  Examples: the database server and clients, database content files

- desktop (kg)
  Examples: desktop programs like kde tools, acrobat reader, winzip,
  games, screensavers, web browser, email clients

- security (kg)
  Examples: nmap, hping2, virus scanners and signatures, content filter
  software, tripwire/aide/samhain, IDS tools

- sysutils (kg)
  Examples: every day sysadmin utilities (*nix: /sbin/*, /usr/bin ...)

- server daemons (kg)
  Examples: sendmail, postfix, pop3d, imapd, apache, ...

- web/network (?) (kg)
  Examples: cgi scripts, php files, ...

- multimedia (kg)
  Examples: sound, picture and video files

- drivers (kg)
  Examples: driver software (sic!)

- (child-)porn (kb)
  Examples: the name says it all

- malware (kb)
  Examples: rootkits, malicious code, worms, viruses, trojans, backdoors ...

- other (kg)
  Examples: everything which doesn't fit in the other categories

Is "malware" an appropriate name? Shall we further divide this category?

How about the separate "child-porn" section? There are other kinds of illegal porn which do not fit in this category.

Is "web" or "network" a better name? What more content would network include which doesn't fit in the other categories?

It seems that our "remote management" category includes potentially more known-bad (subseven/BO(2K) ...) than known-good tools. Should we disband this category and absorb it in the other categories?

Has anyone good ideas for other groups or better group names?

Regards,

Matthias

--
Matthias Hofherr
mail: mat...@mh...
web: http://www.forinsect.de
gpg: http://www.forinsect.de/pubkey.asc
From: Brian C. <ca...@sl...> - 2004-01-31 17:48:40
For the record, I've discussed this with Márcio offline a little while back, so please don't think I'm blowing off his request.

I too think that a non-HTML interface would be useful, but that is a lot of work.

I was recently reminded though that another need in the area is a GUI for disk acquisition. A window that allows you to choose the source and the destination and it figures out the needed flags for 'dd' and runs it. I'm sure the people on this list have figured out 'dd' fairly well, but it is a bit awkward for people who are moving from Windows-based acquisitions to Linux-based ones.

The bootable CDs would be that much better if they came up with a window that allowed people to do acquisitions. In light of NTI not making the new format of Safeback public, I would think that LE is going to be looking for a new solution and the command line aspects of 'dd' are likely a deterrent.

I just thought I would throw it out there in case there were GUI people who were interested. It could even be HTML-based for portability.

brian

On Jan 28, 2004, at 6:18 PM, Márcio Carneiro wrote:

> Hello, all!
>
> I'm new here. I'm from Brazil, and I work with computer forensics (and
> other forensics areas too).
>
> I'd like to help Autopsy in some way. I'm searching for an almost
> complete tool (as "complete" is something difficult!), and I think
> Autopsy+Sleuth is going in that direction. Of course I'm looking
> for an open solution.
>
> I have nothing against the web interface, but everybody seems to agree
> that it is not the ideal.
From: Brian C. <ca...@sl...> - 2004-01-31 05:14:01
[The list server is so slow this week. I forwarded a message this morning and it still hasn't been posted.]

So, after thinking about this thread some more, there are two problems that are being addressed at the same time. I think they can be more independent, and I think the merging has caused some confusion:

1. A small set of application categories for any hash database.

2. An implementation of a database that can import hashes from multiple sources.

As I mentioned before, the categories are a problem with all databases and I think it would be useful if we could publish a list with requirements for each category. From Doug's email, it sounds like NIST would be interested in such categories (assuming that they are comprehensive and make sense).

For the implementation, it seems that we need to have a clear goal for the DB. Is it for a comprehensive DB or is it just for quick good vs. bad lookups? Both are needed, but can we satisfy both goals with one DB? Or could that be an option at install time: they can choose the quick / dirty / less data version or the full version. I'm not a DB guy, so I have no clue what the answers for this are.

It has occurred to me that there should be a 'source' column in the database, so that the entry can be attributed to the NSRL, hashkeeper, custom etc. A version may also be useful. This is also useful so that you can remove the hashes from the DB at a later point.

thanks,
brian
From: Brian C. <ca...@sl...> - 2004-01-31 02:10:30
I emailed Doug White at NIST to let him know this was being discussed. Many interesting things in here:

Begin forwarded message:

> From: dw...@ni...
> To: "Brian Carrier" <ca...@ce...>
> Subject: Re: NSRL Categories
>
> Brian - thanks a LOT for calling that to my attention.
>
> Feel free to share anything in this mail with the list - I wanted to get
> back ASAP and didn't look into signing up on the list (yet).
>
> First - I'm open to any suggestions about formalizing the application
> type fields. They are completely arbitrary, taken off the boxes that the
> software arrives in. We try not to create new types, realizing that too
> many would be useless, but they could be better defined. I look forward
> to hearing others' ideas about this.
>
> Second - we have moved over to a completely open source-based hashing
> environment in our lab and are running parallel tests. The DBMS is
> MySQL, and we plan to replicate the tables from our lab server to a
> publicly accessible server and publish the port/connection information.
> If we can offer ODBC access to the world (with a throttle) and web
> access like the Sun fingerprints, along with the RDS downloads, that
> should go a long way. There was some thought about a DNS-like hash
> lookup protocol, but that's been shelved. Any other thoughts are welcome!
>
> Third - since we've migrated to open source, we're building Knoppix-like
> boot CDs (one for server, one for cluster nodes) that any organization
> can use on existing computers to replicate our hashing cluster and
> produce RDS-format hashsets. (Sweet: use existing computers without
> perturbing them.) So hopefully everybody and their brother will be
> making ***and SHARING*** hashsets, and nailing down the categories now,
> before we start handing out CDs, would be great.
>
> Fourth - as far as "huge amounts of data", you ain't seen nothin yet. :-)
> We're doing hashes of 512-byte blocks, and the rule of thumb there is to
> plan to collect half the amount of the raw data: 4GB of files = 2GB of
> hashes. We've got 1.75TB of application files... we should start with
> "known bad". I mention this because I saw someone was concerned about
> virus mutations. What's the chance that a virus will mutate and NOT
> change a byte in every 512B block? With the block hashes, an
> investigator (with time and space) could use dcfldd to get MD5's of each
> block while imaging a disk and compare those with the NSRL block hashes,
> i.e. Darl.exe doesn't match the SHA/MD5 of any other file, but 3 of the
> 5 blocks match blocks from MyDoom... busted! (Well, not THAT easy, or
> I'd have $250,000 now.)
>
> Finally - we are going to have a workshop at NIST Gaithersburg on
> Tuesday June 29, "Digital Forensics Using Hashsets". We aim to bring
> digital forensic tool users, digital forensic tool vendors, and hashset
> producers together to expand user awareness, improve tool capabilities
> and guide hashset development. It would be great to get some people from
> the list at this workshop.
> Registration costs $105.
> Lodging: NIST has a block of rooms available at $99/night (below per diem).
> Attendees will receive the most current NSRL hashset, lunch is provided,
> and access to the vendor display area. Vendors may request booth space
> via the registration form.
> We will be linking in more info on www.nsrl.nist.gov very soon.
>
> Again, thanks for dropping me a note, and I hope this brain dump spurs
> on improvements for the community.
>
> Doug
>
>> I just wanted to let you know that there is an effort going on in the
>> sleuthkit-developers list about defining some categories for hash
>> databases.
>
> Douglas White
> National Institute of Standards and Technology
> National Software Reference Library - www.nsrl.nist.gov
> NIST, 100 Bureau Drive Stop 8970, Gaithersburg, MD 20899-8970
> Voice: 301-975-4761 Fax: 301-926-3696 Email: dou...@ni...
> My opinions aren't necessarily my employer's nor any other organization's.
> _.__ _.__ __.. "There is no spoon." _.__ _.__ __..
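The block hashing Doug describes is easy to prototype. Here is a sketch using OpenSSL's MD5 that prints one digest per 512-byte block; it only handles whole blocks (a trailing partial block is silently ignored), which a real tool would have to decide how to treat:

    #include <stdio.h>
    #include <openssl/md5.h>

    int main(int argc, char **argv)
    {
        unsigned char buf[512], md[MD5_DIGEST_LENGTH];
        unsigned long blk = 0;
        FILE *fp;
        int i;

        if (argc < 2 || (fp = fopen(argv[1], "rb")) == NULL)
            return 1;

        /* hash each full 512-byte block and print "blockno digest" */
        while (fread(buf, 1, sizeof(buf), fp) == sizeof(buf)) {
            MD5(buf, sizeof(buf), md);
            printf("%lu ", blk++);
            for (i = 0; i < MD5_DIGEST_LENGTH; i++)
                printf("%02x", md[i]);
            printf("\n");
        }
        fclose(fp);
        return 0;
    }

Matching such per-block digests against a block-hash set is what lets fragments of a known file (e.g. 3 of 5 blocks, as in Doug's example) be recognized even when the whole-file hash no longer matches.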
From: Matthias H. <mat...@mh...> - 2004-01-30 14:25:55
Brian Carrier said:
[...]
> So, after thinking about this thread some more, there are two problems
> that are being addressed at the same time and I think they can be more
> independent and I think the merging has caused some confusion.
>
> 1. A small set of application categories for any hash database.
>
> 2. An implementation of a database that can import hashes from
> multiple sources.
>
> As I mentioned before, the categories are a problem with all databases
> and I think it would be useful if we could publish a list with
> requirements for each category. From Doug's email, it sounds like NIST
> would be interested in such categories (assuming that they are
> comprehensive and make sense).

Ok, then let's treat the list of applications separately. We can later decide if/how we want to implement this in our database. I'll compile a list with examples out of our recent discussion and post it this weekend for further discussion.

> For the implementation, it seems that we need to have a clear goal for
> the DB. Is it for a comprehensive DB or is it just for quick good vs.
> bad lookups? Both are needed, but can we satisfy both goals with one
> DB? Or could that be an option at install time: they can choose the
> quick / dirty / less data version or the full version. I'm not a DB
> guy, so I have no clue what the answers for this are.

After thinking about the recent discussion and your comments, I would prefer not to separate the database but instead the interface:

- we use a comprehensive database with a large set of information for each hash set
- upon importing, everybody can decide for himself how much data to include in the database
- we provide a mapping table in order to map the very detailed categories to a small set of super-categories
- we provide 2 interfaces: "quick&dirty" (-> super-categories) and "long&detailed"

The biggest part of the database is the hashsets themselves. The organization of comprehensive add-on information doesn't use many resources; it requires only a good data model. So we gain not much by using two different database models.

> It has occurred to me that there should be a 'source' column in the
> database, so that the entry can be attributed to the NSRL, hashkeeper,
> custom etc. A version may also be useful. This is also useful so that
> you can remove the hashes from the DB at a later point.

Good idea, I use this already (without a version) in my forensic hash database.

Regards,
Matthias
From: Márcio C. <ma...@di...> - 2004-01-29 00:56:05
Hello, all!

I'm new here. I'm from Brazil, and I work with computer forensics (and other forensics areas too).

I'd like to help Autopsy in some way. I'm searching for an almost complete tool (as "complete" is something difficult!), and I think Autopsy+Sleuth is going in that direction. Of course I'm looking for an open solution.

I have nothing against the web interface, but everybody seems to agree that it is not the ideal. So, I'm thinking of a new interface, with some features:

- based on something like GTK or QT. There are other options, and I'd love suggestions. In fact, I'm not an expert in GUI toolkits.
- same as Autopsy: cases, with hosts, with images, etc. The possibilities for add-ons/plugins/new functions.
- work with multiple users on the cases, like Autopsy. In an environment with investigators having fast machines, we can just centralize the evidence locker (each investigator runs an Autopsy). In an environment with only one fast machine, the investigators could run it on the "server". There are a lot of possibilities here... One common locker can be implemented using NFS or another network fs, transparently to Autopsy. I don't know how far we have to go in this aspect.
- a lot of useful tools as described in this list. Maybe a lot of those can be implemented outside the interface, as libs or independent apps, so they can be used in scripts or with another interface (I don't imagine a text interface, but who knows?)
- so far I'm thinking about C. But there could be other options...

I saw a message here about software called Rex, but I couldn't run it; I had problems with Java and the installation. Portability is great, but I'm not so worried about Windows so far...

That seems great in my imagination, but there is a long way to get there... Does anybody have comments, suggestions, and energy to help me? :-) Should I (we) really go in this direction?

Best regards,
Márcio.
From: David B. <to...@so...> - 2004-01-28 18:48:45
* Brian Carrier (ca...@sl...) wrote:
[snip]
> It would be nice if each entry had a static size, so that we could jump
> around the text file of the database easily. Therefore, there would be
> an index that correlates an application type to an integer. I would
> think that doing integer comparisons would be faster than string
> comparisons though when looking entries up. That may be a pain to
> manage though.

Yes, it is a good idea to map application types to an integer; I even think that the OS field should be an integer too. It shouldn't be a pain to manage them if the import tools make the task easier. (The problem then is to develop proper import tools. ;))

> > Application entry:
> > - remote management

Thinking about this category, perhaps it is included in the server daemons category (for the servers) and the network category (for the clients).

> > - office tools
>
> Would adobe acrobat reader and calendars fit into this category?

Yes, and even a mail client.

> > - database
> > - desktop
>
> What are examples of this category? games?

Proper examples for this category would be games, IM, screensavers, iconsets, wallpapers... but perhaps this category should be merged with the multimedia category ¿?

> > - server daemons
> > - web
>
> A general name like network may scale better. Would email tools fit in
> here too?

I prefer network too, but take into account that all the web scripts (CGI, php, perl, ...) should also fit in this category.

> > - multimedia
> > - drivers
> > - development
> > - sysutils
> > - security
>
> Would this include tools that are frequently called "hacker" tools too?
> This category could be difficult and controversial to maintain, but I
> don't know of a better way to do it...

I would perhaps split this category in two other categories: security (whitehat) and malware (exploits, rootkits, ...). I know that malware is not the right word for them, but it is the name that gathers the most different types of such files. Another approach is to include only the 'whitehat' security tools in this category and the 'blackhat' tools in the next category (known-bad).

> > - known-bad
>
> Should there be a known-good too? I can imagine a situation where
> someone hashes his /bin/, /sbin/, /usr/local/bin ... directories and
> doesn't want to have to identify the category of each file.

Both known-bad and known-good could be a 'wrapper' for other categories.

> > - other
>
> Where would child-porn fit into this? known-bad? That seems to be one
> of the biggest categories of hashes and may warrant its own category.

According to the above, it should fit in both malware (replace this word with another more suitable one) and known-bad.

> > Operating system entry:
> > - Linux
> > - Windows
> > - BSD
> > - Mac
> > - MacOSX
> > - Solaris
> > - DOS
> > - Handheld OS
> > - AIX
> > - HP-UX
> > - Other
>
> MacOS probably shouldn't get a separate category from OSX unless Win
> '98 is also separated from Win XP. The specific types in BSD should be
> defined (since OS X is actually a variant of BSD). The Solaris
> category should also include SunOS.

Then we'd add OpenBSD, FreeBSD and NetBSD, and delete OSX. SunOS is included in the Solaris category.

[snip]

> > Did we miss important fields ?
>
> SHA-2 may not be a bad idea. I recall threads in the past on other
> lists about using SHA-2, so we may want to make a field for it (even
> though the public DBs don't use it yet). It can take the place of
> CRC32.

I have never used SHA-2 nor CRC32. If SHA-2 is currently being used, we should definitely add it.

> Is the file size needed? I'm trying to think of a scenario where that
> would be needed.

Hmm, not sure about that, but what happens when an application has several files with the same name in different directories (and different hashes)? In addition, we should specify the application language in some field, because for instance the nt.dll file is different for the Windows 2000 English version and the Windows 2000 Spanish version, both with the same patches applied.
From: Matthias H. <mat...@mh...> - 2004-01-28 18:12:03
Michael Cohen said:
[...]
> I find this is an important requirement, particularly for sql databases.
> The os and applications should be short ints so that an index may be
> built on them, making it faster to search. Also I found that building a
> partial index on the md5 column itself speeds things up several orders
> of magnitude, but still keeps the index size reasonable so it fits well
> in ram.

Performance will not be one of our bigger problems. Even with, say, 20 million entries (NSRL alone has nearly 18 mio.), we should get reasonable search times, provided we use some clever indexing.

Sure, one problem will be to import 20 mio. entries. But by dropping the index and setting it after the import, we will gain much time.

The performance question is not important as long as we do not have a good data model. Adding performance features is simple textbook work.

> > Application entry:
> Are you suggesting not to name the application product at all, but
> rather only contain information on the category of the application? So
> for example in the table "msword.exe" will have office tools as
> application, but not refer to Microsoft Word as a product? I really
> think that you still need to classify the hash set with the commercial
> name of the application, otherwise you would not know which specific
> application xyz.dll belongs to.

I think we have to decide if we want kind of a full management database with all possible kinds of information for a hash set, or if we need a database with a relatively small number of categories for excluding known-goods and alerting on known-bads. For the latter, we do not need to know if "msword.exe" is from the package "Microsoft Office 2000 SP 3 Hotfix 2a". For the former, we need the detailed information.

Which brings us to another problem: do we allow duplicate entries for hashsums in the database? The former solution will allow this; the latter probably doesn't require it.

> In general I think the approach taken by NSRL is not a bad one.
[...]
> This is much more effective than having to redo the entire nsrl.

The problem is that it is absolutely no problem to make a database structure for NSRL. In fact, NSRL already has a full generic database structure which could easily be adapted. But this was, so far, not my intention (see above).

Yet, we do not have to redo the NSRL database. We only have to define a mapping (once) for NSRL categories. Automatic import with a parser script is no problem. Since NSRL categories do not change too much, maintenance should be no problem.

> > MacOS probably shouldn't get a separate category from OSX unless Win
> > '98 is also separated from Win XP. The specific types in BSD should be
> > defined (since OS X is actually a variant of BSD). The Solaris
> > category should also include SunOS.
> I think that OSs should be granulated down as much as practically
> possible. So I would give win98 a different category than winXP. Maybe
> not so much as to separate the different service packs, but it's often
> very evident what kind of os you are working on, and it would speed
> things up considerably if the database could be split into different
> tables, depending on the OS. This effect can be achieved by building an
> index on the OS column; this severely lightens the load on the query if
> we restrict our searches to particular os's.

Same problem as above: either we use small categories with a usable interface, or we define huge categories with a VERY large interface.

Agreed, the latter will result in faster performance due to more detailed constraints in the query. But with good indexing and persistent database connections, speed should be reasonable with small categories as well.

Regards,
Matthias
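The import pattern Matthias mentions (defer the index, bulk-load inside one transaction, build the index afterwards) looks roughly like this. SQLite is used here purely for illustration, though the thread discusses MySQL, and error handling is omitted:

    #include <stdio.h>
    #include <sqlite3.h>

    /* Bulk-load sketch: one transaction, index built only at the end.
     * Input is assumed to be one 32-char hex MD5 per line; a real
     * importer would use prepared statements and validate input. */
    int import_hashes(sqlite3 *db, FILE *in)
    {
        char md5[33];
        char sql[128];

        sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);
        while (fscanf(in, "%32s", md5) == 1) {
            snprintf(sql, sizeof(sql),
                     "INSERT INTO hashes (md5) VALUES ('%s')", md5);
            sqlite3_exec(db, sql, NULL, NULL, NULL);
        }
        sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);

        /* building the index once after the load is far cheaper than
         * maintaining it row by row during the import */
        sqlite3_exec(db, "CREATE INDEX md5_idx ON hashes (md5)",
                     NULL, NULL, NULL);
        return 0;
    }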
From: Matthias H. <mat...@mh...> - 2004-01-28 17:38:43
Brian Carrier said:
[...]
> I thought about what software I have on my systems and tried to fit it
> in, so there are some questions about what goes where. Could you maybe
> provide requirements for software to fit into each category?

Ok, I'll fill the categories with descriptions.

> It would be nice if each entry had a static size, so that we could jump
> around the text file of the database easily.

How about this: we use fields with dynamic length in the database and use an export tool for exporting with static sizes? We could set the maximal length with datatypes like "varchar(40)".

> Therefore, there would be an index that correlates an application type
> to an integer. I would think that doing integer comparisons would be
> faster than string comparisons though when looking entries up. That
> may be a pain to manage though.

Sure, we need integer identifiers for performance. I deliberately didn't mention them because I think we first have to agree on the data model. Things like primary keys, foreign keys, indices etc. should follow when we find a good data model.

> > Application entry:
> > - remote management
> > - office tools
>
> Would adobe acrobat reader and calendars fit into this category?

I would place adobe acrobat and calendars in the desktop category.

> > - database
> > - desktop
>
> What are examples of this category? games?
>
> > - server daemons
> > - web
>
> A general name like network may scale better. Would email tools fit in
> here too?
>
> > - multimedia
> > - drivers
> > - development
> > - sysutils
> > - security
>
> Would this include tools that are frequently called "hacker" tools too?
> This category could be difficult and controversial to maintain, but I
> don't know of a better way to do it...

Sure, the problem we have is with tools like nmap, nemesis, hping etc. (tools used both for good and bad things). I like Matt McMillon's idea to search categories both as known-good and known-bad. So everybody can decide for himself at search-time how to handle this. I think operating system categories should be known-good per default. Each application category should get an individual default setting for known-good/known-bad.

> > - known-bad
>
> Should there be a known-good too? I can imagine a situation where
> someone hashes his /bin/, /sbin/, /usr/local/bin ... directories and
> doesn't want to have to identify the category of each file.

Known-bad was kind of a catch-all for all possible known-bad files. The problem is, if we segment known-bad, we'll get dozens of subcategories. While this is no problem in the database, it will be difficult to handle for autopsy.

> > - other
>
> Where would child-porn fit into this? known-bad? That seems to be one
> of the biggest categories of hashes and may warrant its own category.

Yes, I thought it should be known-bad. During my forensic analyses, my main objectives so far were hacking-related, not child-porn. So it may be that I have kind of a blind spot for this problem. Ok, let's add a separate category "child-porn" with "known-bad" as default.

> > Operating system entry:
> > - Linux
> > - Windows
> > - BSD
> > - Mac
> > - MacOSX
> > - Solaris
> > - DOS
> > - Handheld OS
> > - AIX
> > - HP-UX
> > - Other
>
> MacOS probably shouldn't get a separate category from OSX unless Win
> '98 is also separated from Win XP. The specific types in BSD should be
> defined (since OS X is actually a variant of BSD). The Solaris
> category should also include SunOS.

Ok, so BSD would include: (Free|Open|Net-BSD|BSD/OS|OS X). What about IRIX, Tru64 etc.?

Did we forget a category with many entries? The problem is, we should hold the number of OS's low for better usability of the search interface. Out of the box I find about three dozen operating systems, and I am probably forgetting another dozen.

> SHA-2 may not be a bad idea. I recall threads in the past on other
> lists about using SHA-2, so we may want to make a field for it (even
> though the public DBs don't use it yet). It can take the place of
> CRC32.

Good point here.

> This looks good. I think more requirements for each app category would
> be useful though.

I'll compile a new draft with some more flesh on each category. This should help us have a more detailed discussion of the categories.

Regards,
Matthias
From: Brian C. <ca...@sl...> - 2004-01-28 14:41:33
> > > Application entry:
> So I suggest making another table where you classify the products into
> categories etc., e.g.:
>
> product_code/application_code/package whatever code is appropriate
> product category
>
> So the hash table should have information relating a specific hash to
> MSword for example, and this new table tells us that msword is an
> office app. Similarly if we see a hash matching Back Orifice, we
> consult this new table to find that Back Orifice is a hacker app. This
> is much more effective than having to redo the entire nsrl.

That is a really good point. The only problem we are trying to solve is the number of application categories. We could even use all of the fields that the NSRL uses and write a program to read in the NSRL and output the NSRL with the new categories.

With regard to separating by platform and more granular OS, I think that is useful for the operating system binaries. But for applications that could be harder. Many windows apps run on different versions. If it has to be tied to every new Windows version, then it might be a pain to maintain.

thanks,
brian
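The converter Brian suggests could be little more than a table mapping NSRL product codes onto the new small category set. A hypothetical sketch — the product codes and categories below are made up, and the real NSRL field layout should be checked against the RDS documentation before relying on it:

    #include <stdio.h>

    /* made-up super-categories, stored as small integer codes */
    enum { CAT_OTHER = 0, CAT_OFFICE, CAT_MULTIMEDIA, CAT_SECURITY };

    /* mapping from NSRL product codes to super-categories;
     * the codes here are invented for illustration */
    struct catmap { int product_code; int category; };

    static const struct catmap map[] = {
        { 101, CAT_OFFICE },
        { 202, CAT_MULTIMEDIA },
        { 303, CAT_SECURITY },
    };

    static int lookup_category(int product_code)
    {
        size_t i;

        for (i = 0; i < sizeof(map) / sizeof(map[0]); i++)
            if (map[i].product_code == product_code)
                return map[i].category;
        return CAT_OTHER;   /* anything unmapped falls through */
    }

Because the mapping is defined once per NSRL release, reclassifying the whole database is just a re-run of the converter, which is exactly the "don't redo the entire nsrl" point made above.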
From: Michael C. <mic...@ne...> - 2004-01-28 08:37:37
Hi All,

> > File entry:
> > - sha1
> > - md5
> > - os
> > - application
> > - filename
> > - filesize
>
> It would be nice if each entry had a static size, so that we could jump
> around the text file of the database easily. Therefore, there would be
> an index that correlates an application type to an integer. I would
> think that doing integer comparisons would be faster than string
> comparisons though when looking entries up. That may be a pain to
> manage though.

I find this is an important requirement, particularly for sql databases. The os and applications should be short ints so that an index may be built on them, making it faster to search. Also I found that building a partial index on the md5 column itself speeds things up several orders of magnitude, but still keeps the index size reasonable so it fits well in ram.

> > Application entry:

Are you suggesting not to name the application product at all, but rather only contain information on the category of the application? So for example in the table "msword.exe" will have office tools as application, but not refer to Microsoft Word as a product? I really think that you still need to classify the hash set with the commercial name of the application, otherwise you would not know which specific application xyz.dll belongs to.

In general I think the approach taken by NSRL is not a bad one. I sympathise with the dilemma of not being able to rely on the hashes to get a quick yes/no answer as to whether a disk contains "bad files". I think the task set out by the NSRL is to merely identify the files. Classifying them into categories is a purely subjective decision, based for the most part on the circumstances of the case. The NSRL is used to see what applications/packages/products are installed; the decision about which of those applications are bad should be made in a separate table altogether. So I suggest making another table where you classify the products into categories etc., e.g.:

product_code/application_code/package whatever code is appropriate
product category

So the hash table should have information relating a specific hash to MSword for example, and this new table tells us that msword is an office app. Similarly if we see a hash matching Back Orifice, we consult this new table to find that Back Orifice is a hacker app. This is much more effective than having to redo the entire nsrl.

> MacOS probably shouldn't get a separate category from OSX unless Win
> '98 is also separated from Win XP. The specific types in BSD should be
> defined (since OS X is actually a variant of BSD). The Solaris
> category should also include SunOS.

I think that OSs should be granulated down as much as practically possible. So I would give win98 a different category than winXP. Maybe not so much as to separate the different service packs, but it's often very evident what kind of os you are working on, and it would speed things up considerably if the database could be split into different tables, depending on the OS. This effect can be achieved by building an index on the OS column; this severely lightens the load on the query if we restrict our searches to particular os's.

> > Questions so far:
> > Do we need a separate architecture field for a hashsum entry ? This
> > will require an additional search parameter later.

I think we do, for the reason I mentioned above - no point searching all those SPARC entries when we are clearly working on an intel box.

> Is the file size needed? I'm trying to think of a scenario where that
> would be needed.

Sometimes it's useful to see the filesize if the file is extremely small, e.g. 1 byte or 2 bytes - it's very easy to get hash collisions on these files and the database is not reliable - in fact I think hashes should not be taken of such small files, but NSRL is full of 0-byte files.

> This looks good. I think more requirements for each app category would
> be useful though.

It would be useful to design the hash database in a way that can leverage off NSRL, since NSRL is the richest source of hashes at the moment.

Regards,
Michael.
From: Brian C. <ca...@sl...> - 2004-01-27 23:15:40
> in cooperation with David Barroso I compiled a first proposal
> for the structure of a hash database:

Great. I thought about what software I have on my systems and tried to fit it in, so there are some questions about what goes where. Could you maybe provide requirements for software to fit into each category?

> File entry:
> - sha1
> - md5
> - os
> - application
> - filename
> - filesize

It would be nice if each entry had a static size, so that we could jump around the text file of the database easily. Therefore, there would be an index that correlates an application type to an integer. I would think that doing integer comparisons would be faster than string comparisons though when looking entries up. That may be a pain to manage though.

> Application entry:
> - remote management
> - office tools

Would adobe acrobat reader and calendars fit into this category?

> - database
> - desktop

What are examples of this category? games?

> - server daemons
> - web

A general name like network may scale better. Would email tools fit in here too?

> - multimedia
> - drivers
> - development
> - sysutils
> - security

Would this include tools that are frequently called "hacker" tools too? This category could be difficult and controversial to maintain, but I don't know of a better way to do it...

> - known-bad

Should there be a known-good too? I can imagine a situation where someone hashes his /bin/, /sbin/, /usr/local/bin ... directories and doesn't want to have to identify the category of each file.

> - other

Where would child-porn fit into this? known-bad? That seems to be one of the biggest categories of hashes and may warrant its own category.

> Operating system entry:
> - Linux
> - Windows
> - BSD
> - Mac
> - MacOSX
> - Solaris
> - DOS
> - Handheld OS
> - AIX
> - HP-UX
> - Other

MacOS probably shouldn't get a separate category from OSX unless Win '98 is also separated from Win XP. The specific types in BSD should be defined (since OS X is actually a variant of BSD). The Solaris category should also include SunOS.

> Questions so far:
> Do we need a separate architecture field for a hashsum entry ? This
> will require an additional search parameter later.

Probably not.

> Does anyone need a crc32 entry with the hashsum ?

I don't think it is needed. It is not best practice to use CRC, so there isn't much point in including them.

> Did we miss important fields ?

SHA-2 may not be a bad idea. I recall threads in the past on other lists about using SHA-2, so we may want to make a field for it (even though the public DBs don't use it yet). It can take the place of CRC32.

Is the file size needed? I'm trying to think of a scenario where that would be needed.

> Did we miss important questions ;-)

This looks good. I think more requirements for each app category would be useful though.

thanks,
brian
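Brian's static-size idea might look like the following record layout. The field widths are illustrative assumptions, not a proposed standard:

    #include <stdint.h>

    /*
     * Fixed-width record sketch: every entry is the same size, so the
     * database file can be seeked into and binary-searched directly,
     * and os/application are small integer codes that index separate
     * lookup tables (per the integer-comparison idea above).
     */
    struct hash_rec {
        char     sha1[40];      /* hex digest, not NUL-terminated */
        char     md5[32];       /* hex digest, not NUL-terminated */
        uint16_t os;            /* index into an OS name table     */
        uint16_t application;   /* index into a category table     */
        uint64_t filesize;
        char     filename[64];  /* space-padded                    */
    };

With records like this, entry N lives at byte offset N * sizeof(struct hash_rec), so a sorted file supports O(log n) lookups without loading everything into memory; the trade-off is the truncated, padded filename field, which is the management pain Brian anticipates.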
From: Matthias H. <mat...@mh...> - 2004-01-27 18:13:14
Hi list,

in cooperation with David Barroso I compiled a first proposal for the structure of a hash database:

File entry:
- sha1
- md5
- os
- application
- filename
- filesize

Application entry:
- remote management
- office tools
- database
- desktop
- server daemons
- web
- multimedia
- drivers
- development
- sysutils
- security
- known-bad
- other

Operating system entry:
- Linux
- Windows
- BSD
- Mac
- MacOSX
- Solaris
- DOS
- Handheld OS
- AIX
- HP-UX
- Other

The fields per category should be easily manageable with a web-based analysis gui (autopsy). Usually, only one of the categories should be required for a forensic analysis step ("filter all linux hashsums from my image", "identify application xyz on my image" ...).

Questions so far:

Do we need a separate architecture field for a hashsum entry? This will require an additional search parameter later.

Does anyone need a crc32 entry with the hashsum?

Did we miss important fields?

Did we miss important questions ;-)

Feedback on this proposal is welcome and encouraged.

Regards,
Matthias

--
Matthias Hofherr
mail: mat...@mh...
web: http://www.forinsect.de
gpg: http://www.forinsect.de/pubkey.asc
From: Matthias H. <mat...@mh...> - 2004-01-23 18:10:39
Brian Carrier said:
> I'll just keep this on the developers list.
>
> On Thursday, January 22, 2004, at 06:26 PM, Matthias Hofherr wrote:
[...]
> Can you lead the effort on making such a list then?

I'll give it a try.

> I can't imagine having more than 15 categories. Otherwise it gets too
> messy and would be too difficult to look at in the configuration window.

I, too, think fewer categories are better. Yet we have two major kinds of categories:

- operating systems: always known-goods, not so many categories, many hash-sets
- applications: many categories, which have to be compressed to, let's say, 15 categories; may be both known-goods or known-bads

I will give the matter some thought, talk with some people and compile a first proposal for the list. If anyone on this list is also interested in this matter, drop me a mail off-list.

Matthias
From: Brian C. <ca...@sl...> - 2004-01-23 05:39:32
I'll just keep this on the developers list.

On Thursday, January 22, 2004, at 06:26 PM, Matthias Hofherr wrote:
> Brian Carrier said:
> > [I was hoping you would be interested in this topic in light of your
> > new database :)]
>
> Yup, I am interested ;-)

Excellent.

> I think the NSRL segmentation into products/operating systems/manufacturers
> is a good idea. Yet, the NSRL-provided categories are partially duplicate
> and partially too segmented. There is no simple solution for a query
> like "check only against Linux system hashes".
> I think we should define a basic set of operating systems and other
> classification data and maintain a mapping table for imports of NSRL
> and other hashsets.

Can you lead the effort on making such a list then? I can't imagine having more than 15 categories. Otherwise it gets too messy and would be too difficult to look at in the configuration window.

If we can make a comprehensive list of categories that scales for types of applications and/or types of platforms (although app type seems to be more important), then I would like to get it published in the IJDE (or similar) and see if we can make an argument for it to be a "standard" and adopted in the NSRL and others.

thanks,
brian
From: Matthias H. <mat...@mh...> - 2004-01-22 23:26:20
|
Brian Carrier said:
> [I was hoping you would be interested in this topic in light of your
> new database :)]

Yup, I am interested ;-)

[...]

> I'm assuming that you are referring to a global database in the local
> sense. That each person has their own "global" database that they
> create and can add and remove hashes from. Not a global database in
> the Solaris Fingerprint DB sense.

Yes, I meant the "local sense". Each user has different needs/requirements, so a global database for everybody should be out of the question.

>> The interface to
>> autopsy and sleuthkit should allow querying only certain categories,
>> only known-bads, or a certain category as known-bad or not (-> e.g.
>> remote management tools). The biggest problem here is to manage the
>> category mapping table for all the different tools.
>
> I agree. Especially when you start merging the home-made hashes with
> those from the NSRL and hashkeeper. I guess we could have a generic
> category of 'Always Good' or 'Always Bad'.
>
>> The technical problem is to manage such a huge amount of raw data. With
>> NSRL alone, we have millions of hash sets. This requires a new query
>> mechanism. With a RDBMS, we need persistent connections and the
>> possibility to bulk query large data sets very fast. With the current
>> sorter|hfind design, sorter calls hfind one time per hash analyzed.
>> This is definitely a big bottleneck.
>
> Yea, I have no problem if the end solution requires a redesign of hfind
> and sorter.
>
> I'm just not sure what the end solution should be. Some open questions:
> - What application categories are needed? Are the NSRL ones sufficient
>   or are there too many / too few of them?
> - How do you specify in the query which categories are bad and which are good?
> - How do you specify to 'sorter' which categories are bad and which are good?
> - Do we want to require a real database (i.e. SQL) or should there also
>   be an ASCII file version?

I think the NSRL segmentation into products/operating systems/manufacturers is a good idea. Yet, the NSRL-provided categories are partially duplicated and partially too segmented. There is no simple solution for a query like "check only against Linux system hashes".

I think we should define a basic set of operating systems and other classification data and maintain a mapping table for imports of NSRL and other hash sets (a sketch of such a mapping table follows this message).

In my opinion, a SQL database should be the base (easier structure, multi-index ...). On the other hand, there is no reason not to provide an export utility for ASCII exports in a defined format. This should handle both requirements.

Matthias
|
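A minimal sketch, in C, of the import mapping table described above. The NSRL type strings and internal category names are invented for illustration; a real table would be generated from the published NSRL application-type index.

    /*
     * Hypothetical mapping of imported NSRL application-type strings onto
     * a small internal category set. Unmapped types fall back to CAT_OTHER.
     */
    #include <string.h>

    enum category { CAT_OTHER, CAT_OFFICE, CAT_MULTIMEDIA,
                    CAT_SECURITY, CAT_DEVELOPMENT };

    struct cat_map {
        const char   *nsrl_type;   /* type string as imported from NSRL */
        enum category internal;    /* our internal category code        */
    };

    static const struct cat_map nsrl_map[] = {
        { "Word Processor", CAT_OFFICE      },
        { "Spreadsheet",    CAT_OFFICE      },
        { "Graphics",       CAT_MULTIMEDIA  },
        { "Anti-Virus",     CAT_SECURITY    },
        { "Compiler",       CAT_DEVELOPMENT },
    };

    static enum category map_nsrl_type(const char *nsrl_type)
    {
        for (size_t i = 0; i < sizeof(nsrl_map) / sizeof(nsrl_map[0]); i++)
            if (strcmp(nsrl_map[i].nsrl_type, nsrl_type) == 0)
                return nsrl_map[i].internal;
        return CAT_OTHER;
    }

A table like this would let differently labelled imports (NSRL, hashkeeper, home-made sets) all answer the same category queries.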
From: Brian C. <ca...@sl...> - 2004-01-22 22:38:36
|
[I was hoping you would be interested in this topic in light of your new database :)]

On Thursday, January 22, 2004, at 01:33 PM, Matthias Hofherr wrote:

> Logically, we need
> to maintain a potentially huge amount of data and categorize every single
> hash entry. Furthermore, we have to decide for each entry if it is a
> known-bad or a known-good. I think a useful solution is to maintain a
> global database with both freely available hashsums like NSRL and
> KnownGoods combined with self-made hash sets (md5sum/grave-robber ...).

I'm assuming that you are referring to a global database in the local sense. That each person has their own "global" database that they create and can add and remove hashes from. Not a global database in the Solaris Fingerprint DB sense.

> The interface to
> autopsy and sleuthkit should allow querying only certain categories,
> only known-bads, or a certain category as known-bad or not (-> e.g.
> remote management tools). The biggest problem here is to manage the
> category mapping table for all the different tools.

I agree. Especially when you start merging the home-made hashes with those from the NSRL and hashkeeper. I guess we could have a generic category of 'Always Good' or 'Always Bad'.

> The technical problem is to manage such a huge amount of raw data. With
> NSRL alone, we have millions of hash sets. This requires a new query
> mechanism. With a RDBMS, we need persistent connections and the
> possibility to bulk query large data sets very fast. With the current
> sorter|hfind design, sorter calls hfind one time per hash analyzed.
> This is definitely a big bottleneck.

Yea, I have no problem if the end solution requires a redesign of hfind and sorter.

I'm just not sure what the end solution should be. Some open questions:
- What application categories are needed? Are the NSRL ones sufficient or are there too many / too few of them?
- How do you specify in the query which categories are bad and which are good?
- How do you specify to 'sorter' which categories are bad and which are good?
- Do we want to require a real database (i.e. SQL) or should there also be an ASCII file version?

thanks,
brian
|
From: Matthias H. <mat...@mh...> - 2004-01-22 18:33:08
|
Brian,

I think we have technical and logical issues here. Logically, we need to maintain a potentially huge amount of data and categorize every single hash entry. Furthermore, we have to decide for each entry if it is a known-bad or a known-good. I think a useful solution is to maintain a global database with both freely available hashsums like NSRL and KnownGoods combined with self-made hash sets (md5sum/grave-robber ...). The interface to autopsy and sleuthkit should allow querying only certain categories, only known-bads, or a certain category as known-bad or not (-> e.g. remote management tools). The biggest problem here is to manage the category mapping table for all the different tools.

The technical problem is to manage such a huge amount of raw data. With NSRL alone, we have millions of hash sets. This requires a new query mechanism. With a RDBMS, we need persistent connections and the possibility to bulk query large data sets very fast. With the current sorter|hfind design, sorter calls hfind one time per hash analyzed. This is definitely a big bottleneck (see the batch-lookup sketch after this message).

Best regards,
Matthias

--
Matthias Hofherr
mail: mat...@mh...
web: http://www.forinsect.de
gpg: http://www.forinsect.de/pubkey.asc

Brian Carrier said:
> Is anyone interested in looking into the best way to manage hashes? The
> definition of "good" versus "bad" is relative to the current
> investigation and I don't know the best way to handle this in The
> Sleuth Kit and Autopsy. There could be a single database with
> categories of hashes and you choose which are good and which are bad
> for that investigation (similar to the new Forensic Hash Database that
> was announced and NSRL). Or, you could import tens of hash databases
> and identify them as bad or good (like hashkeeper).
>
> I think hashkeeper is LE-only, so I would rather focus on using NSRL and
> custom hashes made by md5sum. If anyone is interested in working on a
> workable solution to this, let me know.
|
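A sketch of the bulk-query idea, in C. Instead of one lookup invocation per hash, collect the hashes from the image, sort them, and resolve them in a single merge-join pass over a sorted database. The function names and fixed 33-byte MD5 strings are assumptions for illustration; this is not the actual hfind code.

    /*
     * Batch lookup: sort the query hashes once, then walk the (already
     * sorted) database and the queries in lock-step. Each input is read
     * exactly once, amortizing all per-lookup overhead across the batch.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static int cmp_md5(const void *a, const void *b)
    {
        return strcmp((const char *)a, (const char *)b);
    }

    /* queries: n_q hex digests, 33 bytes each (incl. NUL);
       db: n_db digests, same layout, sorted ascending. */
    static void bulk_lookup(char (*queries)[33], size_t n_q,
                            char (*db)[33], size_t n_db)
    {
        qsort(queries, n_q, sizeof(queries[0]), cmp_md5);

        size_t qi = 0, di = 0;
        while (qi < n_q && di < n_db) {
            int c = strcmp(queries[qi], db[di]);
            if (c == 0) { printf("known: %s\n", queries[qi]); qi++; }
            else if (c < 0) qi++;   /* query hash is not in the database */
            else            di++;   /* advance the database cursor       */
        }
    }

The same effect could be had with an RDBMS and one bulk query over a persistent connection; either way, the point is to stop paying process-startup and index-traversal costs once per hash.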
From: McMillon, M. <Mat...@qw...> - 2004-01-21 22:23:11
|
> That still leaves the problem of organizing what is "good" though. Is
> pcAnywhere a good or bad hash? Depends on the investigation.

I suppose this is why NSRL took the approach of simply categorizing all the hashes as "known" and anything that wasn't in the DB as "unknown." One simple way to approach this would be to have the option to import individual hashes or hash sets based on some category tree structure, and then select the option to 1) display all files that match the imported hashes, 2) display all files that don't, or 3) display files whose hashes match but whose file names don't, etc. (option 3 is sketched after this message). Kind of an "autopsy reports, you decide" tack. <--- hoping I don't get sued by Fox News.

> There are Application types in the schema, but I'm not sure how they
> were chosen or how many there are. You can see a list here:

Seems to map somewhat to the members of the Business Software Alliance, but since NIST is a "neutral" organization I doubt there is any connection there :)
|
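A small sketch, in C, of option 3 above: flagging files whose hash matches a known entry but whose name differs from the recorded one (i.e., a known file that has been renamed). The stand-in database and function names are made up for illustration; the one sample entry is the MD5 of the empty file.

    #include <stdio.h>
    #include <string.h>

    /* Stand-in database: md5 hex digest -> recorded filename. */
    static const struct { const char *md5, *name; } db[] = {
        { "d41d8cd98f00b204e9800998ecf8427e", "empty.txt" },
    };

    static const char *hashdb_name_for(const char *md5_hex)
    {
        for (size_t i = 0; i < sizeof(db) / sizeof(db[0]); i++)
            if (strcmp(db[i].md5, md5_hex) == 0)
                return db[i].name;
        return NULL;   /* hash not in the database */
    }

    /* Report a hash that is known under a different file name. */
    static void check_renamed(const char *md5_hex, const char *seen_name)
    {
        const char *known = hashdb_name_for(md5_hex);
        if (known && strcmp(known, seen_name) != 0)
            printf("hash match, name mismatch: %s (db: %s)\n",
                   seen_name, known);
    }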
From: Brian C. <ca...@sl...> - 2004-01-21 21:11:24
|
> However, I am beginning to wonder how effective hash sets of "known-bad"
> are going to be moving into the future--I think they have shown some
> benefit to LEA and others investigating child porn, malware, etc. but
> as the perps get wise to this technique, you'll probably start seeing
> more things like polymorphic archives, encrypted executables, and other
> file types that may change based on context or just randomly when
> accessed. Manually modifying files with a hex editor would be a simple
> way to change the sums of any file--which is much more of a current
> reality. We've seen this somewhat in the anti-virus industry, which
> makes me wonder how some sort of heuristics system may be more effective
> for this area.

That is a good point. And one trojan source file can generate many different execs with different hashes based on what compiler flags were used.

That still leaves the problem of organizing what is "good" though. Is pcAnywhere a good or bad hash? Depends on the investigation.

> The other big issue is categorizing the large number of hashes; I think
> the reference data set of NSRL is 17.9 million hashes. Manually
> categorizing them would not be possible--would have to look closer at
> the NSRL "schema" to see if an automated process could be developed once
> categories were determined.

There are Application types in the schema, but I'm not sure how they were chosen or how many there are. You can see a list here:

http://www.nsrl.nist.gov/index/apptype.index.txt

The reason that I am asking this is because it is an important issue, but I already have too many things on my plate. So, if people are interested in finding a solution to this, then please do. I won't get to it for several months.

thanks,
brian
|
From: McMillon, M. <Mat...@qw...> - 2004-01-21 19:40:26
|
Just some random thoughts on hashes:

I think managing a collection of baseline OS and application hashes would be pretty straightforward as long as you limited scope to vendor "gold master" releases. Version skew from subsequent patches may cause some issues, but this would allow you to load the hash sets for the OS you are examining and quickly identify what is off of the baseline, which is pretty much what NSRL is designed for but with a much broader brush.

However, I am beginning to wonder how effective hash sets of "known-bad" are going to be moving into the future--I think they have shown some benefit to LEA and others investigating child porn, malware, etc. but as the perps get wise to this technique, you'll probably start seeing more things like polymorphic archives, encrypted executables, and other file types that may change based on context or just randomly when accessed. Manually modifying files with a hex editor would be a simple way to change the sums of any file--which is much more of a current reality. We've seen this somewhat in the anti-virus industry, which makes me wonder how some sort of heuristics system may be more effective for this area.

The other big issue is categorizing the large number of hashes; I think the reference data set of NSRL is 17.9 million hashes. Manually categorizing them would not be possible--we would have to look closer at the NSRL "schema" to see if an automated process could be developed once categories were determined.

Matt

-----Original Message-----
From: sle...@li... [mailto:sle...@li...] On Behalf Of Brian Carrier
Sent: Wednesday, January 21, 2004 11:15 AM
To: sle...@li...
Cc: sle...@li...
Subject: [sleuthkit-users] Good vs. Bad Hashes

Is anyone interested in looking into the best way to manage hashes? The definition of "good" versus "bad" is relative to the current investigation and I don't know the best way to handle this in The Sleuth Kit and Autopsy. There could be a single database with categories of hashes and you choose which are good and which are bad for that investigation (similar to the new Forensic Hash Database that was announced and NSRL). Or, you could import tens of hash databases and identify them as bad or good (like hashkeeper).

I think hashkeeper is LE-only, so I would rather focus on using NSRL and custom hashes made by md5sum. If anyone is interested in working on a workable solution to this, let me know.

brian
|