Thread: [sleuthkit-developers] IO Subsystem patch for fstools
From: Michael C. <mic...@ne...> - 2004-02-03 13:48:13
Attachments:
fstools_diff
Dear List,

Please accept this patch to the sleuthkit, which implements a pluggable IO subsystem for the fstools (patch against 1.67, fstools directory).

Background

Quite often users are supplied with dd images that do not immediately work with sleuthkit. Two notable examples are:

- A dd image was taken of the whole hdd. In this case users have to use sfdisk to work out the partition offsets and then use dd with appropriate skip parameters to extract each partition before being able to use the sleuthkit. This is because the sk expects a dd image of a partition (i.e. the filesystem starts at offset 0 in the image file), which is not always the case.
- Sometimes images are split into smaller pieces, for example in order to burn them to cd/dvd. This means that the pieces need to be stuck together before analysis, potentially wasting time and space.

It would be nice if one could use the images directly, without needing to do creative dd manipulations.

Solution

This patch implements a modular io subsystem approach: all filesystem operations within the sk are made to use this subsystem, and the user can choose the subsystem they want. The subsystem is responsible for seeking into the file and extracting data out of the dd image; how that is implemented is completely abstracted from the point of view of the fstools.

The user chooses the subsystem with the -i (io subsystem) command line switch. A list of arguments can then be passed to the subsystem to initialise it correctly. Once that is done, the regular sk calls can be made (e.g. fs_open etc). The io subsystem will take care of the specifics of implementation.

This patch includes 2 subsystem modules: simple and advanced. The simple module is exactly the same as the old sk, while the advanced module allows for specifying offsets into the dd file, as well as multiple dd files in sequence.
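To illustrate what a subsystem that combines an offset with several dd files in sequence has to compute, here is a minimal sketch in C. It is not taken from the patch: the function name `locate_part`, its parameters, and the idea of passing the part sizes as an array are all assumptions made for the example. Given a position relative to the start of the filesystem, it finds which part holds that byte and where inside the part it sits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not from the patch.
 * sizes[]  : size in bytes of each image part, in order
 * start    : byte offset of the filesystem inside the joined image
 * fs_pos   : byte position relative to the start of the filesystem
 * On success stores the part index and the offset inside that part
 * and returns 0; returns -1 if the position is past the last part. */
int locate_part(const long long *sizes, size_t nparts, long long start,
                long long fs_pos, size_t *part, long long *part_off)
{
    assert(sizes != NULL && part != NULL && part_off != NULL);
    if (start < 0 || fs_pos < 0)
        return -1;

    long long pos = start + fs_pos;   /* position in the joined image */
    for (size_t i = 0; i < nparts; i++) {
        if (pos < sizes[i]) {
            *part = i;
            *part_off = pos;
            return 0;
        }
        pos -= sizes[i];              /* skip this part entirely */
    }
    return -1;
}
```

With a single part and `start` set to 0 this degenerates to the old behaviour, where the filesystem begins at byte 0 of one image file.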
Example:

As an example, the fls and icat tools were modified to support the new subsystem; more tools will be converted tomorrow once I get some sleep. Example of how to seek into a partition within a disk dd:

    fls -i advanced -o offset=524288 -f linux-ext2 test.dd

This selects the advanced io subsystem and passes it the offset option, specifying 1024 blocks of 512 bytes (1024 * 512 = 524288).

Now we can split the dd image across multiple files (maybe using the split utility), and still analyse them at once:

    fls -i advanced -o offset=524288,file=xaa,file=xab,file=xac,file=xad -f linux-ext2 xae

Note that xae (the last part of the image) will be appended to the list of parts automatically. Also note that all the options in -o are passed as one parameter to the subsystem, which then parses them into the relevant arguments.

If the subsystem's name is not found, all known subsystems are listed:

    bash# fls -i help -f linux-ext2 test.dd

    Available Subsystems:

    standard - Standard Sleuthkit IO Subsystem
    advanced - Advanced Sleuthkit IO Subsystem
    fls: Could not set io subsystem help

To get more help about the options available, try setting an option which is not supported:

    bash# fls -i advanced -o help -f linux-ext2 test.dd

    option help not recognised

    Advanced io subsystem options

    offset=bytes    Number of bytes to seek to in the image file. Useful if there is some extra data at the start of the dd image (e.g. partition table/other partitions
    file=filename   Filename to use for split files. If your dd image is split across many files, specify this parameter in the order required as many times as needed for seemless integration

Future work:

I am in the process of implementing RAID reassembly functionality, i.e. given a raid reconstruction map (a file telling sk the order in which raid blocks go together) and a list of dd images of individual drives, the io subsystem will transparently reassemble the logical data. I have a working prototype so I know it's possible. The abstracted io subsystem concept will be very handy for that.
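The comma-separated -o string described in the message above could be split roughly as follows. This is only a sketch under assumptions: `struct adv_opts`, `parse_adv_opts` and the `MAX_PARTS` limit are invented for the example and do not come from the patch, which may parse its options quite differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PARTS 16   /* arbitrary limit chosen for this sketch */

/* Hypothetical container for the parsed options. */
struct adv_opts {
    long long offset;               /* byte offset into the image */
    const char *files[MAX_PARTS];   /* split parts in order (point into buf) */
    int nfiles;
};

/* Parse a string such as "offset=524288,file=xaa,file=xab" in place.
 * buf is modified (strtok overwrites the commas with NULs).
 * Returns 0 on success, -1 on an unrecognised option or too many files. */
int parse_adv_opts(char *buf, struct adv_opts *out)
{
    assert(buf != NULL && out != NULL);
    out->offset = 0;
    out->nfiles = 0;

    for (char *tok = strtok(buf, ","); tok != NULL; tok = strtok(NULL, ",")) {
        if (strncmp(tok, "offset=", 7) == 0) {
            out->offset = strtoll(tok + 7, NULL, 10);
        } else if (strncmp(tok, "file=", 5) == 0) {
            if (out->nfiles == MAX_PARTS)
                return -1;
            out->files[out->nfiles++] = tok + 5;
        } else {
            return -1;   /* e.g. "help": the caller can print the usage text */
        }
    }
    return 0;
}
```

Returning an error for any unknown key matches the behaviour shown in the help example, where an unsupported option is what triggers the option listing.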
From: Michael C. <mic...@ne...> - 2004-02-03 22:10:53
On Wed, 4 Feb 2004 03:34 am, Brian Carrier wrote:
> Wow! This looks great!

Thanks, Brian.

> My original plan was to use the '-o' flag to specify the sector offset
> for the file system. I figured sectors would be easier than bytes
> because mmls and fdisk give you the values in sectors and almost every
> disk uses a 512-byte sector. This also allows people to use the offset
> value without the '-i' setting.

Great idea. Sectors would be much more useful than straight bytes. The idea is that each subsystem may choose to implement its logical-physical mapping however makes sense for it, and would therefore need different parameters, most conveniently denoted by name. So rather than waste a whole option -o on just an offset, maybe we could use -o to specify a number of subsystem dependent options.

If we want people to be able to use -o without needing to worry about using -i, that's easily solved. If you don't use -i, the default sk subsystem is used, and it can simply take a single option, being the offset. So users can just use -o to implement a simple offset.

> I like the idea of the '-i' because it is like specifying the image
> type, whereas -f is specifying the file system type. I hadn't thought
> about getting this advanced, but it looks good.
>
> I would actually say that '-i' should only have the type and no other
> options. If multiple files are needed (splitting and RAID), then they
> should be appended to the end of the command. For example, to look at
> the file system at offset sector 12345, the following could be used
> (names are made up):
>
> Normal full image:
> fls -f linux-ext3 -o 12345 file1.dd
> or
> fls -f linux-ext3 -i single -o 12345 file1.dd
>
> Split Image:
> fls -f linux-ext3 -i split -o 12345 file1.dd file2.dd
>
> LVM RAID Image:
> fls -f linux-ext3 -i lvm -o 12345 lvm-config.dat
>
> MS LDM Spanning Image
> fls -f ntfs -i ldm-span -o 12345 ldm-config.dat

That is indeed a good suggestion. It needs more careful manipulation of the getopts in the client program but it should work. The only trouble is that the parameters to the subsystem can be arbitrary, subsystem-specific ones, so for example maybe for a split image:

    fls -f linux-ext3 -i split -o offset=12345 blocksize=512 file1.dd file2.dd

and just in case you wanted to have a file called offset or blocksize, you could use a qualifier called file= in front of it like:

    fls -f linux-ext3 -i split -o offset=12345 blocksize=512 file1.dd file=offset

but without a qualifier, it's just interpreted as a filename. Similarly, for the truly lazy user, if the subsystem specific option parser sees an option consisting of just a number, it takes that as the offset, so you don't need to qualify the offset with a keyword.

> It would also be useful if the config file format that you are
> developing for the RAID images could be used for the split images.

It can, but the algorithm for the raid reconstruction is more complex, and performance would suffer if the same subsystem was used all around. The format (not finalised yet...) is something like:

    parameter=...
    parameter=...
    slot number,disk number
    slot number,disk number

one per line. A slot is the logical position within the raid period where the block should be taken from. Example:

    1,1
    2,1
    1,2
    2,2

specifies that the first block is taken from slot 1, disk 1, the next from slot 2, disk 1, the next from slot 1, disk 2 and the last from slot 2, disk 2. So if we start the raid period at block 0, slot 1 corresponds to block 0, and slot 2 to block 1. The next blocks requested start a whole new period, which maps the slots onto a new set of absolute offsets, namely slot 1 is now block 2 and slot 2 is block 3... etc etc...

So this scheme does use offsets to start reading the disks, and block sizes, so I guess if you really wanted, you could make a raid map correspond to a number of split disks, but not easily, especially if the disks have different sizes.
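One possible reading of the slot/disk map described above, sketched in C. The format was explicitly not finalised, so everything here is an assumption: the names `raid_entry` and `raid_lookup`, the 1-based numbering taken from the example, and the rule that each period advances every disk by `slots` blocks. Parity handling and block sizes are left out.

```c
#include <assert.h>
#include <stddef.h>

/* One line of the map: "slot,disk", both 1-based as in the example. */
struct raid_entry { int slot; int disk; };

/* Map logical block lblock to a disk and a block number on that disk.
 * slots is the number of slots per period on each disk (2 in the example).
 * Returns 0 on success, -1 on invalid arguments. */
int raid_lookup(const struct raid_entry *map, size_t nentries, int slots,
                long long lblock, int *disk, long long *pblock)
{
    assert(map != NULL && disk != NULL && pblock != NULL);
    if (nentries == 0 || slots <= 0 || lblock < 0)
        return -1;

    long long period = lblock / (long long)nentries;     /* which repetition */
    size_t idx = (size_t)(lblock % (long long)nentries); /* entry in the map */

    *disk = map[idx].disk;
    *pblock = period * slots + (map[idx].slot - 1);
    return 0;
}
```

For the four-entry example map, logical blocks 0 to 3 fall in the first period (disk blocks 0 and 1 on each disk) and logical block 4 starts the second period at disk block 2 on disk 1, which agrees with "slot 1 is now block 2".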
I guess the file may not be that human readable, because we use flag to generate it automatically. I really didn't want to have to use more advanced lex/yacc for this. What do you think?

> To keep the subsystem design similar to what currently exists, have you
> thought about the following:
>
> A new data structure IO_INFO and before fs_open is run, the io_open()
> function is run with either the image lists or the config file etc and
> the offset. There would probably have to be one for io_open_files(char
> **) and io_open_config(char *).
>
> The IO_INFO structure is filled in with io_open and the needed read
> functions are mapped (like file_walk etc are now in FS_INFO).
>
> The fs_open() function gets the IO_INFO structure passed to it and the
> fs_open() no longer needs to do the open() system call on the images.
> It just checks the magic value and fills in FS_INFO. Any
> read_random() function in the file system code turns into
> fs_info->io->read_random(...).

This is an alternative design. The advantage with your method is that you could potentially have a number of different subsystems in use at the same time in the same program, while my subsystem design keeps subsystem data as static so it's program wide. I just didn't really want to change all the read_random functions throughout the code (it would mean bigger changes in the architecture because almost every file would be touched many times).

I still think that it would be more useful to allow each subsystem to manage its own options, rather than trying to second guess all the options in advance and stick them into the io_info struct. So for example, rather than have the io_info struct have one entry for io_open_files(char **) and io_open_config(char *), maybe we can just have an entry for void *data, and a single io_open(void *data), and allow the subsystem to set that to whatever configuration parameters make sense for it: the single file option might attach a char * in the data pointer, while the multifile stuff might attach a char **. The raid subsystem might attach a preparsed linked list of its raid map so it can work off that. Whatever makes sense.

A couple more types of IO subsystem I just thought of are an encase file format subsystem (allows you to read standard encase files with sk) and a compressed file subsystem (allows working directly off compressed files). I have no idea how difficult it would be to actually implement those, but they look promising.

Michael.
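The two ideas discussed in this message, read functions mapped into an IO_INFO structure and a single opaque void *data field owned by the subsystem, can be combined as in the sketch below. Nothing here is from the patch or from the Sleuth Kit sources: the field layout, the `read_random` signature, and the toy in-memory subsystem (`mem_data`, `mem_io_init`) are assumptions chosen so that the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct IO_INFO IO_INFO;
struct IO_INFO {
    /* Read len bytes at byte offset off; returns bytes read or -1. */
    long (*read_random)(IO_INFO *io, char *buf, size_t len, long long off);
    void *data;   /* subsystem-private state; only the subsystem interprets it */
};

/* Toy "memory" subsystem standing in for a real one: its private
 * data is simply a buffer and its length. */
struct mem_data { const char *base; size_t size; };

static long mem_read_random(IO_INFO *io, char *buf, size_t len, long long off)
{
    struct mem_data *m = io->data;
    if (off < 0 || (unsigned long long)off > m->size)
        return -1;
    size_t avail = m->size - (size_t)off;
    if (len > avail)
        len = avail;              /* short read at the end of the image */
    memcpy(buf, m->base + off, len);
    return (long)len;
}

/* Fill in the function pointer and attach the private data. */
void mem_io_init(IO_INFO *io, struct mem_data *m)
{
    assert(io != NULL && m != NULL);
    io->read_random = mem_read_random;
    io->data = m;
}
```

Callers only ever go through `io->read_random(...)`, so a split-image or RAID subsystem could hang a file list or a parsed map off `data` without the filesystem code needing to know.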
From: Brian C. <ca...@sl...> - 2004-02-04 15:03:20
>> My original plan was to use the '-o' flag to specify the sector offset
>> for the file system. I figured sectors would be easier than bytes
>> because mmls and fdisk give you the values in sectors and almost every
>> disk uses a 512-byte sector. This also allows people to use the
>> offset value without the '-i' setting.
>
> Great idea. Sectors would be much more useful than straight bytes. The
> idea is that each subsystem may choose to implement its
> logical-physical mapping however makes sense for it. And therefore
> would need different parameters most conveniently denoted by name. So
> rather than waste a whole option -o on just an offset, maybe we could
> use -o to specify a number of subsystem dependent options.

Ok. I was under the impression that you wanted to have a configuration file for any of the more complex subsystems and therefore the options could be specified there. The offset is the only variable in the process that may change between executions (i.e. accessing a different partition) and doesn't make sense to be in a config file. Unless the config file allowed you to assign names to offsets. For example, assign the name 'part1' to sector offset 63 and then you could use 'fls -f ntfs -o part1 image.dd'.

>> I would actually say that '-i' should only have the type and no other
>> options. If multiple files are needed (splitting and RAID), then they
>> should be appended to the end of the command. For example, to look at
>> the file system at offset sector 12345, the following could be used
>> (names are made up):
>
> That is indeed a good suggestion. It needs more careful manipulation
> of the getopts in the client program but it should work. The only
> trouble is that the parameters to the subsystem can be arbitrary,
> subsystem specific ones, so for example maybe for split image:
>
> fls -f linux-ext3 -i split -o offset=12345 blocksize=512 file1.dd file2.dd
>
> and just in case you wanted to have a file called offset or blocksize,
> you could use a qualifier called file= in front of it like:
>
> fls -f linux-ext3 -i split -o offset=12345 blocksize=512 file1.dd file=offset
>
> but without a qualifier, it's just interpreted as a filename. Similarly
> for the truly lazy user if the subsystem specific option parser sees
> an option consisting of just a number, it takes that as the offset,
> then you don't need to qualify offset by using a keyword.

That is why I was assuming that a configuration file would be used for complex situations. What does the blocksize value do for a split image? It seems that only the RAID / VM configurations need complex options. The split mode (or EnCase if that happens in the future) can be done w/out options.

I would rather force complex configurations to configuration files. The command line options for the sleuth kit are already too numerous and it will make using Autopsy easier if the config file can be referenced instead of having to load up the command line every time.

>> It would also be useful if the config file format that you are
>> developing for the RAID images could be used for the split images.
>
> It can, but the algorithm for the raid reconstruction is more complex,
> and performance would suffer if the same subsystem was used all
> around. The format (not finalised yet...) is something like:

Oh. I was thinking that the configuration file would have an entry that identified which IO subsystem to use. For example, a line that says:

    image_format = "split"
or
    image_format = "lvm-splice"

> one per line. A slot is the logical position within the raid period
> where the block should be taken from. example:
>
> 1,1
> 2,1
> 1,2
> 2,2

[....]

> I guess the file may not be that human readable, because we use flag to
> generate it automatically. I really didn't want to have to use more
> advanced lex/yacc for this. What do you think?

Oh ok. I think that it will be very hard to create such a configuration file. To create the file, you will need to know which VM / RAID system is being used. I think it would be much easier to have a subsystem for each VM / RAID type and then the only thing that needs to be specified in the configuration file is the options for that type. For example, if the Linux LVM were used, then you may need to only specify the disk ordering and the block size. When reading from the image, the lvm-split-read() function would be used.

>> To keep the subsystem design similar to what currently exists, have
>> you thought about the following:
>>
>> A new data structure IO_INFO and before fs_open is run, the io_open()
>> function is run with either the image lists or the config file etc and
>> the offset. There would probably have to be one for
>> io_open_files(char **) and io_open_config(char *).
>>
>> The IO_INFO structure is filled in with io_open and the needed read
>> functions are mapped (like file_walk etc are now in FS_INFO).
>>
>> The fs_open() function gets the IO_INFO structure passed to it and the
>> fs_open() no longer needs to do the open() system call on the images.
>> It just checks the magic value and fills in FS_INFO. Any
>> read_random() function in the file system code turns into
>> fs_info->io->read_random(...).
>
> This is an alternative design - the advantage with your method is that
> you could potentially have a number of different subsystems in use at
> the same time in the same program, while my subsystem design keeps
> subsystem data as static so it's program wide. I just didn't really
> want to change all the read_random functions throughout the code (it
> would mean bigger changes in the architecture because almost every
> file will be touched many times.).

I have no problems changing all of the files. If we are going to add this functionality, I would rather do it right the first time.

> I still think that it would be more useful to allow each subsystem to
> manage its own options, rather than trying to second guess all the
> options in advance and stick them into the io_info struct. So for
> example rather than have the io_info struct have one entry for
> io_open_files(char **) and io_open_config(char *), maybe we can just
> have an entry for void *data, and a single io_open(void *data), and
> allow the subsystem to set that to whatever configuration parameters
> make sense for it - the single file option might attach a char * in
> the data pointer, while the multifile stuff might attach a char **.
> The raid subsystem might attach a preparsed linked list of its raid
> map so it can work off that. whatever makes sense.

Actually, I guess we just need one io_open() function because fls.c and similar files will not know if the file is a config file or an image file. io_open would have a char ** to list the image files or config file, a type field for the type of image format, and an offset value. It would then fill in the IO_INFO structure and return it, which would be passed to fs_open().

> A couple of more types of IO subsystem i just thought of are an encase
> file format subsystem (allows you to read standard encase files with
> sk) and a compressed file subsystem (allows to work directly off
> compressed files). I have no idea how difficult it would be to
> actually implement those, but they look promising.

Compression would be a major pain. Split, EnCase, and some of the RAID systems seem much easier.

thanks,
brian
From: Brian C. <ca...@sl...> - 2004-02-04 05:50:45
Wow! This looks great! I have been meaning to incorporate the offset option for quite a while, but this is much more involved. I don't have time to look at the code in detail right now, but I have some comments from your email and a quick skim of the code.

My original plan was to use the '-o' flag to specify the sector offset for the file system. I figured sectors would be easier than bytes because mmls and fdisk give you the values in sectors and almost every disk uses a 512-byte sector. This also allows people to use the offset value without the '-i' setting.

I like the idea of the '-i' because it is like specifying the image type, whereas -f is specifying the file system type. I hadn't thought about getting this advanced, but it looks good.

I would actually say that '-i' should only have the type and no other options. If multiple files are needed (splitting and RAID), then they should be appended to the end of the command. For example, to look at the file system at offset sector 12345, the following could be used (names are made up):

Normal full image:
    fls -f linux-ext3 -o 12345 file1.dd
or
    fls -f linux-ext3 -i single -o 12345 file1.dd

Split Image:
    fls -f linux-ext3 -i split -o 12345 file1.dd file2.dd

LVM RAID Image:
    fls -f linux-ext3 -i lvm -o 12345 lvm-config.dat

MS LDM Spanning Image
    fls -f ntfs -i ldm-span -o 12345 ldm-config.dat

It would also be useful if the config file format that you are developing for the RAID images could be used for the split images.

To keep the subsystem design similar to what currently exists, have you thought about the following:

A new data structure IO_INFO and before fs_open is run, the io_open() function is run with either the image lists or the config file etc and the offset. There would probably have to be one for io_open_files(char **) and io_open_config(char *).

The IO_INFO structure is filled in with io_open and the needed read functions are mapped (like file_walk etc are now in FS_INFO).

The fs_open() function gets the IO_INFO structure passed to it and the fs_open() no longer needs to do the open() system call on the images. It just checks the magic value and fills in FS_INFO. Any read_random() function in the file system code turns into fs_info->io->read_random(...).

This looks great!

brian

On Feb 3, 2004, at 8:47 AM, Michael Cohen wrote:
> Dear List,
> Please accept this patch to the sleuthkit to implement a pluggable IO
> subsystem for the fstools. (patch against 1.67, fstools directory).
> [....]
> <fstools_diff>
From: Michael C. <mic...@ne...> - 2004-02-07 13:33:25
Hi Brian,

> Ok. I was under the impression that you wanted to have a configuration
> file for any of the more complex subsystems and therefore the options
> could be specified there. The offset is the only variable in the
> process that may change between executions (i.e. accessing a different
> partition) and doesn't make sense to be in a config file. Unless the
> config file allowed you to assign names to offsets. For example,
> assign the name 'part1' to sector offset 63 and then you could use 'fls
> -f ntfs -o part1 image.dd'.

Although I would normally agree with you that configuration files can contain much more information and therefore most parameters should go there, in this case it would be a pain to have all these options passed in through a configuration file. This is primarily because fstools are most often used as a backend to larger programs (like flag and autopsy), and so it's much easier to shell out to them, even if they have 20 command line arguments, than to have to write a config file and then invoke them with it.

> That is why I was assuming that a configuration file would be used for
> complex situations. What does the blocksize value do for a split
> image? It seems that only the RAID / VM configurations need complex
> options. The split mode (or EnCase if that happens in the future) can
> be done w/out options.

To keep the system truly generic, it's useful to allow arbitrary options to be passed to the subsystem rather than trying to incorporate them within the current framework. It's difficult to imagine what parameters will be required for some arbitrary subsystem someone might come up with in the future.

> I would rather force complex configurations to configuration files.
> The command line options for the sleuth kit are already too numerous
> and it will make using Autopsy easier if the config file can be
> referenced instead of having to load up the command line every time.

This makes sense if the config file stays the same (not changing) for a single case, for example, or for a set of images. Then the config file only gets written once and is referenced from invocation to invocation, rather than having to be rewritten each time.

> Oh. I was thinking that the configuration file would have an entry
> that identified which IO subsystem to use. For example, a line that
> says:
>
> image_format = "split"
> or
> image_format = "lvm-splice"

There might be a misunderstanding with the raid stuff. What I am writing is a generic raid subsystem that reassembles any raid 5 implementation without having a clue about its type/brand etc. The only reason it probably should have a config file is that the raid map is so big.

> Oh ok. I think that it will be very hard to create such a
> configuration file. To create the file, you will need to know which VM
> / RAID system is being used. I think it would be much easier to have a
> subsystem for each VM / RAID type and then the only thing that needs to
> be specified in the configuration file is the options for that type.
> For example, if the Linux LVM were used, then you may need to only
> specify the disk ordering and the block size. When reading from the
> image, the lvm-split-read() function would be used.

The raid map is easily assembled using the gui in flag. This is not a problem. If the investigator knows which raid implementation it is, they can look up a library of such maps and use those, but the io subsystem needs to handle arbitrary reassembly maps.

Maybe we can implement such a library in fs_io, so that you could invoke the same raid subsystem and pass it a parameter, say "type", naming which raid type it should use, and it can look up the map by itself?

> I have no problems changing all of the files. If we are going to add
> this functionality, I would rather do it right the first time.

Great. What you propose is a much better way. We just need to resolve the option issue, because these options need to be present inside such an IO_INFO struct that will be passed around the place. Since these options can be arbitrary and they only make sense for the subsystem itself, what do you think about my original suggestion of having a single void *data field in the IO_INFO struct and letting each subsystem put their stuff in there?

> Actually, I guess we just need one io_open() function because fls.c and
> similar files will not know if the file is a config file or an image
> file. io_open would have a char ** to list the image files or config
> file, a type field for the type of image format, and an offset value.
> It would then fill in the IO_INFO structure and return it, which would
> be passed to fs_open().

Maybe it would make more sense to populate the IO_INFO structure inside the FS_INFO structure? So instead of having a fs->fd, we have a fs->io_info. This makes more sense since, with io subsystems, the FS_INFO structure doesn't really deal with file descriptors anyway (the filesystem code never directly interacts with fds). So in the filesystem code you would have:

    if ((fs->io_info = io_open(name, O_RDONLY)) < 0) ...

and then, subsequently:

    fs_read_random(fs->io_info,(char *)ext2fs->fs,len,EXT2FS_SBOFF,"Checking for EXT2FS");
    ....

Next we just need to agree on what to put in struct IO_INFO.

> Compression would be a major pain. Split, EnCase, and some of the RAID
> systems seem much easier.

Please see the patch I just submitted re compression.

Do you have any idea how you would read in encase files? I never got the chance to use it, so I don't know how complex the file format is, and there is nothing I can find on the net about the format.

> thanks,
> brian

Thanks,
Michael.
From: Brian C. <ca...@sl...> - 2004-02-09 05:35:26
|
On Feb 7, 2004, at 7:52 AM, Michael Cohen wrote:

> Although i would normally agree with you that configuration files can contain much more information and therefore most parameters should go there, in this case it would be a pain to have all these options passed in through a configuration file. This is primarily because fstools are most often used as a backend to larger programs (like flag and autopsy), and so its much easier to shell out to them even if they have 20 command line arguments, than have to write a config file and then invoke them with it.

Yea, but autopsy or flag need to store all of those options as well in their own configuration file so that they can pass them to the fstools. It would seem more flexible if the sleuth kit had a configuration file for the image, which could then be used by any GUI (including autopsy, flag, rex etc.)

If we are going to start discussing configuration files for the fstools (which we both agree are required for at least RAID), then I would rather make them general enough so that they can be used for other formats besides RAID. I would even like to have these include the file system type, mounting point, and hashes of each partition. Basically the stuff that the other tools include in some proprietary format with the image, we would put in a separate text file.

>> Oh ok. I think that it will be very hard to create such a configuration file. To create the file, you will need to know which VM / RAID system is being used. I think it would be much easier to have a subsystem for each VM / RAID type and then the only thing that needs to be specified in the configuration file is the options for that type. For example, if the Linux LVM were used, then you may need to only specify the disk ordering and the block size. When reading from the image, the lvm-split-read() function would be used.
>
> The raid map is easily assembled using the gui in flag. This is not a problem. If the investigator knows which raid implementation it is, they can lookup a library of such maps and use those, but the io subsystem needs to handle arbitrary reassembly maps.
>
> Maybe we can implement such a library in fs_io, so that you could invoke the same raid subsystem, and pass it a parameter say "type" naming which raid type it should use and it can look up the map by itself?

Sure. For this to work with the Sleuth Kit though, there must be the ability to create the configurations in the sleuth kit. If the only way to create the map is in flag, then that doesn't do autopsy or future interfaces any good, and it doesn't make sense to replicate the stuff in each gui.

>> I have no problems changing all of the files. If we are going to add this functionality, I would rather do it right the first time.
>
> Great. What you propose is a much better way. We just need to resolve the option issue because these options need to be present inside such an IO_INFO struct that will be passed around the place. Since these options can be arbitrary and they only make sense for the subsystem itself, what do you think about my original suggestion of having a single void *data field in the IO_INFO struct and letting each subsystem put their stuff in there?

I'm still not convinced that we need so many options on the command line. The only case that I can see where all of the command line options are beneficial is for a live analysis where you don't want to write to the disk. But, in that case I don't see why you would need to use any of these complex image formats because you will have access to the raw device corresponding to the partition. Is there a specific reason with flag that command line options are easier?

>> Actually, I guess we just need one io_open() function because fls.c and similar files will not know if the file is a config file or an image file. io_open would have a char ** to list the image files or config file, a type field for the type of image format, and an offset value. It would then fill in the IO_INFO structure and return it, which would be passed to fs_open().
>
> Maybe it would make more sense to populate the IO_INFO structure inside the FS_INFO structure?

I would rather not. I would prefer to keep the file system code separate from the image format code. In fact, I would even consider making all of this image stuff its own library, imgtools maybe. It seems much more logical to call the file system processing code with the filled in IO_INFO structure and let it read from it. The file system code would never touch any of the file descriptors, it would just call the read functions. This also allows the 'mm...' tools to use the image formats and any other future tools, such as memory images that are split or saved in another tool's proprietary format.

> Next we just need to agree on what to put in struct IO_INFO.

I would lean towards the way that FS_INFO is structured. There would be a few basic items in IO_INFO, such as the function pointers and maybe the maximum size of the image. Then there are image specific structures that have their needed values. For example, the structure for split images may have an array of file descriptors and a structure with the sizes of each split image. The normal image structure may just have one file descriptor. Actually, maybe this whole thing is better called IMG_INFO instead of IO_INFO.

In the imgtools collection, we could actually have a tool that converts the proprietary image formats to a raw image.

>> Compression would be a major pain. Split, EnCase, and some of the RAID systems seem much easier.
>
> Please see the patch i just submitted re compression.

Very cool. I had never seen sgzip before. I guess it isn't as much of a pain as I thought :)

> Do you have any idea how you would read in encase files? I didnt get the chance to ever use it so i dont know how complex the file format is but there is nothing i can find on the net re the format.

Check out asrdata.com. Somewhere on there is a link to the expert witness format.

I apologize if I am being a pain with some of these details, but after having to redesign autopsy because of a bad initial design, I want to make sure we add this new functionality the right way.

thanks,
brian
|
From: Michael C. <mic...@ne...> - 2004-02-09 08:44:47
|
> Yea, but autopsy or flag need to store all of those options as well in their own configuration file so that they can pass them to the fstools. It would seem more flexible if the sleuth kit had a configuration file for the image, which could then be used by any GUI (including autopsy, flag, rex etc.)

The problem with this approach is that the fstools are then too integrated with the GUI, especially if the configuration file becomes so complex that you really need a GUI to make one. In that case you cant just use them by themselves. Is that something we are prepared to live with? Or is it a design goal to make small self contained tools that may be used from the command line?

I agree that fstools by themselves are probably not all that useful without having some sort of GUI. So perhaps we just live with an increased level of complexity for the fstools, in favour of better integration into larger GUIs?

The other problem that may arise from trying to make fstools integrate with the GUI's configuration files is that different GUIs store configuration in different ways, for example flag stores everything in the database (not even in a file), so having a single configuration file format is a little clunky. Its not so bad currently for flag, since we are currently using the database patch that was posted on the list a little while ago to dump out all the data from the image and we never really use the individual tools like ils, fls, icat etc. So it wont be too hard to simply write out a conf file for each image. However, im just thinking of the old version of flag where we did shell out to these tools basically for each file in the filesystem, the cost of parsing a huge config file for each invocation of icat would be tremendous i would imagine.

> If we are going to start discussing configuration files for the fstools (which we both agree are required for at least RAID), then I would rather make them general enough so that they can be used for other formats besides RAID. I would even like to have these include the file system type, mounting point, and hashes of each partition. Basically the stuff that the other tools include in some proprietary format with the image, we would put in a separate text file.

Thats a great idea (accepting that the level of complexity of the fs tools is increased). I would vote for using an xml config file format, since its standard and easy to deal with and we dont have to write a parser. The downside is that we increase the program dependency by requiring libxml2 to be present. Alternatively we could write some yacc/lex parser but we then need to discuss a good format which will be sufficient for autopsy and allow future growth.

> Sure. For this to work with the Sleuth Kit though, there must be the ability to create the configurations in the sleuth kit. If the only way to create the map is in flag, then that doesn't do autopsy any good or future interfaces and it doesn't make sense to replicate the stuff in each gui.

Thats true. However, a raid map must be generated internally anyway in order to reassemble the individual raid implementations (e.g. lvm, linux raid, etc). Perhaps we can have different IO subsystems where all they do is generate a generic map and then call the generic raid implementation? So for example say we have a generic raid io subsystem as described above that takes a raid map as input, then we have another subsystem called lvm for example which accepts a bunch of lvm specific parameters and then generates a raid map and calls the generic raid io subsystem. This way autopsy doesnt need to be able to build a generic map in the gui, but one will be built automatically as required. If the user works out a way to build a raid map by some other means (i.e. some other GUI, by hand, or whatever), they can still use the generic raid implementation.

> I'm still not convinced that we need so many options on the command line. The only case that I can see where all of the command line options are beneficial is for a live analysis where you don't want to write to the disk. But, in that case I don't see why you would need to use any of these complex image formats because you will have access to the raw device corresponding to the partition.

Thats true, and if you have access to the raw device you would not need extra options or more complex io subsystems.

> Is there a specific reason with flag that command line options are easier?

No reason currently, because we have our own program (dbtool) written using the fstools library (as is seen in the patch dave submitted). Im just thinking about the way it used to work by shelling out. Maybe a better way is to simply document the fstools library and define a clear interface (with a proper shared library), and then people would be expected to use the library rather than shell out to the tools all the time.

> > Maybe it would make more sense to populate the IO_INFO structure inside the FS_INFO structure?
>
> I would rather not. I would prefer to keep the file system code separate from the image format code. In fact, I would even consider making all of this image stuff its own library, imgtools maybe. It seems much more logical to call the file system processing code with the filled in IO_INFO structure and let it read from it. The file system code would never touch any of the file descriptors, it would just call the read functions. This also allows the 'mm...' tools to use the image formats and any other future tools, such as memory images that are split or saved in another tool's proprietary format.

Just to clarify what you are saying... Are you proposing to make the io_subsystem and file system code into separate libraries, and then the individual tools (e.g. fls) would open the subsystem, and initialise it, and then call the file system code giving it a filled in IO_INFO structure? If I understood your comment right it sounds great.

So the IO_INFO structure will contain function pointers to the read_random and read_block which will be initialised by the constructor, and the fs code would just call those methods? Sounds great:

    FS_INFO *
    ext2fs_open(const char *name, unsigned char ftype)

Changes to:

    FS_INFO *
    ext2fs_open(IO_INFO *io)

(BTW do you think that ftype is a little redundant here? and a little off topic, it would be nicer if the *fs_open routines returned NULL if they couldnt find the filesystem rather than error out, cause then you could cycle over all filesystem decoders until one worked rather than demanding the user specify the -f parameter all the time. You could use -f to override the automatic detection)

> I would lean towards the way that FS_INFO is structured. There would be a few basic items in IO_INFO, such as the function pointers and maybe the maximum size of the image. Then there are image specific structures that have their needed values. For example, the structure for split images may have an array of file descriptors and a structure with the sizes of each split image. The normal image structure may just have one file descriptor. Actually, maybe this whole thing is better called IMG_INFO instead of IO_INFO.

That sounds great. We could use a void* to achieve this, and then each io subsystem makes its own pointer and casts to void*:

    struct IMG_INFO {
        common fields
        ....
        common fields
        function pointers....
        void *data;
    }

and maybe the multipart reassembly has:

    struct part {
        char *filename;
        struct part* next;
    }

So we initialise as:

    IMG_INFO *img;
    img->data=(void *)part_list

While the raid is totally different:

    struct raid {
        whatever,
        ...
        more stuff,
    }

The advantage of this option is that the IMG_INFO struct doesnt need to know about each subsystem.

> In the imgtools collection, we could actually have a tool that converts the proprietary image formats to a raw image.

That could be a new stand alone tool which chooses the right io-subsystem and dumps a dd image out. It would be useful in the case of raid.

> Very cool. I had never seen sgzip before. I guess it isn't as much of a pain as I thought :)

It only took a day or so to write sgzip for this purpose, and I thought it would be useful in general for any application needing quick seeking in a compressed file. The library is now available in general on sf: http://sourceforge.net/project/showfiles.php?group_id=100803

> > Do you have any idea how you would read in encase files? I didnt get the chance to ever use it so i dont know how complex the file format is but there is nothing i can find on the net re the format.
>
> Check out asrdata.com. Somewhere on there is a link to the expert witness format.

Thanks for that, the format looks remarkably similar to sgzip except with some extra meta data stuck in there. Should be easy to write a library to access this. I just need to get a small example encase image to play with.

> I apologize if I am being a pain with some of these details, but after having to redesign autopsy because of a bad initial design, I want to make sure we add this new functionality the right way.

I think the discussion is very constructive so far. I was initially expecting a small change, but it looks like there is a need now to do a larger reorganization of code. Its going to pay off in the long run I expect.

cheers
Michael.
|
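Editor's sketch of the `void *data` proposal in the message above, applied to split images. This is not code from the patch: the field names in `IMG_INFO`, the `size` and `bytes` members of `struct part`, and the helpers `split_open` and `split_read_random` are all assumptions made for illustration, and each part is an in-memory buffer standing in for an open file so that the example runs without image files.

```c
#include <stddef.h>
#include <string.h>

/* Generic image handle: common fields plus one opaque pointer that only
 * the owning subsystem knows how to interpret (hypothetical layout). */
typedef struct IMG_INFO IMG_INFO;
struct IMG_INFO {
    size_t size;    /* total size of the reassembled image in bytes */
    size_t (*read_random)(IMG_INFO *img, unsigned char *buf,
                          size_t len, size_t offset);
    void *data;     /* subsystem-private state */
};

/* One piece of a split image. The thread's version holds a filename;
 * here a byte buffer stands in for the file contents. */
struct part {
    const unsigned char *bytes;
    size_t size;
    struct part *next;
};

/* Read len bytes starting at offset, crossing part boundaries as needed.
 * Returns the number of bytes actually copied. */
static size_t split_read_random(IMG_INFO *img, unsigned char *buf,
                                size_t len, size_t offset)
{
    struct part *p = (struct part *)img->data;
    size_t done = 0;

    /* Skip whole parts that lie before the requested offset. */
    while (p != NULL && offset >= p->size) {
        offset -= p->size;
        p = p->next;
    }
    /* Copy from consecutive parts until the request is satisfied. */
    while (p != NULL && done < len) {
        size_t avail = p->size - offset;
        size_t n = (len - done < avail) ? len - done : avail;
        memcpy(buf + done, p->bytes + offset, n);
        done += n;
        offset = 0;
        p = p->next;
    }
    return done;
}

/* Fill in the generic handle; the part list travels behind void *data. */
static void split_open(IMG_INFO *img, struct part *parts)
{
    struct part *p;

    img->size = 0;
    for (p = parts; p != NULL; p = p->next)
        img->size += p->size;
    img->read_random = split_read_random;
    img->data = (void *)parts;
}
```

A caller only ever touches `img->read_random(img, ...)`, so the generic struct never needs to know what a `struct part` is, which is the advantage argued for in the message.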
From: Brian C. <ca...@sl...> - 2004-02-10 06:33:39
|
On Feb 9, 2004, at 3:44 AM, Michael Cohen wrote:

>> Yea, but autopsy or flag need to store all of those options as well in their own configuration file so that they can pass them to the fstools. It would seem more flexible if the sleuth kit had a configuration file for the image, which could then be used by any GUI (including autopsy, flag, rex etc.)
>
> The problem with this approach is that the fstools are then too integrated with the GUI, especially if the configuration file becomes so complex that you really need a GUI to make one. In that case you cant just use them by themselves. Is that something we are prepared to live with? Or is it a design goal to make small self contained tools that may be used from the command line?

I guess that depends on how complex the config file is. RAID systems are obviously the most complex. Is there a way that we can make the config file fairly simple and have the maps built into the sleuth kit instead of the GUI? That will be faster to load as well if it just has to read a few parameters from the config file instead of a full mapping.

>> Sure. For this to work with the Sleuth Kit though, there must be the ability to create the configurations in the sleuth kit. If the only way to create the map is in flag, then that doesn't do autopsy any good or future interfaces and it doesn't make sense to replicate the stuff in each gui.
>
> Thats true. However, a raid map must be generated internally anyway in order to reassemble the individual raid implementations (e.g. lvm, linux raid, etc). Perhaps we can have different IO subsystems where all they do is generate a generic map and then call the generic raid implementation? So for example say we have a generic raid io subsystem as described above that takes a raid map as input, then we have another subsystem called lvm for example which accepts a bunch of lvm specific parameters and then generates a raid map and calls the generic raid io subsystem. This way autopsy doesnt need to be able to build a generic map in the gui, but one will be built automatically as required. If the user works out a way to build a raid map by some other means (i.e. some other GUI, by hand, or whatever), they can still use the generic raid implementation.

I would rather let each of the RAID types have their own data structures and read functions. That seems much more simple, scalable and efficient to run. It seems that making a map for every type of RAID system is like trying to make a mapping for all file systems so that we can use generic code for processing. Maybe we can have a type for generic RAID, but for Windows LDM or the Linux RAID configurations where the data structures and layout are known I would rather have standard code. That is much easier to audit as well. If someone wants to review what is going on, having to audit a raid map would be a major pain.

>>> Maybe it would make more sense to populate the IO_INFO structure inside the FS_INFO structure?
>>
>> I would rather not. I would prefer to keep the file system code separate from the image format code. In fact, I would even consider making all of this image stuff its own library, imgtools maybe. It seems much more logical to call the file system processing code with the filled in IO_INFO structure and let it read from it. The file system code would never touch any of the file descriptors, it would just call the read functions. This also allows the 'mm...' tools to use the image formats and any other future tools, such as memory images that are split or saved in another tool's proprietary format.
>
> Just to clarify what you are saying... Are you proposing to make the io_subsystem and file system code into separate libraries, and then the individual tools (e.g. fls) would open the subsystem, and initialise it, and then call the file system code giving it a filled in IO_INFO structure? If I understood your comment right it sounds great.
>
> So the IO_INFO structure will contain function pointers to the read_random and read_block which will be initialised by the constructor, and the fs code would just call those methods? Sounds great:

Yea. That seems to scale the best.

> FS_INFO *
> ext2fs_open(const char *name, unsigned char ftype)
>
> Changes to:
> FS_INFO *
> ext2fs_open(IO_INFO *io)
>
> (BTW do you think that ftype is a little redundant here?

Yes and no. For most cases it is, but for FAT where the user can force it to be FAT12, FAT16 or FAT32, then it is needed. If we do auto detection though, then we can probably scrap it.

> and a little off topic, it would be nicer if the *fs_open routines returned NULL if they couldnt find the filesystem rather than error out, cause then you could cycle over all filesystem decoders until one worked rather than demanding the user specify the -f parameter all the time. You could use -f to override the automatic detection)

Yea, that is a good idea. These types of changes are the ones that I want to examine when The Sleuth Kit gets a make over in the next few months. Auto detection of file systems would be very nice.

>> I would lean towards the way that FS_INFO is structured. There would be a few basic items in IO_INFO, such as the function pointers and maybe the maximum size of the image. Then there are image specific structures that have their needed values. For example, the structure for split images may have an array of file descriptors and a structure with the sizes of each split image. The normal image structure may just have one file descriptor. Actually, maybe this whole thing is better called IMG_INFO instead of IO_INFO.
>
> That sounds great. We could use a void* to achieve this, and then each io subsystem makes its own pointer and casts to void*:
> struct IMG_INFO {
>     common fields
>     ....
>     common fields
>     function pointers....
>     void *data;
> }

To keep the fstools and imgtools code consistent we should either change the fstools structures to use the void *, or we can just use the method that they use. The file system structures, NTFS_INFO for example, are defined like:

    struct NTFS_INFO {
        FS_INFO fs_info;
        int blah;
        int boo;
    }

The code casts the pointers as both FS_INFO and NTFS_INFO. So, the argument to an API function takes the FS_INFO structure, but the ntfs code just casts it to an NTFS_INFO. Both do the job, but I would like to be consistent.

> and maybe the multipart reassembly has:
> struct part {
>     char *filename;
>     struct part* next;
> }

Yea, and we can include a file descriptor that is only opened when that file is needed and isn't closed until the end. We'll also need a size field in there somehow as well, but we can figure out the details later.

thanks,
brian
|
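Editor's sketch of the embedding approach described in the message above, where the generic struct is the first member of the specific one and the subsystem casts between the two (the `NTFS_INFO` / `FS_INFO` pattern). It is an illustration under assumptions, not code from The Sleuth Kit: `IMG_RAW_INFO`, `raw_open`, `raw_read_random`, and all field names are hypothetical, and a byte buffer stands in for the file descriptor. The `start` field models a partition that begins at an offset inside a whole-disk image.

```c
#include <stddef.h>
#include <string.h>

/* Generic part: what callers such as file system code would see. */
typedef struct IMG_INFO IMG_INFO;
struct IMG_INFO {
    size_t size;    /* bytes visible to the caller */
    size_t (*read_random)(IMG_INFO *img, unsigned char *buf,
                          size_t len, size_t offset);
};

/* Specific part: the generic struct MUST be the first member so that a
 * pointer to IMG_RAW_INFO can be converted to IMG_INFO * and back. */
typedef struct {
    IMG_INFO img_info;
    const unsigned char *bytes;  /* stand-in for an open file descriptor */
    size_t start;                /* where the partition begins in the image */
} IMG_RAW_INFO;

static size_t raw_read_random(IMG_INFO *img, unsigned char *buf,
                              size_t len, size_t offset)
{
    /* Recover the specific struct from the generic pointer. */
    IMG_RAW_INFO *raw = (IMG_RAW_INFO *)img;

    if (offset >= img->size)
        return 0;
    if (len > img->size - offset)
        len = img->size - offset;   /* clip reads at the end of the image */
    memcpy(buf, raw->bytes + raw->start + offset, len);
    return len;
}

/* Initialise a raw image and hand back only the generic view.
 * Returns NULL if the start offset lies beyond the image. */
static IMG_INFO *raw_open(IMG_RAW_INFO *raw, const unsigned char *bytes,
                          size_t total, size_t start)
{
    if (start > total)
        return NULL;
    raw->img_info.size = total - start;
    raw->img_info.read_random = raw_read_random;
    raw->bytes = bytes;
    raw->start = start;
    return &raw->img_info;
}
```

Compared with a `void *data` field, this keeps one allocation per image and matches the existing fstools convention, at the cost of requiring every subsystem struct to place the generic struct first.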
From: Michael C. <mic...@ne...> - 2004-02-19 10:15:25
Attachments:
sleuthkit_subsystems.patch.gz
|
Hi Brian,

I have started implementing the changes according to your suggestions, and it's coming along great. I quite like the method of extending the basic IO_INFO struct with more specific structs and then casting them back and forth. The code has a great OO feeling about it. Thanks for the pointers.

Attached is an incomplete patch just to check that im on the right track. Only the fls tool is working with the new system currently, although it should compile ok. Here is a summary of the changes:

1) fls uses the -i commandline parameter to choose the subsystem to use, and then gets an IO_INFO object by doing an:

    io=io_open(io_subsys);

If no subsystem is specified, it uses "standard" which is the current default.

2) options are parsed into the io object by calling:

    io_parse_options(io,argv[optind++]);

(options are option=value format or else if they do not contain '=', we take it as a filename).

3) Once we parse all of the options, we create an fs object using the io object:

    fs = fs_open(io, fstype);

This will initialise a fs->io parameter with the io subsystem object.

4) All filesystem code uses the io object to actually do the reading. e.g.:

    ext2fs->fs_info.io->read_random(ext2fs->fs_info.io, (FS_INFO *) ext2fs,(char *) gd, sizeof(ext2fs_gd),offs, "group descriptor");

the io object has a number of methods. For example:

    io->constructor
    io->read_random
    io->read_block

This has a cool object oriented feel about it.

5) There is an array in fs_io.c which acts like a class and is initialised to produce new objects of all IO_INFO derived types. (i.e. all new io objects are basically copies of this struct with extra stuff appended) Eg.

    static IO_INFO subsystems[] ={
      { "standard","Standard Sleuthkit IO Subsystem", sizeof(IO_INFO_STD),
        &io_constructor, &free, &std_help, &std_initialiser, &std_read_block,
        &std_read_random, &std_open, &std_close},

Note that IO_INFO_STD is a derived object of IO_INFO:

    struct IO_INFO_STD {
      IO_INFO io;
      char *name;
      int fd;
    };

i.e. it has the same methods as IO_INFO, but extra attributes. All the other subsystems use this method to add extra attributes to the basic IO_INFO pointer which is carried around the place as an IO_INFO (cast from IO_INFO_ADV etc).

From a user perspective we can now do this:

- Read in a simple dd partition (compatibility with old fls):

    fls -r -f linux-ext2 honeypot.hda5.dd

- Read in a partition file from a hdd dd image. (i.e. use an offset):

    fls -r -i advanced offset=100 honeypot.hda5.dd

- Read in a split dd file (after splitting with split):

    fls -r -i advanced /tmp/xa*

  Or (same thing just to emphasize the fact that multiple files can be specified on the same command line):

    fls -r -i advanced /tmp/xaa /tmp/xab /tmp/xac /tmp/xad

- Read in an sgziped file:

    fls -i sgzip -r -f linux-ext2 honeypot.hda5.dd.sgz

- List files in directory inode 62446:

    fls -i sgzip -r -f linux-ext2 honeypot.hda5.dd.sgz 62446

Whats left to do:

As I mentioned, this is only an intermediate patch to check that im on the right track. These are the things that need to be finished:

1) Add a config file parser option to allow options to be passed from a config file, ie:

    fls -r -c config -i advanced honeypot.hda5.dd

where config is just a file with lines like:

    offset=1024

2) Change offset to be in blocks (and add a blocks keyword to override the fs default).

3) Update all the other tools other than fls to support the new syntax.

4) I also have been working on an expert witness subsystem. Expert witness is the format used by encase, ftk etc. I have a filter working atm to convert these files to straight dd, but i want to implement a subsystem so we can work on these files directly in sleuthkit. This is coming very soon, maybe this weekend. This will obviously require lots of tender loving testing because i only have encase ver 3 to play with.

Please let me know what you think about this. Also let me know if there are other things that need to be completed still regards this patch.

Michael.
|
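Editor's sketch of steps 1 and 2 from the summary above: looking a subsystem up by name with "standard" as the default, then treating each remaining argument as either `option=value` or a filename. This is not the code in the attached patch. The names `IO_SUBSYS`, `IO_OPTS`, `io_select`, and `io_parse_option` are hypothetical, the descriptions for "advanced" and "sgzip" are invented for the example, and `offset` is the only option the sketch recognises.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_FILES 16

/* Table of known subsystems; only names and descriptions are modelled. */
typedef struct {
    const char *name;
    const char *description;
} IO_SUBSYS;

static const IO_SUBSYS subsystems[] = {
    { "standard", "Standard Sleuthkit IO Subsystem" },
    { "advanced", "Offsets and split images (description assumed)" },
    { "sgzip",    "Seekable gzip images (description assumed)" },
};

/* Options collected from the command line for one image. */
typedef struct {
    const IO_SUBSYS *subsys;
    long offset;
    const char *files[MAX_FILES];
    int nfiles;
} IO_OPTS;

/* Select a subsystem by name; NULL means the default, "standard".
 * Returns 0 on success, -1 if the name is unknown. */
static int io_select(IO_OPTS *io, const char *name)
{
    size_t i;

    if (name == NULL)
        name = "standard";
    io->subsys = NULL;
    io->offset = 0;
    io->nfiles = 0;
    for (i = 0; i < sizeof subsystems / sizeof subsystems[0]; i++) {
        if (strcmp(subsystems[i].name, name) == 0) {
            io->subsys = &subsystems[i];
            return 0;
        }
    }
    return -1;
}

/* Parse one argument: anything without '=' is a filename, otherwise it
 * must be a recognised option. Returns 0 on success, -1 on error. */
static int io_parse_option(IO_OPTS *io, const char *arg)
{
    const char *eq = strchr(arg, '=');

    if (eq == NULL) {
        if (io->nfiles >= MAX_FILES)
            return -1;
        io->files[io->nfiles++] = arg;
        return 0;
    }
    if ((size_t)(eq - arg) == strlen("offset") &&
        strncmp(arg, "offset", strlen("offset")) == 0) {
        char *end;
        long value = strtol(eq + 1, &end, 10);

        if (end == eq + 1 || *end != '\0' || value < 0)
            return -1;  /* empty, trailing junk, or negative */
        io->offset = value;
        return 0;
    }
    return -1;  /* unknown option name */
}
```

With this shape, `fls -r -i advanced offset=100 /tmp/xaa /tmp/xab` would amount to one `io_select` call followed by one `io_parse_option` call per remaining argument, and a config file with lines like `offset=1024` could be fed through the same parser one line at a time.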
From: Brian C. <ca...@sl...> - 2004-02-22 23:28:18
|
Excellent.

> 4) All filesystem code uses the io object to actually do the reading. e.g.:
> ext2fs->fs_info.io->read_random(ext2fs->fs_info.io, (FS_INFO *) ext2fs,(char *) gd, sizeof(ext2fs_gd),offs, "group descriptor");
>
> the io object has a number of methods. For example:
> io->constructor

What is the constructor used for?

I quickly looked the changes over. Is there a need to pass FS_INFO to the read functions? I didn't see you using them and it would be nice if we could avoid doing that (so that we don't have to restrict ourselves to file system code).

thanks,
brian
|
From: Michael C. <mic...@ne...> - 2004-02-25 14:08:45
|
Hi Brian,

> I quickly looked the changes over. Is there a need to pass FS_INFO to the read functions? I didn't see you using them and it would be nice if we could avoid doing that (so that we don't have to restrict ourselves to file system code).

Good point, this is now fixed. read_blocks moved into the fs struct and the io_info struct is now free from all references to fs_info.

The major change now is that I implemented exception handling throughout all the libraries used by the io subsystem. I did not touch the rest of fstools for the moment, just until i get the general opinion from the list about this move, but in the long term I believe that all code should be converted to use exceptions, rather than calling error(). Exceptions make the code cleaner and easier to follow, and also make it possible to use this code as part of bigger pieces of code (e.g. as a python or perl module). I really need the io subsystem to be available as a python module for flag so I need good exception support in the iosubsystem code.

The swig interface has also been tuned to support exceptions in an intelligent manner and pass C exceptions to python exceptions seamlessly. Here is an example python session:

    Python 2.3.3 (#2, Jan 13 2004, 00:47:05)
    [GCC 3.3.3 20040110 (prerelease) (Debian)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import iosubsys
    >>> io=iosubsys.io_open("ewf")
    >>> iosubsys.parse_options(io,"off=32")
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    RuntimeError: option off not recognised
    >>> iosubsys.parse_options(io,"/storage/tmp/1.e01")
    0
    >>> iosubsys.read_random(io,100,0)
    (100, '.... data chopped for briefness... ')
    >>> iosubsys.read_random(io,100,10000000)
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    IOError: Attempting to seek past the end of the file!
    >>>

Here we see a python session using the iosubsys module, with an encase file of a floppy disk. When we try to set an option that is not supported (off=32), we immediately get a python runtime exception. This exception can be caught and an intelligent course of action can be taken by the python environment. Similarly we see an IOError exception raised when trying to read past the end of file. Just out of interest, this is the C code that actually raises this exception (line 377 in libevf.c):

    if(chunk>offsets->max_chunk) {
        RAISE(E_IOERROR,NULL,"Attempting to seek past the end of the file!");
    };

All the underlying libraries support exceptions. At this stage I did not change any of the fstools themselves to handle the exceptions, so when a similar error occurs using the fls tool for example, its unhandled and causes the program to exit; pretty much the same behaviour as previously:

    bash$ ../bin/fls -f fat -i ewf /etc/passwd
    Unhandled Exception(IO Error): File format not recognised as EWF

Again the exception was raised from deep inside the library, but since it was not caught by anything it caused an unhandled exception termination.

Thoughts anyone?

Michael
|
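Editor's sketch of one common way to get exception-like behaviour in C, using `setjmp` and `longjmp` from the standard library. The message above does not show how `RAISE` is implemented in the patch, so this should not be read as its mechanism: `raise_exception`, `current_handler`, `read_chunk`, and `try_read_chunk` are hypothetical names, and the two-argument raise here is simpler than the three-argument `RAISE` quoted in the message. Only the error code name `E_IOERROR` and the message text are taken from the thread.

```c
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { E_NONE = 0, E_IOERROR = 1 };

/* Innermost active handler, or NULL when nothing will catch a raise. */
static jmp_buf *current_handler = NULL;
static int except_code = E_NONE;
static char except_msg[256];

/* Record the error and jump to the innermost handler. With no handler
 * installed, report the error and exit, much like the old error(). */
static void raise_exception(int code, const char *msg)
{
    except_code = code;
    snprintf(except_msg, sizeof except_msg, "%s", msg);
    if (current_handler == NULL) {
        fprintf(stderr, "Unhandled Exception(%d): %s\n", code, except_msg);
        exit(EXIT_FAILURE);
    }
    longjmp(*current_handler, 1);
}

/* Library-style function that raises instead of returning an error. */
static unsigned int read_chunk(unsigned int chunk, unsigned int max_chunk)
{
    if (chunk > max_chunk)
        raise_exception(E_IOERROR,
                        "Attempting to seek past the end of the file!");
    return chunk;
}

/* Caller that installs a handler, runs the guarded call, and restores
 * the previous handler. Returns E_NONE or the code that was raised. */
static int try_read_chunk(unsigned int chunk, unsigned int max_chunk,
                          unsigned int *out)
{
    jmp_buf handler;
    jmp_buf *saved = current_handler;
    int result;

    current_handler = &handler;
    if (setjmp(handler) == 0) {
        *out = read_chunk(chunk, max_chunk);  /* may not return normally */
        result = E_NONE;
    } else {
        result = except_code;                 /* reached via longjmp */
    }
    current_handler = saved;
    return result;
}
```

A language binding can wrap each exported call in the same way as `try_read_chunk` and translate a non-zero result plus `except_msg` into a native exception, which is the kind of behaviour the Python session in the message shows. A command-line tool that installs no handler simply gets the "Unhandled Exception" exit path.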
From: Brian C. <ca...@sl...> - 2004-02-25 19:02:37
|
> The major change now is that I implemented exception handling throughout all the libraries used by the io subsystem. I did not touch the rest of fstools for the moment, just until i get the general opinion from the list about this move, but in the long term I believe that all code should be converted to use exceptions, rather than calling error().

I agree. The error() functionality is a TCT legacy and I have no issues with adding exceptions.

My time has been extremely limited for working on the tools, but here are my goals for the next few months.

- I have most of Autopsy changed for v2 and would like to get that out in the next couple of weeks.
- Start planning what v2 for The Sleuth Kit will look like. So far we have the following potential additions:
  - New image format support and corresponding library
  - indexing support
  - Re-examine output data of tools (such as ils)
  - Add exception handling

Anything else I have forgotten about?

brian
|