[sleuthkit-developers] Re: IO Subsystem patch for fstools


>> My original plan was to use the '-o' flag to specify the sector offset
>> for the file system.  I figured sectors would be easier than bytes
>> because mmls and fdisk give you the values in sectors and almost every
>> disk uses a 512-byte sector.  This also allows people to use the offset
>> value without the '-i' setting.
>
> Great idea. Sectors would be much more useful than straight bytes. The
> idea is that each subsystem may choose to implement its logical-physical
> mapping however makes sense for it, and would therefore need different
> parameters, most conveniently denoted by name. So rather than waste a
> whole option -o on just an offset, maybe we could use -o to specify a
> number of subsystem-dependent options.

Ok.  I was under the impression that you wanted to have a configuration 
file for any of the more complex subsystems and therefore the options 
could be specified there.  The offset is the only variable in the 
process that may change between executions (i.e. accessing a different 
partition), so it doesn't make sense to put it in a config file, unless 
the config file allowed you to assign names to offsets.  For example, 
assign the name 'part1' to sector offset 63 and then you could use 'fls 
-f ntfs -o part1 image.dd'.
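
To make the named-offset idea concrete, here is a rough sketch of how the 
'-o' argument could be resolved.  Everything in it is made up for 
illustration: resolve_offset(), sector_to_bytes(), and the hard-coded 
table (which a real version would fill from the config file) are 
assumptions, not existing sleuth kit code.  It also assumes the 512-byte 
sector mentioned above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SECTOR_SIZE 512   /* assumed sector size */

/* Hypothetical name -> sector table; a config file would populate this. */
struct named_offset {
    const char *name;
    uint64_t sector;
};

static const struct named_offset offset_table[] = {
    { "part1", 63 },
};

/*
 * Resolve the argument of '-o' to a sector offset.  An argument that
 * starts with a digit must be a plain decimal number; anything else is
 * looked up by name.  Returns 0 on success, -1 otherwise.
 */
int resolve_offset(const char *arg, uint64_t *sector)
{
    size_t i;

    assert(arg != NULL && sector != NULL);

    if (*arg >= '0' && *arg <= '9') {
        char *end;
        unsigned long long v;

        errno = 0;
        v = strtoull(arg, &end, 10);
        if (errno != 0 || *end != '\0')
            return -1;
        *sector = v;
        return 0;
    }

    for (i = 0; i < sizeof(offset_table) / sizeof(offset_table[0]); i++) {
        if (strcmp(arg, offset_table[i].name) == 0) {
            *sector = offset_table[i].sector;
            return 0;
        }
    }
    return -1;
}

/* Convert a sector offset into the byte offset used for seeking. */
uint64_t sector_to_bytes(uint64_t sector)
{
    return sector * SECTOR_SIZE;
}
```

With this, '-o part1' and '-o 63' would both end up at the same byte 
offset in the image.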

>>
>> I would actually say that '-i' should only have the type and no other
>> options.  If multiple files are needed (splitting and RAID), then they
>> should be appended to the end of the command.  For example, to look at
>> the file system at offset sector 12345, the following could be used
>> (names are made up):
>
> That is indeed a good suggestion. It needs more careful manipulation of
> the getopts in the client program but it should work. The only trouble
> is that the parameters to the subsystem can be arbitrary,
> subsystem-specific ones, so for example maybe for a split image:
>
> fls -f linux-ext3 -i split -o offset=12345 blocksize=512 file1.dd file2.dd
>
> and just in case you wanted to have a file called offset or blocksize,
> you could use a qualifier called file= in front of it like:
>
> fls -f linux-ext3 -i split -o offset=12345 blocksize=512 file1.dd file=offset
>
> but without a qualifier, it's just interpreted as a filename. Similarly,
> for the truly lazy user, if the subsystem-specific option parser sees an
> option consisting of just a number, it takes that as the offset, so you
> don't need to qualify the offset with a keyword.

That is why I was assuming that a configuration file would be used for 
complex situations.  What does the blocksize value do for a split 
image?  It seems that only the RAID / VM configurations need complex 
options.  The split mode (or EnCase if that happens in the future) can 
be done w/out options.

I would rather force complex configurations to configuration files.  
The command line options for the sleuth kit are already too numerous 
and it will make using Autopsy easier if the config file can be 
referenced instead of having to load up the command line every time.

>
>> It would also be useful if the config file format that you are
>> developing for the RAID images could be used for the split images.
>
> It can, but the algorithm for the raid reconstruction is more complex,
> and performance would suffer if the same subsystem was used all around.
> The format (not finalised yet...) is something like:

Oh.  I was thinking that the configuration file would have an entry 
that identified which IO subsystem to use.  For example, a line that 
says:

image_format = "split"
or
image_format = "lvm-splice"
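
Reading a line like that does not need lex/yacc.  A minimal sketch, 
assuming the key = "value" syntax shown above (the function name 
parse_config_line() and the buffer-based interface are made up for this 
example):

```c
#include <assert.h>
#include <string.h>

/*
 * Parse one line of the form   key = "value"
 * and copy the key and value into the caller's buffers.
 * Returns 0 on success, -1 if the line does not match or does not fit.
 */
int parse_config_line(const char *line, char *key, size_t keylen,
                      char *val, size_t vallen)
{
    const char *p = line;
    const char *start;
    size_t n;

    assert(line != NULL && key != NULL && val != NULL);
    assert(keylen > 0 && vallen > 0);

    /* Skip leading white space, then collect the key. */
    while (*p == ' ' || *p == '\t')
        p++;
    start = p;
    while (*p != '\0' && *p != ' ' && *p != '\t' && *p != '=')
        p++;
    n = (size_t)(p - start);
    if (n == 0 || n >= keylen)
        return -1;
    memcpy(key, start, n);
    key[n] = '\0';

    /* Expect '=' followed by an opening quote. */
    while (*p == ' ' || *p == '\t')
        p++;
    if (*p++ != '=')
        return -1;
    while (*p == ' ' || *p == '\t')
        p++;
    if (*p++ != '"')
        return -1;

    /* Collect everything up to the closing quote. */
    start = p;
    while (*p != '\0' && *p != '"')
        p++;
    if (*p != '"')
        return -1;
    n = (size_t)(p - start);
    if (n >= vallen)
        return -1;
    memcpy(val, start, n);
    val[n] = '\0';
    return 0;
}
```

The caller would compare the returned key against "image_format" and 
use the value to pick the IO subsystem.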

> one per line. A slot is the logical position within the raid period
> where the block should be taken from. Example:
>
> 1,1
> 2,1
> 1,2
> 2,2

[....]

> I guess the file may not be that human readable, because we use flag to
> generate it automatically. I really didn't want to have to use more
> advanced lex/yacc for this. What do you think?

Oh ok.  I think that it will be very hard to create such a 
configuration file.  To create the file, you will need to know which VM 
/ RAID system is being used.  I think it would be much easier to have a 
subsystem for each VM / RAID type and then the only thing that needs to 
be specified in the configuration file is the options for that type.  
For example, if the Linux LVM were used, then you may need to only 
specify the disk ordering and the block size.  When reading from the 
image, the lvm-split-read() function would be used.

>> To keep the subsystem design similar to what currently exists, have you
>> thought about the following:
>>
>> A new data structure IO_INFO and before fs_open is run, the io_open()
>> function is run with either the image lists or the config file etc and
>> the offset.  There would probably have to be one for
>> io_open_files(char **) and io_open_config(char *).
>>
>> The IO_INFO structure is filled in with io_open and the needed read
>> functions are mapped (like file_walk etc are now in FS_INFO).
>>
>> The fs_open() function gets the IO_INFO structure passed to it and the
>> fs_open() no longer needs to do the open() system call on the images.
>> It just checks the magic value and fills in FS_INFO.   Any
>> read_random() function in the file system code turns into
>> fs_info->io->read_random(...).
>
> This is an alternative design - the advantage with your method is that
> you could potentially have a number of different subsystems in use at
> the same time in the same program, while my subsystem design keeps
> subsystem data as static so it's program-wide. I just didn't really want
> to change all the read_random functions throughout the code (it would
> mean bigger changes in the architecture because almost every file will
> be touched many times).

I have no problems changing all of the files.  If we are going to add 
this functionality, I would rather do it right the first time.

> I still think that it would be more useful to allow each subsystem to
> manage its own options, rather than trying to second guess all the
> options in advance and stick them into the io_info struct. So for
> example rather than have the io_info struct have one entry for
> io_open_files(char **) and io_open_config(char *), maybe we can just
> have an entry for void *data, and a single io_open(void *data), and
> allow the subsystem to set that to whatever configuration parameters
> make sense for it - the single file option might attach a char * in the
> data pointer, while the multifile stuff might attach a char **. The raid
> subsystem might attach a pre-parsed linked list of its raid map so it
> can work off that. Whatever makes sense.

Actually, I guess we just need one io_open() function because fls.c and 
similar files will not know if the file is a config file or an image 
file.  io_open would have a char ** to list the  image files or config 
file, a type field for the type of image format, and an offset value.  
It would then fill in the IO_INFO structure and return it, which would 
be passed to fs_open().
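
Roughly, I am picturing something like the sketch below.  None of this 
exists yet: the field names, the "raw" type string, and the raw_* 
helpers are placeholders, only the single-image case is filled in, and 
it assumes a POSIX system (open/pread) and 512-byte sectors.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct IO_INFO IO_INFO;

struct IO_INFO {
    /* Read len bytes at byte offset off, relative to the file system start. */
    ssize_t (*read_random)(IO_INFO *io, char *buf, size_t len, uint64_t off);
    void (*close)(IO_INFO *io);
    uint64_t start;   /* byte offset of the file system within the image */
    void *data;       /* subsystem-private state */
};

/* Private state for the hypothetical single-file "raw" subsystem. */
struct raw_state {
    int fd;
};

static ssize_t raw_read_random(IO_INFO *io, char *buf, size_t len, uint64_t off)
{
    struct raw_state *st = io->data;

    return pread(st->fd, buf, len, (off_t)(io->start + off));
}

static void raw_close(IO_INFO *io)
{
    struct raw_state *st = io->data;

    close(st->fd);
    free(st);
    free(io);
}

/*
 * files: NULL-terminated list of image files (or a config file),
 * type: name of the image format, sector_offset: start of the file system.
 * Returns a filled-in IO_INFO, or NULL on error or unknown type.
 */
IO_INFO *io_open(char **files, const char *type, uint64_t sector_offset)
{
    IO_INFO *io;
    struct raw_state *st;

    assert(files != NULL && type != NULL);

    /* Only the single-image case is sketched; split, RAID etc. go here. */
    if (strcmp(type, "raw") != 0 || files[0] == NULL)
        return NULL;

    io = malloc(sizeof(*io));
    st = malloc(sizeof(*st));
    if (io == NULL || st == NULL) {
        free(io);
        free(st);
        return NULL;
    }

    st->fd = open(files[0], O_RDONLY);
    if (st->fd < 0) {
        free(io);
        free(st);
        return NULL;
    }

    io->read_random = raw_read_random;
    io->close = raw_close;
    io->start = sector_offset * 512;
    io->data = st;
    return io;
}
```

The file system code never sees the file descriptor.  It only calls 
io->read_random(), so a split or RAID subsystem can swap in its own 
function without touching the callers.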

> A couple more types of IO subsystem I just thought of are an EnCase file
> format subsystem (allows you to read standard EnCase files with sk) and
> a compressed file subsystem (allows you to work directly off compressed
> files). I have no idea how difficult it would be to actually implement
> those, but they look promising.

Compression would be a major pain.  Split, EnCase, and some of the RAID 
systems seem much easier.

thanks,
brian