[sleuthkit-developers] Re: IO Subsystem patch for fstools


On Wed, 4 Feb 2004 03:34 am, Brian Carrier wrote:
> Wow!  This looks great!

Thanks, Brian.

> My original plan was to use the '-o' flag to specify the sector offset
> for the file system.  I figured sectors would be easier than bytes
> because mmls and fdisk give you the values in sectors and almost every
> disk uses a 512-byte sector.  This also allows people to use the offset
> value without the '-i' setting.

Great idea. Sectors would be much more useful than raw bytes. The idea is 
that each subsystem may implement its logical-to-physical mapping however 
makes sense for it, and would therefore need different parameters, most 
conveniently denoted by name. So rather than spend the whole -o option on 
just an offset, maybe we could use -o to specify a number of 
subsystem-dependent options.

If we want people to be able to use -o without worrying about -i, that's 
easily solved. If you don't use -i, the default sk subsystem is used, and it 
can simply take a single option: the offset. So users can just use -o to 
specify a simple offset.
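A minimal sketch of how the default subsystem might apply that single offset, assuming 512-byte sectors (what mmls and fdisk report for almost every disk) and POSIX pread(). SECTOR_SIZE, sectors_to_bytes() and sk_read_random() are hypothetical names, not existing fstools functions.

```c
#define _XOPEN_SOURCE 700
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: 512-byte sectors, as reported by mmls and fdisk. */
#define SECTOR_SIZE 512

/* Hypothetical: convert the sector offset given with -o into bytes. */
static off_t sectors_to_bytes(uint64_t sectors)
{
    return (off_t)(sectors * SECTOR_SIZE);
}

/* Hypothetical default-subsystem read: a read at fs_off bytes into the
 * file system is redirected to the partition start plus fs_off. */
static ssize_t sk_read_random(int fd, uint64_t start_sector,
                              void *buf, size_t len, off_t fs_off)
{
    return pread(fd, buf, len, sectors_to_bytes(start_sector) + fs_off);
}
```

Under this sketch, an offset of 12345 sectors means the file system starts at byte 6320640 of the image.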

> I like the idea of the '-i' because it is like specifying the image
> type, whereas -f is specifying the file system type. I hadn't thought
> about getting this advanced, but it looks good.
>
> I would actually say that '-i' should only have the type and no other
> options.  If multiple files are needed (splitting and RAID), then they
> should be appended to the end of the command.  For example, to look at
> the file system at offset sector 12345, the following could be used
> (names are made up):
>
> Normal full image:
> fls -f linux-ext3 -o 12345 file1.dd
> or
> fls -f linux-ext3 -i single -o 12345 file1.dd
>
> Split Image:
> fls -f linux-ext3 -i split -o 12345 file1.dd file2.dd
>
> LVM RAID Image:
> fls -f linux-ext3 -i lvm -o 12345 lvm-config.dat
>
> MS LDM Spanning  Image
> fls -f ntfs -i ldm-span -o 12345 ldm-config.dat
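For the split case, the subsystem has to turn a logical image offset into a piece (file1.dd, file2.dd, ...) and an offset within that piece. A sketch, assuming the piece sizes are known up front (for example from stat() on each file); split_locate() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: find which piece of a split image holds the logical
 * offset 'logical'.  sizes[] holds the byte size of each of the n pieces.
 * Returns the piece index and stores the offset within that piece in
 * *piece_off, or returns -1 if the offset is past the end of the image. */
static int split_locate(const uint64_t *sizes, size_t n,
                        uint64_t logical, uint64_t *piece_off)
{
    for (size_t i = 0; i < n; i++) {
        if (logical < sizes[i]) {
            *piece_off = logical;
            return (int)i;
        }
        logical -= sizes[i];   /* skip over this whole piece */
    }
    return -1;
}
```

A read that straddles the boundary between two pieces would still have to be broken into one read per piece.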

That is indeed a good suggestion. It needs more careful handling of getopt() 
in the client program, but it should work. The only trouble is that the 
parameters to the subsystem can be arbitrary, subsystem-specific ones, so for 
a split image, for example:

fls -f linux-ext3 -i split -o offset=12345 blocksize=512 file1.dd file2.dd

And in case you had a file called offset or blocksize, you could put the 
qualifier file= in front of it, like:

fls -f linux-ext3 -i split -o offset=12345 blocksize=512 file1.dd file=offset

Without a qualifier, an argument is just interpreted as a filename. Similarly, 
for the truly lazy user, if the subsystem-specific option parser sees an 
option consisting of just a number, it takes that as the offset, so you don't 
need to qualify the offset with a keyword.
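One way a subsystem-specific parser could classify the trailing arguments under these rules: file= forces a filename, any other key=value is an option, a bare number is the offset, and anything else is a filename. This is a hypothetical sketch; the enum and function names are illustrative only.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical argument kinds for e.g.
 *   offset=12345 blocksize=512 file1.dd file=offset 12345 */
enum arg_kind { ARG_OPTION, ARG_OFFSET, ARG_FILE };

/* True if s is a non-empty string of decimal digits. */
static int all_digits(const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s; s++)
        if (!isdigit((unsigned char)*s))
            return 0;
    return 1;
}

/* Hypothetical classifier.  *value is set to the text after '=' for
 * key=value arguments, or to the whole argument otherwise. */
static enum arg_kind classify_arg(const char *arg, const char **value)
{
    const char *eq = strchr(arg, '=');

    if (strncmp(arg, "file=", 5) == 0) {   /* explicit filename */
        *value = arg + 5;
        return ARG_FILE;
    }
    if (eq != NULL) {                      /* key=value option */
        *value = eq + 1;
        return ARG_OPTION;
    }
    *value = arg;                          /* bare word or number */
    return all_digits(arg) ? ARG_OFFSET : ARG_FILE;
}
```

For an ARG_OPTION, the key is the first `value - arg - 1` characters of the argument. The file= escape also covers filenames that themselves contain '='.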

> It would also be useful if the config file format that you are
> developing for the RAID images could be used for the split images.

It can, but the RAID reconstruction algorithm is more complex, and 
performance would suffer if the same subsystem were used everywhere. The 
format (not finalised yet) is something like:

parameter=...
parameter=...

slot number,disk number
slot number,disk number

One entry per line. A slot is the logical position within the RAID period 
from which the block should be taken. For example:

1,1
2,1
1,2
2,2

specifies that the first block is taken from slot 1 of disk 1, the next from 
slot 2 of disk 1, the next from slot 1 of disk 2, and the last from slot 2 of 
disk 2. So if we start the RAID period at block 0, slot 1 corresponds to 
block 0 and slot 2 to block 1. The next block requested starts a whole new 
period, which maps the slots onto a new set of absolute offsets: slot 1 is 
now block 2 and slot 2 is block 3, and so on.
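The mapping just described reduces to arithmetic on the logical block number. This sketch assumes the period length is the number of map lines and that each disk advances by as many blocks per period as there are slots (the largest slot number); struct raid_entry and raid_map_block() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* One "slot,disk" line of the map (both 1-based, as in the example). */
struct raid_entry {
    unsigned slot;
    unsigned disk;
};

/* Hypothetical: map a logical block to (disk, block on that disk).
 * map_len is the period length; slots is the number of slots per disk
 * in one period. */
static void raid_map_block(const struct raid_entry *map, size_t map_len,
                           unsigned slots, uint64_t logical,
                           unsigned *disk, uint64_t *disk_block)
{
    uint64_t period = logical / map_len;               /* which period */
    const struct raid_entry *e = &map[logical % map_len];

    *disk = e->disk;
    *disk_block = period * slots + (e->slot - 1);      /* slot 1 -> +0 */
}
```

With the four-line example map and two slots, logical block 5 lands on disk 1, block 3. The byte position on that disk would then be the disk's starting offset plus disk_block times the block size, which is where the offset and blocksize parameters come in.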

So this scheme does use offsets at which to start reading the disks, as well 
as block sizes, so I guess if you really wanted to, you could make a RAID map 
correspond to a number of split disks, but not easily, especially if the 
disks have different sizes.

I guess the file may not be very human-readable, because we use flag to 
generate it automatically. I really didn't want to have to use more advanced 
lex/yacc parsing for this. What do you think?
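For what it's worth, a line-oriented format like this can be read without lex/yacc. A minimal sketch using sscanf(), assuming blank lines are allowed, any line containing '=' is a parameter, and every other line must be "slot,disk". The struct, MAX_ENTRIES and parse_config_line() are hypothetical, and parameters are only counted here, not stored.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 64   /* hypothetical fixed limit for the sketch */

struct raid_config {
    unsigned nentries;
    unsigned slot[MAX_ENTRIES];
    unsigned disk[MAX_ENTRIES];
    unsigned nparams;    /* parameters are only counted in this sketch */
};

/* Parse one line.  Returns 0 on success, -1 if the line is malformed
 * or the entry table is full. */
static int parse_config_line(struct raid_config *cfg, const char *line)
{
    unsigned slot, disk;
    char tail;

    if (line[strspn(line, " \t\r\n")] == '\0')
        return 0;                              /* blank line */
    if (strchr(line, '=') != NULL) {
        cfg->nparams++;                        /* parameter=... line */
        return 0;
    }
    /* Exactly two numbers separated by a comma, nothing after them. */
    if (sscanf(line, " %u , %u %c", &slot, &disk, &tail) == 2 &&
        cfg->nentries < MAX_ENTRIES) {
        cfg->slot[cfg->nentries] = slot;
        cfg->disk[cfg->nentries] = disk;
        cfg->nentries++;
        return 0;
    }
    return -1;
}
```

The trailing `%c` conversion is what rejects lines such as "1,1 x": it only succeeds if something follows the second number.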

> To keep the subsystem design similar to what currently exists, have you
> thought about the following:
>
> A new data structure IO_INFO and before fs_open is run, the io_open()
> function is run with either the image lists or the config file etc and
> the offset.  There would probably have to be one for io_open_files(char
> **) and io_open_config(char *).
>
> The IO_INFO structure is filled in with io_open and the needed read
> functions are mapped (like file_walk etc are now in FS_INFO).
>
> The fs_open() function gets the IO_INFO structure passed to it and the
> fs_open() no longer needs to do the open() system call on the images.
> It just checks the magic value and fills in FS_INFO.   Any
> read_random() function in the file system code turns into
> fs_info->io->read_random(...).
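A sketch of that structure, assuming a single-image backend built on POSIX open() and pread(). Only IO_INFO, io_open and read_random come from the proposal; io_open_single(), the io_close member and the fd/offset fields are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct IO_INFO IO_INFO;

/* Read functions reached through pointers, like file_walk in FS_INFO. */
struct IO_INFO {
    ssize_t (*read_random)(IO_INFO *io, void *buf, size_t len, off_t off);
    void (*io_close)(IO_INFO *io);   /* hypothetical */
    int fd;                          /* hypothetical backend state */
    off_t offset;                    /* hypothetical: fs start in bytes */
};

static ssize_t single_read_random(IO_INFO *io, void *buf, size_t len, off_t off)
{
    /* Shift every file system read by the image offset. */
    return pread(io->fd, buf, len, io->offset + off);
}

static void single_close(IO_INFO *io)
{
    close(io->fd);
    free(io);
}

/* Hypothetical io_open() for one image file; NULL on failure.  It does
 * the open() so that fs_open() no longer has to. */
static IO_INFO *io_open_single(const char *path, off_t offset)
{
    IO_INFO *io = malloc(sizeof(*io));
    if (io == NULL)
        return NULL;
    io->fd = open(path, O_RDONLY);
    if (io->fd < 0) {
        free(io);
        return NULL;
    }
    io->read_random = single_read_random;
    io->io_close = single_close;
    io->offset = offset;
    return io;
}
```

File system code would then call fs_info->io->read_random(fs_info->io, buf, len, off) and never touch the descriptor itself.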

This is an alternative design. The advantage of your method is that you could 
potentially have several different subsystems in use at the same time in the 
same program, while my subsystem design keeps subsystem data static, so it is 
program-wide. I just didn't really want to change all the read_random() calls 
throughout the code (that would mean bigger architectural changes, because 
almost every file would be touched many times).

I still think it would be more useful to let each subsystem manage its own 
options, rather than trying to second-guess all the options in advance and 
stick them into the IO_INFO struct. For example, rather than giving IO_INFO 
separate entries for io_open_files(char **) and io_open_config(char *), 
maybe we can just have a void *data entry and a single io_open(void *data), 
and let each subsystem set data to whatever configuration parameters make 
sense for it: the single-file subsystem might attach a char *, while the 
multi-file one might attach a char **. The RAID subsystem might attach a 
pre-parsed linked list of its RAID map so it can work from that; whatever 
makes sense.
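A rough sketch of that alternative: each subsystem registers the name given to -i and a single io_open(void *data), and interprets data itself (a char * for one file, a NULL-terminated char ** for several). The table, struct io_subsystem, find_subsystem() and the stub bodies are hypothetical; the open functions here only validate their argument.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical registry entry: one name for -i, one opener. */
struct io_subsystem {
    const char *name;
    int (*io_open)(void *data);
};

/* Stub single-file subsystem: data is a char * path. */
static int single_open(void *data)
{
    const char *path = data;
    return path != NULL ? 0 : -1;
}

/* Stub split subsystem: data is a NULL-terminated char ** file list. */
static int split_open(void *data)
{
    char **files = data;
    int n = 0;
    while (files[n] != NULL)
        n++;
    return n > 0 ? 0 : -1;
}

static const struct io_subsystem subsystems[] = {
    { "single", single_open },
    { "split",  split_open  },
};

/* Look up the subsystem named by -i; NULL if unknown. */
static const struct io_subsystem *find_subsystem(const char *name)
{
    for (size_t i = 0; i < sizeof(subsystems) / sizeof(subsystems[0]); i++)
        if (strcmp(subsystems[i].name, name) == 0)
            return &subsystems[i];
    return NULL;
}
```

The client program only needs to know the subsystem's name; what goes behind data stays private to each subsystem, so adding a RAID entry would not change the interface.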

A couple more types of IO subsystem I just thought of are an EnCase file 
format subsystem (allowing you to read standard EnCase files with sk) and a 
compressed file subsystem (allowing you to work directly from compressed 
files). I have no idea how difficult those would be to implement, but they 
look promising.

Michael.