[sleuthkit-developers] Re: IO Subsystem patch for fstools

> Yea, but autopsy or flag need to store all of those options as well in
> their own configuration file so that they can pass them to the fstools.
>   It would seem more flexible if the sleuth kit had a configuration file
> for the image, which could then be used by any GUI (including autopsy,
> flag, rex etc.)
The problem with this approach is that the fstools are then too integrated
with the GUI, especially if the configuration file becomes so complex that
you really need a GUI to make one. In that case you can't just use them by
themselves. Is that something we are prepared to live with? Or is it a design
goal to make small self-contained tools that may be used from the command
line?

I agree that fstools by themselves are probably not all that useful without 
having some sort of GUI. So perhaps we just live with an increased level of 
complexity for the fstools, in favour of better integration into larger GUIs? 
The other problem that may arise from trying to make the fstools integrate
with the GUIs' configuration files is that different GUIs store configuration
in different ways. For example, flag stores everything in the database (not
even in a file), so having a single configuration file format is a little
clunky.

It's not so bad currently for flag, since we are using the database patch
that was posted on the list a little while ago to dump out all the data
from the image, and we never really use the individual tools like ils, fls,
icat etc. So it won't be too hard to simply write out a conf file for each
image. However, I'm just thinking of the old version of flag, where we did
shell out to these tools basically for each file in the filesystem. The cost
of parsing a huge config file for each invocation of icat would be
tremendous, I would imagine.

> If we are going to start discussing configuration files for the fstools
> (which we both agree are required for at least RAID), then I would
> rather make them general enough so that they can be used for other
> formats besides RAID.  I would even like to have these include
> the file system type, mounting point, and hashes of each partition.
> Basically the stuff that the other tools include in some proprietary
> format with the image, we would put in a separate text file.

That's a great idea (accepting that the level of complexity of the fstools
is increased). I would vote for using an XML config file format, since it's
standard and easy to deal with and we don't have to write a parser. The
downside is that we increase the program dependencies by requiring libxml2
to be present. Alternatively we could write some yacc/lex parser, but we then
need to discuss a good format which will be sufficient for autopsy and allow
future growth.

> Sure.  For this to work with the Sleuth Kit though, there must be the
> ability to create the configurations in the sleuth kit.   If the only
> way to create the map is in flag, then that doesn't do autopsy any good
> or future interfaces and it doesn't make sense to replicate the stuff
> in each gui.

That's true. However, a raid map must be generated internally anyway in order
to reassemble the individual raid implementations (e.g. lvm, linux raid,
etc). Perhaps we can have different IO subsystems which do nothing but
generate a generic map and then call the generic raid implementation? So for
example, say we have a generic raid io subsystem as described above that
takes a raid map as input. Then we have another subsystem, called lvm for
example, which accepts a bunch of lvm-specific parameters, generates a raid
map and calls the generic raid io subsystem. This way autopsy doesn't need to
be able to build a generic map in the gui, but one will be built
automatically as required. If the user works out a way to build a raid map by
some other means (i.e. some other GUI, by hand, or whatever), they can still
use the generic raid implementation.
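
To make the layering concrete, here is a rough sketch of what I have in mind.
Everything in it is an assumption for illustration only: the RAID_MAP and
RAID_SLOT structures, the function names, and the map layout are made up, and
a plain striping front end stands in for a real lvm or linux raid one. The
generic layer only knows how to look up a logical block in a map, and the
format-specific front end only knows how to build the map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical generic map: logical blocks follow a repeating pattern of
 * `period` slots. Each slot names a member disk and a block offset within
 * that disk's share of one pattern. */
typedef struct {
    int disk;       /* index of the member disk */
    long offset;    /* block offset within one pattern on that disk */
} RAID_SLOT;

typedef struct {
    long period;        /* logical blocks per pattern */
    long stride;        /* blocks each disk advances per pattern */
    RAID_SLOT *slots;   /* `period` entries */
} RAID_MAP;

/* Generic layer: translate a logical block into (disk, physical block). */
void raid_map_lookup(const RAID_MAP *map, long logical,
                     int *disk, long *physical)
{
    const RAID_SLOT *slot;

    assert(map != NULL && map->period > 0 && logical >= 0);
    slot = &map->slots[logical % map->period];
    *disk = slot->disk;
    *physical = (logical / map->period) * map->stride + slot->offset;
}

/* Format-specific front end (stand-in for lvm etc): builds the map for
 * simple one-block striping across ndisks. Returns NULL on failure. */
RAID_MAP *stripe_build_map(int ndisks)
{
    RAID_MAP *map;
    int i;

    if (ndisks <= 0)
        return NULL;
    map = malloc(sizeof(*map));
    if (map == NULL)
        return NULL;
    map->slots = malloc((size_t) ndisks * sizeof(*map->slots));
    if (map->slots == NULL) {
        free(map);
        return NULL;
    }
    map->period = ndisks;
    map->stride = 1;
    for (i = 0; i < ndisks; i++) {
        map->slots[i].disk = i;     /* block i of each pattern is on disk i */
        map->slots[i].offset = 0;
    }
    return map;
}

void raid_map_free(RAID_MAP *map)
{
    if (map != NULL) {
        free(map->slots);
        free(map);
    }
}
```

A hand-written or GUI-generated map would simply be loaded into the same
structure and passed to the same lookup, without going through any front end.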

> I'm still not convinced that we need so many options on the command
> line.  The only case that I can see where all of the command line
> options are beneficial is for a live analysis where you don't want to
> write to the disk.  But, in that case I don't see why you would need to
> use any of these complex image formats because you will have access to
> the raw device corresponding to the partition.

That's true, and if you have access to the raw device you would not need
extra options or more complex io subsystems.

> Is there a specific reason with flag that command line options are
> easier?

No reason currently, because we have our own program (dbtool) written using
the fstools library (as seen in the patch Dave submitted). I'm just
thinking about the way it used to work by shelling out. Maybe a better way is
to simply document the fstools library and define a clear interface (with a
proper shared library), and then people would be expected to use the library
rather than shell out to the tools all the time.

> > Maybe it would make more sense to populate the IO_INFO structure
> > inside the
> > FS_INFO structure?
>
> I would rather not.  I would prefer to keep the file system code
> separate from the image format code.  In fact, I would even consider
> making all of this image stuff its own library,
> imgtools maybe.  It seems much more logical to call the file system
> processing code with the filled in IO_INFO structure and let it read
> from it.  The file system code would never touch any of the file
> descriptors, it would just call the read functions.  This also allows
> the 'mm...' tools to use the image formats and any other future tools,
> such as memory images that are split or saved in another tool's
> proprietary format.

Just to clarify what you are saying... Are you proposing to make the
io_subsystem and file system code into separate libraries, so that the
individual tools (e.g. fls) would open the subsystem, initialise it, and
then call the file system code, giving it a filled-in IO_INFO structure? If I
understood your comment right, it sounds great.

So the IO_INFO structure will contain function pointers to the read_random
and read_block functions, which will be initialised by the constructor, and
the fs code would just call those methods? Sounds great:

FS_INFO *
ext2fs_open(const char *name, unsigned char ftype)

Changes to:
FS_INFO *
ext2fs_open(IO_INFO *io)

(BTW, do you think that ftype is a little redundant here? And, a little off
topic, it would be nicer if the *fs_open routines returned NULL if they
couldn't find the filesystem rather than erroring out, because then you could
cycle over all filesystem decoders until one worked rather than demanding
that the user specify the -f parameter all the time. You could use -f to
override the automatic detection.)
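
Something like the following is what I mean by cycling over the decoders.
This is only a sketch under assumptions: the IO_INFO and FS_INFO structures
here are tiny stand-ins (the image is just a buffer in memory), and the
demo_*_open functions and fs_open_auto are made-up names, not existing
fstools routines. The checks are deliberately minimal: the ext2 magic 0xEF53
at byte 1080 and the "NTFS    " OEM ID at byte 3 of the boot sector.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in structures for illustration only. */
typedef struct { const unsigned char *buf; size_t len; } IO_INFO;
typedef struct { const char *fstype; } FS_INFO;

/* Each opener returns NULL (instead of exiting) when the image does not
 * look like its file system. */
FS_INFO *demo_ext2fs_open(IO_INFO *io)
{
    static FS_INFO fs = { "ext2" };

    /* superblock at 1024, s_magic at offset 56, stored little-endian */
    if (io->len < 1082 || io->buf[1080] != 0x53 || io->buf[1081] != 0xEF)
        return NULL;
    return &fs;
}

FS_INFO *demo_ntfs_open(IO_INFO *io)
{
    static FS_INFO fs = { "ntfs" };

    if (io->len < 11 || memcmp(io->buf + 3, "NTFS    ", 8) != 0)
        return NULL;
    return &fs;
}

static const struct {
    const char *name;
    FS_INFO *(*open)(IO_INFO *io);
} decoders[] = {
    { "ext2", demo_ext2fs_open },
    { "ntfs", demo_ntfs_open },
};

/* Try every decoder in turn. If `forced` is not NULL (the -f case), only
 * the decoder with that name is tried. Returns NULL if nothing matched. */
FS_INFO *fs_open_auto(IO_INFO *io, const char *forced)
{
    size_t i;

    assert(io != NULL);
    for (i = 0; i < sizeof(decoders) / sizeof(decoders[0]); i++) {
        FS_INFO *fs;

        if (forced != NULL && strcmp(forced, decoders[i].name) != 0)
            continue;
        fs = decoders[i].open(io);
        if (fs != NULL)
            return fs;
    }
    return NULL;
}
```

The tool would only print an error and exit once fs_open_auto itself returns
NULL, so the individual decoders stay usable from a library.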

> I would lean towards the way that FS_INFO is structured.  There would
> be a few basic items in IO_INFO, such as the function pointers and
> maybe the maximum size of the image.  Then there are image specific
> structures that have their needed values.  For example, the structure
> for split images may have an array of file descriptors and  a structure
> with the sizes of each split image. The normal image structure may just
> have one file descriptor.  Actually, maybe this whole thing is better
> called IMG_INFO instead of IO_INFO.

That sounds great. We could use a void* to achieve this: each io subsystem
makes its own structure and stores a pointer to it as a void*:
struct IMG_INFO {
     /* common fields */
     /* ... */
     /* common fields */
     /* function pointers ... */
     void *data;
};

and maybe the multipart reassembly has:
struct part {
	char *filename;
	struct part *next;
};

So we initialise as:
   IMG_INFO *img = malloc(sizeof(*img));
   img->data = (void *)part_list;

While the raid is totally different:
struct raid {
	/* whatever */
	/* ... more stuff */
};

The advantage of this option is that the IMG_INFO struct doesn't need to know
about each subsystem.
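
Fleshed out a little, the idea might look like the sketch below. It is not a
proposed final interface: the field names and the split_* functions are
invented for illustration, and each part is an in-memory buffer rather than a
filename or file descriptor, purely so that the example is self-contained.
The fs code would only ever call img->read_random and never look at
img->data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct IMG_INFO IMG_INFO;
struct IMG_INFO {
    size_t size;    /* common field: total image size in bytes */
    /* read up to len bytes at offset into buf, return bytes read */
    size_t (*read_random)(IMG_INFO *img, void *buf, size_t len,
                          size_t offset);
    void *data;     /* private to the io subsystem */
};

/* Split-image subsystem. A real one would keep a filename or fd per
 * part; a memory buffer is used here to keep the sketch small. */
struct part {
    const unsigned char *bytes;
    size_t len;
    struct part *next;
};

size_t split_read_random(IMG_INFO *img, void *buf, size_t len,
                         size_t offset)
{
    struct part *p;
    unsigned char *out = buf;
    size_t done = 0;

    assert(img != NULL && buf != NULL);
    p = img->data;          /* cast back from void* to our own type */

    /* skip the parts that lie entirely before the requested offset */
    while (p != NULL && offset >= p->len) {
        offset -= p->len;
        p = p->next;
    }
    /* copy, continuing across part boundaries as needed */
    while (p != NULL && done < len) {
        size_t n = p->len - offset;

        if (n > len - done)
            n = len - done;
        memcpy(out + done, p->bytes + offset, n);
        done += n;
        offset = 0;
        p = p->next;
    }
    return done;
}

/* Constructor: fills in the common fields and the function pointer. */
void split_open(IMG_INFO *img, struct part *parts)
{
    struct part *p;

    img->size = 0;
    for (p = parts; p != NULL; p = p->next)
        img->size += p->len;
    img->read_random = split_read_random;
    img->data = parts;
}
```

A raid subsystem would provide its own constructor and read_random and hang
a struct raid off the same data pointer, without IMG_INFO changing at all.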

> In the imgtools collection, we could actually  have a tool that
> converts the proprietary image formats to a raw image.

That could be a new standalone tool which chooses the right io subsystem and
dumps a dd image out. It would be useful in the case of raid.

> Very cool. I had never seen sgzip before.  I guess it isn't as much of
> a pain as I thought :)

It only took a day or so to write sgzip for this purpose, and I thought it 
would be useful in general for any application needing quick seeking in a 
compressed file. The library is now available in general on sf:

http://sourceforge.net/project/showfiles.php?group_id=100803

> > Do you have any idea how you would read in encase files? I didnt get
> > the
> > chance to ever use it so i dont know how complex the file format is
> > but there
> > is nothing i can find on the net re the format.
>
> Check out asrdata.com.  Somewhere on there is a link to the expert
> witness format.

Thanks for that. The format looks remarkably similar to sgzip, except with
some extra metadata stuck in there. It should be easy to write a library to
access this. I just need to get a small example encase image to play with.

> I apologize if I am being a pain with some of these details, but after
> having to redesign autopsy because of a bad initial design, I want to
> make sure we add this new functionality the right way.

I think the discussion is very constructive so far. I was initially expecting
a small change, but it looks like there is a need now to do a larger
reorganization of code. It's going to pay off in the long run, I expect.

cheers
Michael.