sleuthkit-developers Mailing List for The Sleuth Kit (Page 40)
From: Michael C. <mic...@ne...> - 2004-02-23 12:42:10
On Mon, 23 Feb 2004 10:12 pm, Paul Bakker wrote:
> Hi Michael,

Hi Paul,

> I currently support two modes (as will the new release)...
>
> The first is raw mode, meaning that all data on the disk is indexed as it is!.. That means that the whole disk is walked sequentially in (currently) 64k blocks... This can be enlarged if that would increase performance of the underlying subsystem..

Is it possible for you then to just malloc a buffer (say 1mb) and fill it sequentially, and then just index the buffer? Or will that complicate matters?

> The second is raw_fragment mode, meaning that all fragmented pieces of files are indexed in a similar manner as icat runs through them... I use both inode_walk and file_walk.. Thus this consists of more small reads.

Have a look at the dbtools patch, because David Collett has done something similar for flag. In his file walk he is simply building a linked list of blocks in the file (without actually reading these blocks), and then in his print_blocks he is saving entire runs of blocks as unique entries.

So for example, suppose you have a file that's 30 blocks big (~120kb). While the file_walk might call the callback 30 times (once for each block), in the end David's print_blocks function will print a single entry for 30 consecutive blocks.

So the idea is that you might use this information to preallocate a buffer 30 blocks big and read a large chunk into it, then index that buffer. The result is that you need to do 30 times less reading on the actual file (in this example).

From experience, most files are not really fragmented, and at most I have seen large files fragmented into 2-3 parts. That's an average of 2-3 reads for large files, and a single read for small files - not too expensive at all. (Contrast this with reading the block on each callback: you will need to do a read every 4kb in every file, or up to several hundred times for each large file.)

I just ran icat under gdb again to confirm what I'm saying here. This is the result:

(gdb) b icat_action
(gdb) r -f linux-ext2 /var/tmp/honeypot.hda1.dd 16 > /tmp/vmlinuz
Breakpoint 1, icat_action (fs=0x8064e18, addr=2080, buf=0x8065c48 ..., size=1024, flags=1028, ptr=0x805d934 "")

The size is the size that icat writes in every call to icat_action, and it seems to always be 1024 here (the block size). So the icat_action callback is called for every single block.

The other problem I can think of (again, I haven't seen your code yet, so I'm sorry if I'm talking crap here) is that if you only do the indexing on each buffer returned by the file_walk callback, then wouldn't you be missing words that happen to be cut by a block boundary? I.e. half a word will be on one block and the other half on the next block? This problem will be alleviated to some extent by indexing larger buffers than block sizes.

Another suggestion... The way we currently have string indexing done in flag (it's mostly in CVS and not quite finished, but we should have it finished soon :-) is by using David's dbtool to extract _all_ the meta information about the image into the database - this includes all the inodes, files, and the blocks these files occupy (in other words, file_walk and inode_walk). We do not read the blocks themselves, we just note where they are. We then index the entire image file, storing all the offsets for the strings in the database.

Then it's quite easy to tell if a string is in an allocated file, and exactly which file (and inode) it's in. We can extract entire files by simply reading the complete block run (as I described above). The result is that we don't really seek very much; we seek a bit in the dbtool to pull out all the meta data, but then we just read sequentially for indexing - and very large reads at that (I think about 1mb buffers).

That said, indexing is a tough job and it does take a long time... it's inescapable. I'm interested in your indexing implementation, because the database implementation requires a copy of all the strings to live in the db, which basically blows the db up to about 1/3-1/2 the size of the (uncompressed) image. This is not really practical.

> As fragmented parts usually comprise only a very small amount of the disk, this should not be used as an indication of access.. Especially the first mode (raw) is a real time/disk access/processor roughy... In its current form it does not use any seeks, as this greatly increases speed (almost double otherwise)...

Caching can certainly help here. Although if you can restructure your code so that you do few big reads rather than lots of small reads, it would alleviate the need for caching.

This is especially important when you think about the possibility of your code directly operating on EnCase evidence files or compressed volumes, where the major cost is the decompression overhead. Remember that with EnCase the minimum buffer is 32kb, so even if you wanted to read 1 byte, it will still need to decompress the whole 32kb chunk to give you that byte - very expensive. In that case the cost of seeking is negligible relative to the cost of decompression.

Michael.
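As an illustrative sketch of the run-collection idea described above (the helper and field names here are hypothetical; the real dbtool code differs), a file_walk-style callback could record each block address and merge consecutive addresses into runs, so that each run can later be read with one large read instead of one read per block:

    #include <stdlib.h>

    /* Hypothetical run descriptor: a contiguous range of blocks in a file. */
    struct block_run {
        unsigned long start;        /* first block address of the run */
        unsigned long count;        /* number of consecutive blocks   */
        struct block_run *next;
    };

    /* Called once per block by a file_walk-style iterator.  Instead of
     * reading the block, extend the current run or start a new one. */
    static void note_block(struct block_run **runs, unsigned long addr)
    {
        struct block_run *r = *runs;

        if (r && addr == r->start + r->count) {
            r->count++;                      /* consecutive: extend the run */
            return;
        }
        r = malloc(sizeof(*r));              /* gap: start a new run */
        if (!r)
            return;
        r->start = addr;
        r->count = 1;
        r->next  = *runs;
        *runs    = r;
    }

Each accumulated run can then be read with a single read of count * block_size bytes and handed to the indexer, so an unfragmented file costs one read no matter how many blocks it spans.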
From: Paul B. <ba...@fo...> - 2004-02-23 11:21:09
Hi Michael,

> The solution to this problem, I think, is to implement some kind of caching in memory. A cache system can solve all those problems very efficiently, particularly for the case where you make lots of small reads, very close together (i.e. no seeks). A simple cache (with a simple policy) can be implemented quite easily, I think, and will be effective for the scenario you are describing.
>
> What kind of IO do you do for indexing? Is it very localised? If you were to cache a block into memory, what would be the optimal size of the block? (Say 1 mb, or more like 32kb?) If you were to cache 1 mb in memory, how many reads would you get out of it on average?

I currently support two modes (as will the new release)...

The first is raw mode, meaning that all data on the disk is indexed as it is!.. That means that the whole disk is walked sequentially in (currently) 64k blocks... This can be enlarged if that would increase performance of the underlying subsystem..

The second is raw_fragment mode, meaning that all fragmented pieces of files are indexed in a similar manner as icat runs through them... I use both inode_walk and file_walk.. Thus this consists of more small reads.

As fragmented parts usually comprise only a very small amount of the disk, this should not be used as an indication of access.. Especially the first mode (raw) is a real time/disk access/processor roughy... In its current form it does not use any seeks, as this greatly increases speed (almost double otherwise)...

Paul Bakker
From: Michael C. <mic...@ne...> - 2004-02-23 09:42:13
Hi Paul,

> Well I do see advantages.... I already wanted to ask this...
>
> The problem with the current code is that it is not possible to "read_random" an image efficiently because it cannot check the current offset in the image.. This results in unnecessary seeks.. And seeks are very expensive if they come in millions....

I agree - this is particularly bad if the underlying image is a compressed format like EnCase or sgzip, because then each seek/read corresponds to a decompression of at least one block.

> For Indexed Searching it would be very handy if there would come either: a generic fs_read_random() function.
>
> If this function would check for the current offset in the image and thus not seek if the reads were all in succession, this would be great...

This really depends on the specific subsystem; for example, when reading an EnCase file you need to decompress at least one chunk for each seek, so if you read lots of little runs of data all over the file it's going to run slowly.

The solution to this problem, I think, is to implement some kind of caching in memory. A cache system can solve all those problems very efficiently, particularly for the case where you make lots of small reads very close together (i.e. no seeks). A simple cache (with a simple policy) can be implemented quite easily, I think, and will be effective for the scenario you are describing.

What kind of IO do you do for indexing? Is it very localised? If you were to cache a block into memory, what would be the optimal size of the block? (Say 1 mb, or more like 32kb?) If you were to cache 1 mb in memory, how many reads would you get out of it on average?

Michael
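A minimal sketch of the kind of single-block cache being proposed here (hypothetical names only, not the Sleuth Kit API): keep the last loaded block in memory and satisfy any read that falls inside it without touching the underlying image, so runs of small nearby reads trigger only one expensive fill per block.

    #include <string.h>
    #include <sys/types.h>

    #define CACHE_BLOCK 65536                 /* cache granularity: 64kb */

    struct read_cache {
        char   data[CACHE_BLOCK];
        off_t  base;                          /* image offset of cached block */
        int    valid;
        /* fill_block: reads CACHE_BLOCK bytes at 'base' from the image
         * (dd file, sgzip, EnCase, ...) into 'data'. Supplied by the subsystem. */
        int  (*fill_block)(struct read_cache *c, off_t base);
    };

    /* Serve 'len' bytes at image offset 'off'; refill the cache only when
     * the request leaves the currently cached block. */
    ssize_t cached_read(struct read_cache *c, char *buf, size_t len, off_t off)
    {
        size_t done = 0;

        while (done < len) {
            off_t base = (off + done) / CACHE_BLOCK * CACHE_BLOCK;
            if (!c->valid || c->base != base) {
                if (c->fill_block(c, base) < 0)
                    return -1;
                c->base  = base;
                c->valid = 1;
            }
            size_t in_blk = CACHE_BLOCK - (size_t)((off + done) - base);
            size_t n = in_blk < len - done ? in_blk : len - done;
            memcpy(buf + done, c->data + ((off + done) - base), n);
            done += n;
        }
        return done;
    }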
From: Paul B. <ba...@fo...> - 2004-02-23 08:11:55
> Excellent.
>
> > 4) All filesystem code uses the io object to actually do the reading. e.g.:
> > ext2fs->fs_info.io->read_random(ext2fs->fs_info.io, (FS_INFO *) ext2fs, (char *) gd, sizeof(ext2fs_gd), offs, "group descriptor");
> >
> > the io object has a number of methods. For example:
> > io->constructor
>
> What is the constructor used for?
>
> I quickly looked the changes over. Is there a need to pass FS_INFO to the read functions? I didn't see you using them and it would be nice if we could avoid doing that (so that we don't have to restrict ourselves to file system code).

Well, I do see advantages.... I already wanted to ask this...

The problem with the current code is that it is not possible to "read_random" an image efficiently because it cannot check the current offset in the image.. This results in unnecessary seeks.. And seeks are very expensive if they come in millions....

For Indexed Searching it would be very handy if there would come either: a generic fs_read_random() function.

If this function would check for the current offset in the image and thus not seek if the reads were all in succession, this would be great...

Paul Bakker
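A sketch of the seek-avoidance idea being requested (illustrative only; this is not the interface that was eventually adopted): track the file position that the last read left behind and only call lseek() when the requested offset actually differs from it.

    #include <unistd.h>
    #include <sys/types.h>

    struct seq_reader {
        int   fd;       /* image file descriptor            */
        off_t cur;      /* offset the descriptor is at now  */
    };

    /* read_random that skips the lseek() when reads arrive in succession. */
    ssize_t seq_read_random(struct seq_reader *r, char *buf, size_t len, off_t off)
    {
        if (off != r->cur) {                   /* only seek on a real jump */
            if (lseek(r->fd, off, SEEK_SET) < 0)
                return -1;
            r->cur = off;
        }
        ssize_t n = read(r->fd, buf, len);
        if (n > 0)
            r->cur += n;
        return n;
    }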
From: Brian C. <ca...@sl...> - 2004-02-22 23:28:18
Excellent.

> 4) All filesystem code uses the io object to actually do the reading. e.g.:
> ext2fs->fs_info.io->read_random(ext2fs->fs_info.io, (FS_INFO *) ext2fs, (char *) gd, sizeof(ext2fs_gd), offs, "group descriptor");
>
> the io object has a number of methods. For example:
> io->constructor

What is the constructor used for?

I quickly looked the changes over. Is there a need to pass FS_INFO to the read functions? I didn't see you using them, and it would be nice if we could avoid doing that (so that we don't have to restrict ourselves to file system code).

thanks,
brian
From: Paul B. <ba...@fo...> - 2004-02-19 12:21:57
Hi again...

> > But I just wanted to indicate that the combination of your IO Subsystem patch for fstool and my searchtools (Indexed Searching) patch create a system that is very powerful.
> Indeed, your indexing support looks very cool. I haven't played with it just yet though (gotta find some time :-)

It seems we have the same problem: time.... ;-)

But I'm already making Searchtools (Indexed Searching) ready for your patch. Normally in raw index mode I just read the raw image file. I'm now updating Searchtools to use Sleuthkit image reading, so when your patch comes out only minor changes in my code are needed to enable indexing of split dd files or EnCase images.....

> > The only thing really missing is a subsystem that makes it possible to "read" fileformats on the image with a specific interpreter. That would enable us to "read" PDF files, PST files, etc...
> I'm not sure I know what you mean, the IO subsystem is done at a very low level (well, at the IO level)... The interpretation of different files on the filesystem is surely the job of a higher level application?

Yes, sorry to confuse anybody... I meant that Sleuthkit as a whole should contain a generic way of accessing filetypes found on the images - at a higher level than the IO subsystem, but still integrated with Sleuthkit. Otherwise one has to extract files from the image before they can be processed (for instance, indexed (hint!)).

Autopsy would benefit from that, as it would be possible to integrate FTK-like functionality to read PDF/PST files from the web interface. And it would make it possible to index files inside the image based on the text therein (also files inside ZIP files and such)..

Paul Bakker
From: Michael C. <mic...@ne...> - 2004-02-19 10:53:02
On Thu, 19 Feb 2004 09:24 pm, Paul Bakker wrote:
> Hi Michael....

Hi Paul,

> This sounds very good and cool.... (I haven't looked at your patch yet though...)..

Thanks...

> But I just wanted to indicate that the combination of your IO Subsystem patch for fstool and my searchtools (Indexed Searching) patch create a system that is very powerful.

Indeed, your indexing support looks very cool. I haven't played with it just yet though (gotta find some time :-)

> The only thing really missing is a subsystem that makes it possible to "read" fileformats on the image with a specific interpreter. That would enable us to "read" PDF files, PST files, etc...

I'm not sure I know what you mean; the IO subsystem is done at a very low level (well, at the IO level)... The interpretation of different files on the filesystem is surely the job of a higher level application?

For example in flag (http://sourceforge.net/projects/pyflag/), we are using exgrep (which is similar, I gather, to foreman) to extract files from the image and then use magic (and NSRL) to classify those and do some post processing. The GUI is then able to use the correct facility for displaying those images (usually by setting the correct mime type and asking the browser to display it, but not necessarily).

If you want to index the contents of binary files (say zip files or gzipped files), maybe the best place to do so is by postprocessing at the higher level application?

I am also working on reimplementing exgrep to use a python file-like object created using the proposed sleuthkit io subsystem. This way we can use exgrep to extract files from any type of image. For example, we can find deleted and otherwise un-recoverable images from an encase image etc.

It would be cool if higher level programs (like autopsy or flag) could operate directly on the io subsystem for other file-like operations (like running foreman, indexing, whatever). To this end I am working on a swig interface for this io subsystem, so we could use perl or python to directly access all those images.

> If all these 3 are in place, I think sleuthkit is a product that is more powerful than any of the other products I use...

I concur with you. I had a bit of a play with encase and there is much room for encase to improve before it could be usable. (Although, as I mentioned, I only had encase v 3; maybe 4 is better.)

Michael.
From: Paul B. <ba...@fo...> - 2004-02-19 10:30:08
Hi Michael....

This sounds very good and cool.... (I haven't looked at your patch yet though...)..

But I just wanted to indicate that the combination of your IO Subsystem patch for fstool and my searchtools (Indexed Searching) patch create a system that is very powerful.

The only thing really missing is a subsystem that makes it possible to "read" fileformats on the image with a specific interpreter. That would enable us to "read" PDF files, PST files, etc...

If all these 3 are in place, I think sleuthkit is a product that is more powerful than any of the other products I use...

Paul Bakker

-----Original message-----
From: Michael Cohen [mailto:mic...@ne...]
Sent: Thursday 19 February 2004 9:17
To: Brian Carrier
CC: sle...@li...
Subject: [sleuthkit-developers] Re: IO Subsystem patch for fstools

> [Michael's "IO Subsystem patch for fstools" message was quoted here in full; it appears as the next post below, dated 2004-02-19 10:15:25.]
From: Michael C. <mic...@ne...> - 2004-02-19 10:15:25
Hi Brian,

I have started implementing the changes according to your suggestions, and it's coming along great. I quite like the method of extending the basic IO_INFO struct with more specific structs and then casting them back and forth. The code has a great OO feeling about it. Thanks for the pointers.

Attached is an incomplete patch, just to check that I'm on the right track. Only the fls tool is working with the new system currently, although it should compile ok. Here is a summary of the changes:

1) fls uses the -i commandline parameter to choose the subsystem to use, and then gets an IO_INFO object by doing an:

io = io_open(io_subsys);

If no subsystem is specified, it uses "standard", which is the current default.

2) options are parsed into the io object by calling:

io_parse_options(io, argv[optind++]);

(Options are in option=value format, or else if they do not contain '=', we take it as a filename.)

3) Once we parse all of the options, we create an fs object using the io object:

fs = fs_open(io, fstype);

This will initialise an fs->io parameter with the io subsystem object.

4) All filesystem code uses the io object to actually do the reading. e.g.:

ext2fs->fs_info.io->read_random(ext2fs->fs_info.io, (FS_INFO *) ext2fs, (char *) gd, sizeof(ext2fs_gd), offs, "group descriptor");

The io object has a number of methods. For example:

io->constructor
io->read_random
io->read_block

This has a cool object oriented feel about it.

5) There is an array in fs_io.c which acts like a class and is initialised to produce new objects of all IO_INFO derived types (i.e. all new io objects are basically copies of this struct with extra stuff appended). e.g.

static IO_INFO subsystems[] = {
  { "standard", "Standard Sleuthkit IO Subsystem", sizeof(IO_INFO_STD),
    &io_constructor, &free, &std_help, &std_initialiser, &std_read_block,
    &std_read_random, &std_open, &std_close},

Note that IO_INFO_STD is a derived object of IO_INFO:

struct IO_INFO_STD {
  IO_INFO io;
  char *name;
  int fd;
};

i.e. it has the same methods as IO_INFO, but extra attributes. All the other subsystems use this method to add extra attributes to the basic IO_INFO pointer, which is carried around the place as an IO_INFO (cast from IO_INFO_ADV etc).

From a user perspective, we can now do this:

- Read in a simple dd partition (compatibility with old fls):
  fls -r -f linux-ext2 honeypot.hda5.dd

- Read in a partition file from a hdd dd image (i.e. use an offset):
  fls -r -i advanced offset=100 honeypot.hda5.dd

- Read in a split dd file (after splitting with split):
  fls -r -i advanced /tmp/xa*
  Or (same thing, just to emphasize the fact that multiple files can be specified on the same command line):
  fls -r -i advanced /tmp/xaa /tmp/xab /tmp/xac /tmp/xad

- Read in an sgzipped file:
  fls -i sgzip -r -f linux-ext2 honeypot.hda5.dd.sgz

- List files in directory inode 62446:
  fls -i sgzip -r -f linux-ext2 honeypot.hda5.dd.sgz 62446

What's left to do:

As I mentioned, this is only an intermediate patch to check that I'm on the right track. These are the things that need to be finished:

1) Add a config file parser option to allow options to be passed from a config file, i.e.:
  fls -r -c config -i advanced honeypot.hda5.dd
  where config is just a file with lines like:
  offset=1024

2) Change offset to be in blocks (and add a blocks keyword to override the fs default).

3) Update all the other tools other than fls to support the new syntax.

4) I have also been working on an expert witness subsystem. Expert witness is the format used by encase, ftk etc. I have a filter working atm to convert these files to straight dd, but I want to implement a subsystem so we can work on these files directly in sleuthkit. This is coming very soon, maybe this weekend. This will obviously require lots of tender loving testing, because I only have encase ver 3 to play with.

Please let me know what you think about this. Also let me know if there are other things that still need to be completed regarding this patch.

Michael.
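To illustrate the derivation-by-embedding pattern described in point 5 above (a sketch only; the struct and function signatures here are simplified stand-ins, not the exact patch API), a subsystem callback receives the generic IO_INFO pointer and casts it back to its own derived type to reach subsystem-specific fields such as the file descriptor:

    #include <unistd.h>
    #include <sys/types.h>

    /* Simplified stand-ins for the structs described in the patch summary. */
    typedef struct IO_INFO {
        const char *name;
        ssize_t (*read_random)(struct IO_INFO *io, char *buf,
                               size_t len, off_t offset, const char *comment);
    } IO_INFO;

    typedef struct IO_INFO_STD {
        IO_INFO io;          /* base object must come first so casts work */
        char   *fname;
        int     fd;
    } IO_INFO_STD;

    /* The "standard" subsystem's read_random: cast the base pointer back to
     * the derived type to get at the file descriptor, then pread(). */
    static ssize_t std_read_random(IO_INFO *io, char *buf, size_t len,
                                   off_t offset, const char *comment)
    {
        IO_INFO_STD *self = (IO_INFO_STD *) io;  /* safe: io is the first member */
        (void) comment;                          /* used only for error messages */
        return pread(self->fd, buf, len, offset);
    }

Filesystem code only ever sees the IO_INFO pointer and calls io->read_random(...), so adding a new image format amounts to adding one more derived struct and one more entry in the subsystems[] table.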
From: Brian C. <ca...@sl...> - 2004-02-19 02:29:00
On Feb 18, 2004, at 1:58 PM, Epsilon wrote:
> I'm getting a very large (>500 MB) file when using the -s option with icat when I should be getting a file that's around 40 KB. I'm using sleuthkit-1.67. Anyone else seeing this?

Wow. What file system type? Can you send the output of running 'istat' on it?

brian
From: Epsilon <ep...@ya...> - 2004-02-18 20:03:21
I'm getting a very large (>500 MB) file when using the -s option with icat when I should be getting a file that's around 40 KB. I'm using sleuthkit-1.67. Anyone else seeing this?
From: Brian C. <ca...@sl...> - 2004-02-10 06:46:25
> This release has many improvements, but most importantly it now actively uses the "icat.c" and "ils.c" files (at least the code of them) and the fs_tools.a library..
>
> This means that if a lot of things will change in the sleuthkit code, that this will affect the indexed searching patch.... So I would like to know if that will happen up front, if that is possible....

There shouldn't be major changes. You may need to call a function to open the image, and the arguments to fs_open may change, but that is about it. The core API will remain the same. I would like to see if we can incorporate the indexing code into the redesign.

> The patches made by Pepijn Vissers and me have gotten their own webpage and can be found on http://www.brainspark.nl/?show=tools_sleuthkit (This link will also be placed on the download page of sleuthkit and autopsy..)...

It's up there. I added it at some point during the past couple of weeks.

thanks,
brian
From: Brian C. <ca...@sl...> - 2004-02-10 06:33:39
On Feb 9, 2004, at 3:44 AM, Michael Cohen wrote:
>> Yea, but autopsy or flag need to store all of those options as well in their own configuration file so that they can pass them to the fstools. It would seem more flexible if the sleuth kit had a configuration file for the image, which could then be used by any GUI (including autopsy, flag, rex etc.)
> The problem with this approach is that the fstools are then too integrated with the GUI, especially if the configuration file becomes so complex that you really need a GUI to make one. In that case you can't just use them by themselves. Is that something we are prepared to live with? Or is a design goal to make small self contained tools that may be used from the command line?

I guess that depends on how complex the config file is. RAID systems are obviously the most complex. Is there a way that we can make the config file fairly simple and have the maps built into the sleuth kit instead of the GUI? That will be faster to load as well if it just has to read a few parameters from the config file instead of a full mapping.

>> Sure. For this to work with the Sleuth Kit though, there must be the ability to create the configurations in the sleuth kit. If the only way to create the map is in flag, then that doesn't do autopsy any good or future interfaces and it doesn't make sense to replicate the stuff in each gui.
>
> That's true. However, a raid map must be generated internally anyway in order to reassemble the individual raid implementations (e.g. lvm, linux raid, etc). Perhaps we can have different IO subsystems which all they do is generate a generic map and then call the generic raid implementation? So for example, say we have a generic raid io subsystem as described above that takes a raid map as input; then we have another subsystem called lvm, for example, which accepts a bunch of lvm specific parameters and then generates a raid map and calls the generic raid io subsystem. This way autopsy doesn't need to be able to build a generic map in the gui, but one will be built automatically as required. If the user works out a way to build a raid map by some other means (i.e. some other GUI, by hand, or whatever), they can still use the generic raid implementation.

I would rather let each of the RAID types have their own data structures and read functions. That seems much more simple, scalable and efficient to run. It seems that making a map for every type of RAID system is like trying to make a mapping for all file systems so that we can use generic code for processing. Maybe we can have a type for generic RAID, but for Windows LDM or the Linux RAID configurations where the data structures and layout are known, I would rather have standard code. That is much easier to audit as well. If someone wants to review what is going on, having to audit a raid map would be a major pain.

>>> Maybe it would make more sense to populate the IO_INFO structure inside the FS_INFO structure?
>>
>> I would rather not. I would prefer to keep the file system code separate from the image format code. In fact, I would even consider making all of this image stuff its own library, imgtools maybe. It seems much more logical to call the file system processing code with the filled in IO_INFO structure and let it read from it. The file system code would never touch any of the file descriptors, it would just call the read functions. This also allows the 'mm...' tools to use the image formats and any other future tools, such as memory images that are split or saved in another tool's proprietary format.
>
> Just to clarify what you are saying... Are you proposing to make the io_subsystem and file system code into separate libraries, and then the individual tools (e.g. fls) would open the subsystem, initialise it, and then call the file system code giving it a filled in IO_INFO structure? If I understood your comment right, it sounds great.
>
> So the IO_INFO structure will contain function pointers to the read_random and read_block which will be initialised by the constructor, and the fs code would just call those methods? Sounds great:

Yea. That seems to scale the best.

> FS_INFO *
> ext2fs_open(const char *name, unsigned char ftype)
>
> Changes to:
>
> FS_INFO *
> ext2fs_open(IO_INFO *io)
>
> (BTW do you think that ftype is a little redundant here?

Yes and no. For most cases it is, but for FAT, where the user can force it to be FAT12, FAT16 or FAT32, it is needed. If we do auto detection though, then we can probably scrap it.

> and a little off topic, it would be nicer if the *fs_open routines returned NULL if they couldn't find the filesystem rather than error out, because then you could cycle over all filesystem decoders until one worked rather than demanding the user specify the -f parameter all the time. You could use -f to override the automatic detection)

Yea, that is a good idea. These types of changes are the ones that I want to examine when The Sleuth Kit gets a makeover in the next few months. Auto detection of file systems would be very nice.

>> I would lean towards the way that FS_INFO is structured. There would be a few basic items in IO_INFO, such as the function pointers and maybe the maximum size of the image. Then there are image specific structures that have their needed values. For example, the structure for split images may have an array of file descriptors and a structure with the sizes of each split image. The normal image structure may just have one file descriptor. Actually, maybe this whole thing is better called IMG_INFO instead of IO_INFO.
>
> That sounds great, we could cast a void* to achieve this, and then each io subsystem makes its own pointer and casts to void*:
>
> struct IMG_INFO {
>   common fields
>   ....
>   function pointers....
>   void *data;
> }

To keep the fstools and imgtools code consistent we should either change the fstools structures to use the void *, or we can just use the method that they use. The file system structures, NTFS_INFO for example, are defined like:

struct NTFS_INFO {
  FS_INFO fs_info;
  int blah;
  int boo;
}

The code casts the pointers as both FS_INFO and NTFS_INFO. So, the argument to an API function takes the FS_INFO structure, but the ntfs code just casts it to an NTFS_INFO. Both do the job, but I would like to be consistent.

> and maybe the multipart reassembly has:
>
> struct part {
>   char *filename;
>   struct part *next;
> }

Yea, and we can include a file descriptor that is only opened when that file is needed and isn't closed until the end. We'll also need a size field in there somehow as well, but we can figure out the details later.

thanks,
brian
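As a sketch of the split-image bookkeeping discussed at the end of this message (field and function names here are hypothetical; the real imgtools layout was still being decided), each part carries its filename, its size, and a lazily opened descriptor, and a read first walks the chain to find which part covers a given logical image offset:

    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/types.h>

    struct part {
        char        *filename;
        off_t        size;       /* bytes contributed by this part    */
        int          fd;         /* -1 until the part is first needed */
        struct part *next;
    };

    /* Read 'len' bytes starting at logical image offset 'off' from a chain
     * of split-image parts, opening each part only when it is first touched. */
    ssize_t split_read(struct part *parts, char *buf, size_t len, off_t off)
    {
        size_t done = 0;

        for (struct part *p = parts; p && done < len; p = p->next) {
            if (off >= p->size) {            /* request starts past this part */
                off -= p->size;
                continue;
            }
            if (p->fd < 0 && (p->fd = open(p->filename, O_RDONLY)) < 0)
                return -1;

            size_t avail = p->size - off;
            size_t want  = avail < len - done ? avail : len - done;
            ssize_t n = pread(p->fd, buf + done, want, off);
            if (n < 0)
                return -1;
            done += n;
            off = 0;                         /* continue at the start of the next part */
        }
        return done;
    }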
From: Paul B. <ba...@fo...> - 2004-02-09 09:16:43
Hi everybody...

Long time no see..... had a busy time....

Before all code is garbled up again ;-).. I wanted to let everybody know that within a month the third release of indexed searching will be released...

This release has many improvements, but most importantly it now actively uses the "icat.c" and "ils.c" files (at least the code of them) and the fs_tools.a library..

This means that if a lot of things will change in the sleuthkit code, that this will affect the indexed searching patch.... So I would like to know if that will happen up front, if that is possible....

The patches made by Pepijn Vissers and me have gotten their own webpage and can be found on http://www.brainspark.nl/?show=tools_sleuthkit (This link will also be placed on the download page of sleuthkit and autopsy..)...

I hope to release the third version soon.. I will post a full "new feature list" here if I do..

Paul Bakker

> -----Original message-----
> From: Brian Carrier [mailto:ca...@sl...]
> Sent: Wednesday 4 February 2004 16:14
> To: Dave
> CC: Sleuthkit
> Subject: Re: [sleuthkit-developers] Sleuthkit -> database patch
>
> Wow again! All of these projects that I have been thinking about doing are getting done! Thanks.
>
> As an FYI, after autopsy gets its redesign finished, I had been meaning to re-examine The Sleuth Kit. One of the things that I wanted to change was the output of tools such as 'ils' and 'fls' so that they could be more useful and more easily processed. Much of the output is still legacy from the TCT design. For example, I'm not sure if I have ever used the default output of 'ils'. So, the results from this work will be useful when figuring out the best format options and what the important data is in the output.
>
> I'll add pointers to the archive with this patch and the IO subsystem patch from the downloads page.
>
> thanks,
> brian
>
> On Feb 4, 2004, at 6:02 AM, Dave wrote:
>
> > Hi all,
> > Attached is a patch to sleuthkit to output sleuthkit filesystem data as SQL statements for entry into a database.
> >
> > Background:
> > Sleuthkit fstools output are not easily machine-readable, and as such not well suited for use by front-end gui applications. A better approach is to analyse the filesystem in one pass and store all the filesystem data (about files, inodes, blocks etc) in a database system for the gui analysis program to query at will.
From: Michael C. <mic...@ne...> - 2004-02-09 08:44:47
> Yea, but autopsy or flag need to store all of those options as well in their own configuration file so that they can pass them to the fstools. It would seem more flexible if the sleuth kit had a configuration file for the image, which could then be used by any GUI (including autopsy, flag, rex etc.)

The problem with this approach is that the fstools are then too integrated with the GUI, especially if the configuration file becomes so complex that you really need a GUI to make one. In that case you can't just use them by themselves. Is that something we are prepared to live with? Or is a design goal to make small self contained tools that may be used from the command line?

I agree that fstools by themselves are probably not all that useful without having some sort of GUI. So perhaps we just live with an increased level of complexity for the fstools, in favour of better integration into larger GUIs?

The other problem that may arise from trying to make fstools integrate with the GUIs' configuration files is that different GUIs store configuration in different ways; for example, flag stores everything in the database (not even in a file), so having a single configuration file format is a little clunky.

It's not so bad currently for flag, since we are currently using the database patch that was posted on the list a little while ago to dump out all the data from the image, and we never really use the individual tools like ils, fls, icat etc. So it won't be too hard to simply write out a conf file for each image. However, I'm just thinking of the old version of flag where we did shell out to these tools basically for each file in the filesystem; the cost of parsing a huge config file for each invocation of icat would be tremendous, I would imagine.

> If we are going to start discussing configuration files for the fstools (which we both agree are required for at least RAID), then I would rather make them general enough so that they can be used for other formats besides RAID. I would even like to have these include the file system type, mounting point, and hashes of each partition. Basically the stuff that the other tools include in some proprietary format with the image, we would put in a separate text file.

That's a great idea (accepting that the level of complexity of the fstools is increased). I would vote for using an xml config file format, since it's standard and easy to deal with and we don't have to write a parser. The downside is that we increase the program dependency by requiring libxml2 to be present. Alternatively we could write some yacc/lex parser, but we then need to discuss a good format which will be sufficient for autopsy and allow future growth.

> Sure. For this to work with the Sleuth Kit though, there must be the ability to create the configurations in the sleuth kit. If the only way to create the map is in flag, then that doesn't do autopsy any good or future interfaces and it doesn't make sense to replicate the stuff in each gui.

That's true. However, a raid map must be generated internally anyway in order to reassemble the individual raid implementations (e.g. lvm, linux raid, etc). Perhaps we can have different IO subsystems which all they do is generate a generic map and then call the generic raid implementation? So for example, say we have a generic raid io subsystem as described above that takes a raid map as input; then we have another subsystem called lvm, for example, which accepts a bunch of lvm specific parameters and then generates a raid map and calls the generic raid io subsystem. This way autopsy doesn't need to be able to build a generic map in the gui, but one will be built automatically as required. If the user works out a way to build a raid map by some other means (i.e. some other GUI, by hand, or whatever), they can still use the generic raid implementation.

> I'm still not convinced that we need so many options on the command line. The only case that I can see where all of the command line options are beneficial is for a live analysis where you don't want to write to the disk. But, in that case I don't see why you would need to use any of these complex image formats because you will have access to the raw device corresponding to the partition.

That's true, and if you have access to the raw device you would not need extra options or more complex io subsystems.

> Is there a specific reason with flag that command line options are easier?

No reason currently, because we have our own program (dbtool) written using the fstools library (as is seen in the patch Dave submitted). I'm just thinking about the way it used to work by shelling out. Maybe a better way is to simply document the fstools library and define a clear interface (with a proper shared library), and then people would be expected to use the library rather than shell out to the tools all the time.

>> Maybe it would make more sense to populate the IO_INFO structure inside the FS_INFO structure?
>
> I would rather not. I would prefer to keep the file system code separate from the image format code. In fact, I would even consider making all of this image stuff its own library, imgtools maybe. It seems much more logical to call the file system processing code with the filled in IO_INFO structure and let it read from it. The file system code would never touch any of the file descriptors, it would just call the read functions. This also allows the 'mm...' tools to use the image formats and any other future tools, such as memory images that are split or saved in another tool's proprietary format.

Just to clarify what you are saying... Are you proposing to make the io_subsystem and file system code into separate libraries, and then the individual tools (e.g. fls) would open the subsystem, initialise it, and then call the file system code giving it a filled in IO_INFO structure? If I understood your comment right, it sounds great.

So the IO_INFO structure will contain function pointers to the read_random and read_block which will be initialised by the constructor, and the fs code would just call those methods? Sounds great:

FS_INFO *
ext2fs_open(const char *name, unsigned char ftype)

changes to:

FS_INFO *
ext2fs_open(IO_INFO *io)

(BTW do you think that ftype is a little redundant here? And, a little off topic, it would be nicer if the *fs_open routines returned NULL if they couldn't find the filesystem rather than error out, because then you could cycle over all filesystem decoders until one worked rather than demanding the user specify the -f parameter all the time. You could use -f to override the automatic detection.)

> I would lean towards the way that FS_INFO is structured. There would be a few basic items in IO_INFO, such as the function pointers and maybe the maximum size of the image. Then there are image specific structures that have their needed values. For example, the structure for split images may have an array of file descriptors and a structure with the sizes of each split image. The normal image structure may just have one file descriptor. Actually, maybe this whole thing is better called IMG_INFO instead of IO_INFO.

That sounds great. We could cast a void* to achieve this, and then each io subsystem makes its own pointer and casts to void*:

struct IMG_INFO {
  common fields
  ....
  function pointers....
  void *data;
}

and maybe the multipart reassembly has:

struct part {
  char *filename;
  struct part *next;
}

So we initialise as:

IMG_INFO *img;
img->data = (void *) part_list;

While the raid one is totally different:

struct raid {
  whatever,
  ... more stuff,
}

The advantage of this option is that the IMG_INFO struct doesn't need to know about each subsystem.

> In the imgtools collection, we could actually have a tool that converts the proprietary image formats to a raw image.

That could be a new standalone tool which chooses the right io-subsystem and dumps a dd image out. It would be useful in the case of raid.

> Very cool. I had never seen sgzip before. I guess it isn't as much of a pain as I thought :)

It only took a day or so to write sgzip for this purpose, and I thought it would be useful in general for any application needing quick seeking in a compressed file. The library is now available in general on sf:
http://sourceforge.net/project/showfiles.php?group_id=100803

>> Do you have any idea how you would read in encase files? I didn't get the chance to ever use it, so I don't know how complex the file format is, but there is nothing I can find on the net re the format.
>
> Check out asrdata.com. Somewhere on there is a link to the expert witness format.

Thanks for that; the format looks remarkably similar to sgzip, except with some extra meta data stuck in there. It should be easy to write a library to access this. I just need to get a small example encase image to play with.

> I apologize if I am being a pain with some of these details, but after having to redesign autopsy because of a bad initial design, I want to make sure we add this new functionality the right way.

I think the discussion is very constructive so far. I was initially expecting a small change, but it looks like there is a need now to do a larger reorganization of code. It's going to pay off in the long run, I expect.

cheers
Michael.
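A compact sketch of the void-pointer alternative being floated here (illustrative names only; the eventual imgtools structures were still under discussion): the generic IMG_INFO stays ignorant of every subsystem, and each backend hangs its private state off the data field.

    #include <stdlib.h>
    #include <sys/types.h>

    typedef struct IMG_INFO {
        off_t    size;                                  /* total image size        */
        ssize_t (*read_random)(struct IMG_INFO *img,    /* subsystem entry point   */
                               char *buf, size_t len, off_t off);
        void    *data;                                  /* subsystem private state */
    } IMG_INFO;

    /* Private state for a hypothetical split-image backend. */
    struct split_state {
        char **filenames;
        int    nparts;
    };

    static ssize_t split_read_random(IMG_INFO *img, char *buf, size_t len, off_t off)
    {
        struct split_state *s = img->data;   /* recover the backend's own state */
        (void) s; (void) buf; (void) len; (void) off;
        /* ... locate the part containing 'off' and pread() from it ... */
        return -1;                           /* left as a stub in this sketch */
    }

    IMG_INFO *split_open(char **filenames, int nparts)
    {
        IMG_INFO *img = calloc(1, sizeof(*img));
        struct split_state *s = calloc(1, sizeof(*s));
        s->filenames = filenames;
        s->nparts    = nparts;
        img->read_random = split_read_random;
        img->data        = s;
        return img;
    }

Compare this with the FS_INFO-embedding style (NTFS_INFO carrying fs_info as its first member), which reaches the same dispatch through casting rather than a void pointer; that is exactly the consistency choice being debated in these messages.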
From: Brian C. <ca...@sl...> - 2004-02-09 06:04:45
On Feb 8, 2004, at 10:40 AM, Matthias Hofherr wrote:
> What about tools which are used by both blackhats and whitehats? Where would you place, e.g., nmap, packit ... ?

I have no clue. I think we need to group them based on core functionality, not on historical associations. Therefore, nmap would go in the same category as all port scanners, even the nice windows GUI ones. I didn't add them to a category because I wasn't sure if there should be a network utilities category, and if there was such a category, what its requirements would be. I am unsure if port scanners are an attack security tool or a general network tool. I'm not sure where sniffers fit either. I would say that packit is an attack tool, so it goes into the security-attack category.

The categories can't reflect the intent of an installation or execution. Port scanners that have been customized to search for specific services and launch attacks, or that create config files that can be used for attacks, have been designed to attack and would therefore go into the attack category; that is considered different than nmap.

After thinking about this: when these searches are conducted on the hard disk, we are looking for tools and files that serve a certain function. If we are looking at a server intrusion case, we want to know about all tools that could have played a role, regardless of whether it is nmap or netcat or the network utilities program that comes with OS X.

Maybe subcategories are a good idea. For example, there may be a general network utilities category. You can select it as either all good or all bad, or you can select the state of each subcategory (host scanners, port scanners, sniffers). Any of these utilities that has been customized for attacking will be placed in the security attack category.

> In which category would you place child-porn?

It falls in the 'Multimedia Files' category because it is a graphical image file. Child porn is such a unique and common case, though, that I think it warrants a subcategory or a related multimedia category. This is tough!

As a test for any taxonomy that we come up with, it would be useful if we could map the existing application types in the NSRL to them.

thanks,
brian
From: Brian C. <ca...@sl...> - 2004-02-09 05:35:26
On Feb 7, 2004, at 7:52 AM, Michael Cohen wrote:
> Although I would normally agree with you that configuration files can contain much more information and therefore most parameters should go there, in this case it would be a pain to have all these options passed in through a configuration file. This is primarily because fstools are most often used as a backend to larger programs (like flag and autopsy), and so it's much easier to shell out to them, even if they have 20 command line arguments, than to have to write a config file and then invoke them with it.

Yea, but autopsy or flag need to store all of those options as well in their own configuration file so that they can pass them to the fstools. It would seem more flexible if the sleuth kit had a configuration file for the image, which could then be used by any GUI (including autopsy, flag, rex etc.)

If we are going to start discussing configuration files for the fstools (which we both agree are required for at least RAID), then I would rather make them general enough so that they can be used for other formats besides RAID. I would even like to have these include the file system type, mounting point, and hashes of each partition. Basically the stuff that the other tools include in some proprietary format with the image, we would put in a separate text file.

>> Oh ok. I think that it will be very hard to create such a configuration file. To create the file, you will need to know which VM / RAID system is being used. I think it would be much easier to have a subsystem for each VM / RAID type and then the only thing that needs to be specified in the configuration file is the options for that type. For example, if the Linux LVM were used, then you may need to only specify the disk ordering and the block size. When reading from the image, the lvm-split-read() function would be used.
>
> The raid map is easily assembled using the gui in flag. This is not a problem. If the investigator knows which raid implementation it is, they can look up a library of such maps and use those, but the io subsystem needs to handle arbitrary reassembly maps.
>
> Maybe we can implement such a library in fs_io, so that you could invoke the same raid subsystem, and pass it a parameter, say "type", naming which raid type it should use, and it can look up the map by itself?

Sure. For this to work with the Sleuth Kit though, there must be the ability to create the configurations in the sleuth kit. If the only way to create the map is in flag, then that doesn't do autopsy any good or future interfaces and it doesn't make sense to replicate the stuff in each gui.

>> I have no problems changing all of the files. If we are going to add this functionality, I would rather do it right the first time.
> Great. What you propose is a much better way. We just need to resolve the option issue because these options need to be present inside such an IO_INFO struct that will be passed around the place. Since these options can be arbitrary and they only make sense for the subsystem itself, what do you think about my original suggestion of having a single void *data field in the IO_INFO struct and letting each subsystem put their stuff in there?

I'm still not convinced that we need so many options on the command line. The only case that I can see where all of the command line options are beneficial is for a live analysis where you don't want to write to the disk. But, in that case I don't see why you would need to use any of these complex image formats, because you will have access to the raw device corresponding to the partition.

Is there a specific reason with flag that command line options are easier?

>> Actually, I guess we just need one io_open() function because fls.c and similar files will not know if the file is a config file or an image file. io_open would have a char ** to list the image files or config file, a type field for the type of image format, and an offset value. It would then fill in the IO_INFO structure and return it, which would be passed to fs_open().
> Maybe it would make more sense to populate the IO_INFO structure inside the FS_INFO structure?

I would rather not. I would prefer to keep the file system code separate from the image format code. In fact, I would even consider making all of this image stuff its own library, imgtools maybe. It seems much more logical to call the file system processing code with the filled in IO_INFO structure and let it read from it. The file system code would never touch any of the file descriptors, it would just call the read functions. This also allows the 'mm...' tools to use the image formats and any other future tools, such as memory images that are split or saved in another tool's proprietary format.

> Next we just need to agree on what to put in struct IO_INFO.

I would lean towards the way that FS_INFO is structured. There would be a few basic items in IO_INFO, such as the function pointers and maybe the maximum size of the image. Then there are image specific structures that have their needed values. For example, the structure for split images may have an array of file descriptors and a structure with the sizes of each split image. The normal image structure may just have one file descriptor. Actually, maybe this whole thing is better called IMG_INFO instead of IO_INFO.

In the imgtools collection, we could actually have a tool that converts the proprietary image formats to a raw image.

Compression would be a major pain. Split, EnCase, and some of the RAID systems seem much easier.

> Please see the patch I just submitted re compression.

Very cool. I had never seen sgzip before. I guess it isn't as much of a pain as I thought :)

> Do you have any idea how you would read in encase files? I didn't get the chance to ever use it, so I don't know how complex the file format is, but there is nothing I can find on the net re the format.

Check out asrdata.com. Somewhere on there is a link to the expert witness format.

I apologize if I am being a pain with some of these details, but after having to redesign autopsy because of a bad initial design, I want to make sure we add this new functionality the right way.

thanks,
brian
From: Matthias H. <mat...@mh...> - 2004-02-08 15:40:11
Hi Brian,

Brian Carrier said:
[...]
> Security - Prevention
> - Tools and files that are used to secure a system from attack
>   - anti-virus
>   - personal firewalls
>   - IDS
>
> Security - Attack
> - Tools and files that are used to cause a security incident
>   - exploits
>   - attack tools
>   - DDoS tools
>   - viruses
> - Tools and files that are used to remove evidence of incident
>   - log cleaner
>   - evidence eliminator
> - Tools and files that are used to allow access to a compromised system
>   - rootkits
[...]

What about tools which are used by both blackhats and whitehats? Where would you place, e.g., nmap, packit ... ?

In which category would you place child-porn?

Regards,
Matthias
From: Michael C. <mic...@ne...> - 2004-02-08 01:44:41
|
OK, I'll try sending this again; the first time it was blocked by the list server for being too big, so I compressed the patch. Hopefully this is OK. (There are also some small changes since the original patch, namely that decompress now works with sgzip.) Hi list members, Thank you for the positive response to the recently proposed IO subsystem patch for sleuthkit. Although discussions are still underway as to how exactly this should be integrated into the sk (the way it is done now may not fit very well into the overall architecture), I have been experimenting with adding more subsystems besides the existing "advanced" subsystem. The latest addition is an sgzip io subsystem. Background: In today's forensic work one regularly needs to work on very large hard disks; many new systems are sold with 80+GB drives, so even the most trivial forensic analysis workstation needs to be able to accommodate huge dd images. This is obviously not practical. Secondly, storage and archiving of such images is difficult, which is why most people compress their images for storage. Currently people need to uncompress their images before the sleuthkit can deal with them, which is a major pain in the proverbial. Solution: The previous io-subsystem patch allows easy integration of different io decoders for dealing with image files of various formats. This current version (a cumulative patch replacing the previous one) implements an interface to the sgzip library (also supplied in the patch). sgzip (seekable gzip) is a file format for fast, seekable compressed images. The problem with a regular gzip file is that one cannot seek in it in a reasonable time (typically a seek involves decompressing the whole file). sgzip allows very fast seeks at a very small loss of compression (up to about 2%). Interested developers can read the details of the sgzip file format in sgzlib.h. Basically, after applying the patch to sleuthkit-1.67 and compiling normally, you will get a new binary in bin/ called sgzip. It is very similar to gzip. You use it like this: bash$ sgzip honeypot.hda8.dd This will create a file called honeypot.hda8.dd.sgz. Unlike gzip it does not unlink the original file, but it will overwrite an existing honeypot.hda8.dd.sgz. (You can also pipe into sgzip, just like gzip. Useful for grabbing images from netcat straight into sgzip.) The filesize of the new compressed image is a little larger than the corresponding gzip file, because it carries more indexes and the compression is suboptimal when the stream is broken across several blocks, but the difference is too small to care about. You can use the sleuthkit to work directly with this new compressed file by selecting the right subsystem: bash$ fls -r -i sgzip -f linux-ext2 honeypot.hda8.dd.sgz That's it. Complex, hey? The sgzip subsystem also takes an offset argument, so we can compress a whole hdd image and then work with individual partitions. Performance: For detailed documentation of the sgzlib implementation, read the sgzlib.h file and the source code. Suffice it to say that sgzip works by breaking the uncompressed file into blocks and compressing each block separately. Then, when we need to decompress a random bit of data, we find the right block and decompress only it. Hence, if we want to read very small runs of data we are better off with smaller blocks, so that we avoid decompressing data we don't need. 
The user has control over the block size when compressing the file by using the -B argument to sgzip. Smaller blocks mean a faster sleuthkit, but less efficient compression. For example, I took the honeypot.hda8.dd.sgz produced above (btw, this is a disk from the honeynet forensic challenge) and did some benchmarking: Timing the uncompressed version (filesize=272,465,920=272MB): bash$ time fls -r -f linux-ext2 honeypot.hda8.dd > /dev/null real 0m0.050s user 0m0.010s sys 0m0.040s Hardly any time at all... Now I compressed the file with a blocksize of 100kb using -B 100000 as the argument to sgzip (filesize=25,237,474=25MB): bash$ time fls -r -i sgzip -f linux-ext2 honeypot.hda8.dd.sgz > /dev/null real 0m12.600s user 0m9.590s sys 0m2.560s And with the default blocksize (which is 512kb) (filesize=25,008,704=25MB): real 1m32.119s user 1m10.190s sys 0m12.870s Just for comparison, the size of a normal gzip file is 24,968,226 bytes. So the sgzip file with a 100kb blocksize is 1.07% bigger, and the speed is still acceptable and about 8-10 times faster than with the default 512kb blocksize. But this is probably because fls reads lots of very small runs of data scattered all over the whole disk. Just to be ridiculous, I repeated the test with a block size of 10kb: real 0m1.421s user 0m1.310s sys 0m0.000s However, the filesize has now increased dramatically to 27,435,391 bytes (27MB), which is about 10% bigger than pure gzip. That may be OK in some circumstances, but remember that this particular hdd is mostly empty, so it compresses very well. I expect the size expansion to be more noticeable on a fuller disk. Future developments: - I am planning on writing a Python module for sgzip (probably just using SWIG to access the C library). If people are interested I might look at how you make Perl modules from SWIG. (I don't do much Perl nowadays, since flag has been rewritten in Python.) |
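For readers who have not opened sgzlib.h, the seek strategy being benchmarked here can be reduced to a little index arithmetic: the image is cut into fixed-size blocks, each block is compressed on its own, and a read only has to inflate the blocks it overlaps. The sketch below models just that arithmetic under assumed names; the real on-disk format and API live in sgzlib.h.

#include <stdio.h>
#include <sys/types.h>

/* Simplified model of a seekable-gzip index; names are illustrative,
 * not the sgzlib API.  Each 'blocksize' bytes of the uncompressed image
 * are deflated independently, and the index records where each
 * compressed block starts in the .sgz file. */
struct sgz_index {
    size_t  blocksize;            /* uncompressed bytes per block (-B)   */
    size_t  num_blocks;
    off_t  *block_offset;         /* compressed offset of block i        */
};

/* A random read at 'offset' only needs the blocks it overlaps, so the
 * cost of a small read is one block, not the whole image.  A smaller
 * blocksize therefore means less wasted decompression per seek, at the
 * price of a slightly larger file. */
void sgz_blocks_for_read(const struct sgz_index *idx, off_t offset,
                         size_t len, size_t *first, size_t *last)
{
    *first = offset / idx->blocksize;
    *last  = (offset + len - 1) / idx->blocksize;
}

int main(void)
{
    /* roughly the 272MB honeypot image with 100kb blocks */
    struct sgz_index idx = { 100000, 2725, NULL };
    size_t first, last;

    sgz_blocks_for_read(&idx, 52428800, 1024, &first, &last);
    printf("a 1kb read at 50MB touches blocks %zu..%zu\n", first, last);
    return 0;
}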
From: Michael C. <mic...@ne...> - 2004-02-07 13:33:25
|
Hi Brian, > Ok. I was under the impression that you wanted to have a configuration > file for any of the more complex subsystems and therefore the options > could be specified there. The offset is the only variable in the > process that may change between executions (i.e. accessing a different > partition) and doesn't make sense to be in a config file. Unless the > config file allowed you to assign names to offsets. For example, > assign the name 'part1' to sector offset 63 and then you could use 'fls > -f ntfs -o part1 image.dd'. Although i would normally agree with you that configuration files can contain much more information and therefore most parameters should go there, in this case it would be a pain to have all these options passed in through a configuration file. This is primarily because fstools are most often used as a backend to larger programs (like flag and autopsy), and so its much easier to shell out to them even if they have 20 command line arguements, than have to write a config file and then invoke them with it. > That is why I was assuming that a configuration file would be used for > complex situations. What does the blocksize value do for a split > image? It seems that only the RAID / VM configurations need complex > options. The split mode (or EnCase if that happens in the future) can > be done w/out options. To keep the system truely generic its useful to allow arbitrary options to be passed to the subsystem, rather than trying to incorporate within the current framework, its difficult to imagine what parameters will be required for some arbitrary subsystem someone might come up with in the future. > I would rather force complex configurations to configuration files. > The command line options for the sleuth kit are already too numerous > and it will make using Autopsy easier if the config file can be > referenced instead of having to load up the command line every time. This makes sense if the config file was the same (not changing) for a single case for example, or a set of images. So the config file only gets written once and then continually referenced from invokation to invokation, rather than have to rewrite the file each time. > Oh. I was thinking that the configuration file would have an entry > that identified which IO subsystem to use. For example, a line that > says: > > image_format = "split" > or > image_format = "lvm-splice" There might be a misunderstanding with the raid stuff. What i am writing is a generic raid subsystem that reassembles any raid 5 implementation without having a clue about its type/brand etc. The only reason it prob should have a config file is because its so huge. (the raid map is big). > Oh ok. I think that it will be very hard to create such a > configuration file. To create the file, you will need to know which VM > / RAID system is being used. I think it would be much easier to have a > subsystem for each VM / RAID type and then the only thing that needs to > be specified in the configuration file is the options for that type. > For example, if the Linux LVM were used, then you may need to only > specify the disk ordering and the block size. When reading from the > image, the lvm-split-read() function would be used. The raid map is easily assembled using the gui in flag. This is not a problem. If the investigator knows which raid implementation it is, they can lookup a library of such maps and use those, but the io subsystem needs to handle arbitrary reassembly maps. 
Maybe we can implement such a library in fs_io, so that you could invoke the same raid subsystem, and pass it a parameter say "type" naming which raid type it should use and it can look up the map by itself? > I have no problems changing all of the files. If we are going to add > this functionality, I would rather do it right the first time. Great. What you propose is a much better way. We just need to resolve the option issue because these options need to be present inside such an IO_INFO struct that will be passed around the place. Since these options can be arbitrary and they only make sense for the subsystem itself, what do you think about my original suggestion of having a single void *data field in the IO_INFO struct and letting each subsystem put they stuff in there? > Actually, I guess we just need one io_open() function because fls.c and > similar files will not know if the file is a config file or an image > file. io_open would have a char ** to list the image files or config > file, a type field for the type of image format, and an offset value. > It would then fill in the IO_INFO structure and return it, which would > be passed to fs_open(). Maybe it would make more sense to populate the IO_INFO structure inside the FS_INFO structure? so instead of having a fs->fd, we have a fs->io_info. Makes more sense since with io_subsystems, the FS_INFO structure doesnt really deal with file descriptors anyway (the filesystem code never directly interacts with fds). So in the filesystem code you would have: if ((fs->io_info = io_open(name, O_RDONLY)) < 0) ... and then, subsequently: fs_read_random(fs->io_info,(char *)ext2fs->fs,len,EXT2FS_SBOFF,"Checking for EXT2FS"); .... Next we just need to agree on what to put in struct IO_INFO. > Compression would be a major pain. Split, EnCase, and some of the RAID > systems seem much easier. Please see the patch i just submitted re compression. Do you have any idea how you would read in encase files? I didnt get the chance to ever use it so i dont know how complex the file format is but there is nothing i can find on the net re the format. > thanks, > brian Thanks, Michael. |
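A compact sketch of the void *data idea proposed above, under assumed names: the generic IO_INFO carries only function pointers plus an opaque pointer, and each subsystem interprets that pointer however it likes. None of these identifiers are from the actual patch.

#include <sys/types.h>

/* Sketch of the void *data proposal; all names are assumptions, not
 * committed Sleuth Kit interfaces. */
typedef struct IO_INFO IO_INFO;
struct IO_INFO {
    const char *name;                  /* "standard", "advanced", "sgzip", ... */
    ssize_t (*read_random)(IO_INFO *, char *, size_t, off_t);
    int     (*close)(IO_INFO *);
    void    *data;                     /* subsystem-private options and state  */
};

/* What the "advanced" subsystem might hang off the data pointer: */
struct adv_data {
    off_t  offset;                     /* offset= option, in bytes        */
    int    num_files;                  /* number of file= parts, in order */
    int   *fds;                        /* one open descriptor per part    */
};

ssize_t adv_read_random(IO_INFO *io, char *buf, size_t len, off_t off)
{
    struct adv_data *d = io->data;     /* only this subsystem knows the layout */
    /* ... find the part containing (d->offset + off) and read from it ... */
    (void)d; (void)buf; (void)off;
    return (ssize_t)len;
}

The trade-off being debated is visible here: the opaque pointer keeps fs_open() and the fstools ignorant of subsystem options, at the cost of the core never being able to inspect or validate them.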
From: Brian C. <ca...@sl...> - 2004-02-04 15:14:30
|
Wow again! All of these projects that I have been thinking about doing are getting done! Thanks. As an FYI, after autopsy gets its redesign finished, I had been meaning to re-examine The Sleuth Kit. One of the things that I wanted to change was the output of tools such as 'ils' and 'fls' so that they could be more useful and more easily processed. Much of the output is still legacy from the TCT design. For example, I'm not sure if I have ever used the default output of 'ils'. So, the results from this work will be useful when figuring out the best format options and what the important data in the output is. I'll add pointers to the archive with this patch and the IO subsystem patch from the downloads page. thanks, brian On Feb 4, 2004, at 6:02 AM, Dave wrote: > Hi all, > Attached is a patch to sleuthkit to output sleuthkit filesystem data as > SQL statements for entry into a database. > > Background: > Sleuthkit fstools output is not easily machine-readable, and as such > not well suited for use by front-end gui applications. A better > approach > is to analyse the filesystem in one pass and store all the filesystem > data (about files, inodes, blocks etc) in a database system for the gui > analysis program to query at will. > |
From: Brian C. <ca...@sl...> - 2004-02-04 15:03:20
|
>> My original plan was to use the '-o' flag to specify the sector offset >> for the file system. I figured sectors would be easier than bytes >> because mmls and fdisk give you the values in sectors and almost every >> disk uses a 512-byte sector. This also allows people to use the >> offset >> value without the '-i' setting. > > Great idea. Sectors would be much more useful than straight bytes. The > idea is > that each subsystem may choose to implement its logical-physical > mapping > however makes sense for it. And therefore would need different > parameters > most conveniently denoted by name. So rather than waste a whole > option -o on > just an offset, maybe we could use -o to specify a number of subsystem > dependant options. Ok. I was under the impression that you wanted to have a configuration file for any of the more complex subsystems and therefore the options could be specified there. The offset is the only variable in the process that may change between executions (i.e. accessing a different partition) and doesn't make sense to be in a config file. Unless the config file allowed you to assign names to offsets. For example, assign the name 'part1' to sector offset 63 and then you could use 'fls -f ntfs -o part1 image.dd'. >> >> I would actually say that '-i' should only have the type and no other >> options. If multiple files are needed (splitting and RAID), then they >> should be appended to the end of the command. For example, to look at >> the file system at offset sector 12345, the following could be used >> (names are made up): > > That is indeed a good suggestion. It needs more careful manipulation > of the > getopts in the client program but it should work. The only trouble is > that > the parameters to the subsystem can be arbitrary- subsystem specific > ones, so > for example maybe for split image: > > fls -f linux-ext3 -i split -o offset=12345 blocksize=512 file1.dd > file2.dd > > and just in case you wanted to have a file called offset or blocksize, > you > could use a qualifier called file= in front of it like: > > fls -f linux-ext3 -i split -o offset=12345 blocksize=512 file1.dd > file=offset > > but without a qualifier, its just interpreted as a filename. Similarly > for the > truely lazy user if the subsystem specific option parser sees an option > consisting just a number, it takes that as the offset, then you dont > need to > qulify offset by using a keywork. That is why I was assuming that a configuration file would be used for complex situations. What does the blocksize value do for a split image? It seems that only the RAID / VM configurations need complex options. The split mode (or EnCase if that happens in the future) can be done w/out options. I would rather force complex configurations to configuration files. The command line options for the sleuth kit are already too numerous and it will make using Autopsy easier if the config file can be referenced instead of having to load up the command line every time. > >> It would also be useful if the config file format that you are >> developing for the RAID images could be used for the split images. > > It can, but the algorithm for the raid reconstruction is more complex, > and > performance would suffer if the same subsystem was used all around. The > format (not finalised yet...) is something like: Oh. I was thinking that the configuration file would have an entry that identified which IO subsystem to use. For example, a line that says: image_format = "split" or image_format = "lvm-splice" > one per line. 
A slot is the logical position within the raid period > where the > block should be taken from. example: > > 1,1 > 2,1 > 1,2 > 2,2 [....] > I guess the file may not be that human readable, because we use flag to > generate it automatically. I really didnt want to have to use more > advanced > lex/yacc for this. What do you think? Oh ok. I think that it will be very hard to create such a configuration file. To create the file, you will need to know which VM / RAID system is being used. I think it would be much easier to have a subsystem for each VM / RAID type and then the only thing that needs to be specified in the configuration file is the options for that type. For example, if the Linux LVM were used, then you may need to only specify the disk ordering and the block size. When reading from the image, the lvm-split-read() function would be used. >> To keep the subsystem design similar to what currently exists, have >> you >> thought about the following: >> >> A new data structure IO_INFO and before fs_open is run, the io_open() >> function is run with either the image lists or the config file etc and >> the offset. There would probably have to be one for >> io_open_files(char >> **) and io_open_config(char *). >> >> The IO_INFO structure is filed in with io_open and the needed read >> functions are mapped (like file_walk etc are now in FS_INFO). >> >> The fs_open() function gets the IO_INFO structure passed to it and the >> fs_open() no longer needs to do the open() system call on the images. >> It just checks the magic value and fills in FS_INFO. Any >> read_random() function in the file system code turns into >> fs_info->io->read_random(...). > > This is an alternative design - the advantage with your method is that > you > could potentially have a number of different subsystems in use at the > same > time in the same program, while my subsystem design keeps subsystem > data as > static so its program wide. I just didnt really want to change all the > read_random functions throughout the code (it would mean bigger > changes in > the architecture because almost every file will be touched many > times.). I have no problems changing all of the files. If we are going to add this functionality, I would rather do it right the first time. > I still think that it would be more useful to allow each subsystem to > manage > its own options, rather than trying to second guess all the options in > advance and stick them into the io_info struct. So for example rather > than > have the io_info struct have one entry for io_open_files(char **) and > io_open_config(char *), maybe we can just have an entry for void > *data, and a > single io_open(void *data), and allow the subsystem to set that to > whatever > configuration parameters make sense for it - the single file option > might > attach a char * in the data pointer, while the multifile stuff might > attach a > char **. The raid subsystem might attach a preparse linked list of its > raid > map so it can work off that. whatever makes sense. Actually, I guess we just need one io_open() function because fls.c and similar files will not know if the file is a config file or an image file. io_open would have a char ** to list the image files or config file, a type field for the type of image format, and an offset value. It would then fill in the IO_INFO structure and return it, which would be passed to fs_open(). 
> A couple more types of IO subsystem I just thought of are an EnCase > file format subsystem (allows you to read standard EnCase files with the sk) > and a compressed file subsystem (allows working directly off compressed > files). I have no idea how difficult it would be to actually implement those, > but they look promising. Compression would be a major pain. Split, EnCase, and some of the RAID systems seem much easier. thanks, brian |
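As an illustration of the reassembly map quoted earlier in this message (one pair per line, one number naming the disk and the other the slot within the RAID period), parsing such a file takes only a few lines. The sketch below assumes a disk,slot ordering and a file name of raid.map, neither of which is specified in the thread.

#include <stdio.h>

/* Reads "a,b" pairs, one per line, as in the map quoted above.  The
 * thread does not spell out which field is the disk and which the slot,
 * so this sketch assumes disk,slot; the file name raid.map is also just
 * an assumption.  A real subsystem would build its reassembly table
 * from these pairs instead of printing them. */
int main(void)
{
    FILE *map = fopen("raid.map", "r");
    int disk, slot, logical = 0;

    if (map == NULL) {
        perror("raid.map");
        return 1;
    }
    while (fscanf(map, "%d,%d", &disk, &slot) == 2) {
        printf("logical block %d of each period comes from disk %d, slot %d\n",
               logical++, disk, slot);
    }
    fclose(map);
    return 0;
}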
From: Dave <jg...@da...> - 2004-02-04 11:02:43
|
Hi all, Attached is a patch to sleuthkit to output sleuthkit filesystem data as SQL statements for entry into a database. Background: Sleuthkit fstools output are not easily machine-readable, and as such not well suited for use by front-end gui applications. A better approach is to analyse the filesystem in one pass and store all the filesystem data (about files, inodes, blocks etc) in a database system for the gui analysis program to query at will. Solution: The attached patch creates a new executable (dbtool) which basically performs the same tasks as fsstat, fls (with -r) and istat(for each inode). For those familiar with the code, it loads an image, prints the data found in FS_INFO, performs a "dent_walk", then performs an "inode_walk" and for each inode performs a "file_walk". At each stage the callback prints SQL statements which populate a database. Once the program is run, the database contains all the data about the filesystem and SQL queries can then be constructed by the frontend program to perform tasks such as timelining etc. The patch consists of two new files, dbtool.c and Makefile.dbtool, and a very small patch to fatfs_dent.c to make it print long directory names rather than short ones. dbtool can be compiled as follows: cd sleuthkit-1.67 patch -p1 < ../sleuthkit-dbtool make cd src/fstools make -f Makefile.dbtool An example of dbtool usage and output is also attached. I have also created a python module for accessing the data stored in the database which provides a "file-like" interface to perform 'open', 'read' and 'seek' operations on files within the dd image. This is how our forensics application (flag) accesses the data in the database. I can post this if anyone is interested. There will be a new release of "flag" in the very near future which incorporates this work. I welcome your comments and suggestions, Thanks, David Collett |
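To make the dent_walk approach concrete: the callbacks dbtool registers simply print INSERT statements instead of the human-readable lines fls prints. The stripped-down sketch below shows that pattern with an invented struct, table, and column layout; it is not the schema or code from the attached patch.

#include <stdio.h>

/* Stand-in for a directory-entry walk callback: the real dbtool callback
 * receives FS_INFO and an FS_DENT from dent_walk; the struct, table and
 * column names below are invented for illustration only. */
struct example_dent {
    unsigned long inode;
    const char   *path;
    const char   *name;
};

unsigned char print_dent_sql(const struct example_dent *d)
{
    /* Emit an INSERT instead of the human-readable line fls would print,
     * so a front end can query the data later (timelines, etc.). */
    printf("INSERT INTO file (inode, path, name) VALUES (%lu, '%s', '%s');\n",
           d->inode, d->path, d->name);
    return 0;                      /* "continue walking" in the real API */
}

int main(void)
{
    struct example_dent d = { 16, "/boot/", "vmlinuz" };
    print_dent_sql(&d);
    return 0;
}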
From: Brian C. <ca...@sl...> - 2004-02-04 05:50:45
|
Wow! This looks great! I have been meaning to incorporate the offset option for quite a while, but this is much more involved. I don't have time to look at the code in detail right now, but I have some comments from your email and a quick skim of the code. My original plan was to use the '-o' flag to specify the sector offset for the file system. I figured sectors would be easier than bytes because mmls and fdisk give you the values in sectors and almost every disk uses a 512-byte sector. This also allows people to use the offset value without the '-i' setting. I like the idea of the '-i' because it is like specifying the image type, whereas -f is specifying the file system type. I hadn't thought about getting this advanced, but it looks good. I would actually say that '-i' should only have the type and no other options. If multiple files are needed (splitting and RAID), then they should be appended to the end of the command. For example, to look at the file system at offset sector 12345, the following could be used (names are made up): Normal full image: fls -f linux-ext3 -o 12345 file1.dd or fls -f linux-ext3 -i single -o 12345 file1.dd Split Image: fls -f linux-ext3 -i split -o 12345 file1.dd file2.dd LVM RAID Image: fls -f linux-ext3 -i lvm -o 12345 lvm-config.dat MS LDM Spanning Image fls -f ntfs -i ldm-span -o 12345 ldm-config.dat It would also be useful if the config file format that you are developing for the RAID images could be used for the split images. To keep the subsystem design similar to what currently exists, have you thought about the following: A new data structure IO_INFO and before fs_open is run, the io_open() function is run with either the image lists or the config file etc and the offset. There would probably have to be one for io_open_files(char **) and io_open_config(char *). The IO_INFO structure is filed in with io_open and the needed read functions are mapped (like file_walk etc are now in FS_INFO). The fs_open() function gets the IO_INFO structure passed to it and the fs_open() no longer needs to do the open() system call on the images. It just checks the magic value and fills in FS_INFO. Any read_random() function in the file system code turns into fs_info->io->read_random(...). This looks great! brian On Feb 3, 2004, at 8:47 AM, Michael Cohen wrote: > Dear List, > Please accept this patch to the sleuthkit to implement a pluggable > IO > subsystem for the fstools. (patch against 1.67, fstools directory). > > Background > Quite often users are supplied with dd images that do not > immediately work > with sleuthkit. Two notable examples are: > - when a dd image was taken of the hdd - in this case users have to > use > sfdisk to work out the partition offsets and then use dd with > appropriate > skip parameters to extract each partition, before being able to use the > sleuthkit. This is because the sk expects to have a dd image of a > partition > (i.e. filesystem starts at offset 0 in the image file. This is not > always the > case). > - Sometimes images are split into smaller sizes for example in order > to burn > to cd/dvd etc. This means that images need to be stuck together before > analysis potentially wasting time and space. > > It would be nice if one could use the images directly - without > needing to > do creative dd manipulations. > > Solution > This patch implements a modular io subsystem approach - all > filesystem > operations within the sk are made to use this subsystem, and the user > can > choose the subsystem they want. 
The subsystem is responsible to > seeking into > the file and extracting data out of the dd image - how that is > implemented is > completely abstracted from the point of view of the fstools. > > The user can choose the subsystem to be used by the -i (io subsystem) > command line switch. Then a list of arguments can be passed to the > subsystem > to initialise it correctly. Once that is done, the regular sk calls > can be > made (e.g. fs_open etc). The io subsystem will take care of the > specifics of > implementation. > > This patch includes 2 subsystem modules: simple and advanced. The > simple > module is exactly the same as the old sk, while the advanced module > allows > for specifying offsets into the dd file, as well as multiple dd files > in > sequence. > > Example: > As an example the fls and icat tools were modified to support the new > sub > system, more tools will be converted tomorrow once i get some sleep. > Example > of how to seek into a partition within a disk dd: > > fls -i advanced -o offset=524288 -f linux-ext2 test.dd > > This selects the advanced io subsystem and passes it the offset option > specifying 1024 blocks of 512 bytes. > > Now we can split the dd image across multiple files (maybe using the > split > utility), and still analyse them at once: > > fls -i advanced -o offset=524288,file=xaa,file=xab,file=xac,file=xad > -f > linux-ext2 xae > > Note that xae (the last part of the image will be appened to the list > of > parts automatically). Also note that all the options in -o are passed > as one > parameter to the subsystem which then parses them into the relevant > arguements. > > If the subsystems name is not found, the subsystem will list all > known > subsystems: > > bash# fls -i help -f linux-ext2 test.dd > > Available Subsystems: > > standard - Standard Sleuthkit IO Subsystem > advanced - Advanced Sleuthkit IO Subsystem > fls: Could not set io subsystem help > > To get more help about the options available, try setting an option > which > is not supported: > > bash# fls -i advanced -o help -f linux-ext2 test.dd > > option help not recognised > > Advanced io subsystem options > > offset=bytes Number of bytes to seek to in the > image file. > Useful if there is some extra data at the start of the dd image (e.g. > partition table/other partitions > file=filename Filename to use for split files. If > your dd > image is split across many files, specify this parameter in the order > required as many times as needed for seemless integration > > Future work: > I am in the process of implementing a raid reassembly functionality. > I.e. > given a raid reconstruction map (a file telling sk the order in which > raid > blocks go together) and a list of dd images of individual drives, the > io > subsystem will transparently reassemble the logical data. I have a > working > prototype so i know its possible. The abstracted io subsystem concept > will be > very handy for that. > <fstools_diff> |
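For anyone experimenting with the patch, the -o string for the advanced subsystem is just a comma-separated list of key=value pairs, so a parser for it can be as small as the sketch below. The behaviour shown follows the description above; the patch's own parser may differ.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parses an option string of the shape used by the advanced subsystem,
 * e.g. "offset=524288,file=xaa,file=xab,file=xac".  Modelled on the
 * description in the announcement above, not copied from the patch. */
int main(void)
{
    char opts[] = "offset=524288,file=xaa,file=xab,file=xac";
    long offset = 0;
    char *tok;

    for (tok = strtok(opts, ","); tok != NULL; tok = strtok(NULL, ",")) {
        if (strncmp(tok, "offset=", 7) == 0)
            offset = strtol(tok + 7, NULL, 10);
        else if (strncmp(tok, "file=", 5) == 0)
            printf("image part: %s\n", tok + 5);
        else
            fprintf(stderr, "option %s not recognised\n", tok);
    }
    printf("seek %ld bytes into the reassembled image before fs_open\n", offset);
    return 0;
}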