RE: [sleuthkit-developers] Re: IO Subsystem patch for fstools


Hi Michael, (Again! ;-) We should start to call on the phone! ;-))

<Snip about seek performance>

> I don't find any difference really???? I thought the only overhead
> in a seek is a system call, because the kernel would already have
> that in the disk cache and doesn't really need to seek the physical
> disk at all. Maybe it's an OS thing? I'm using kernel 2.4.21.
You're right.. It is not as bad as it used to be...
(Partly because of the way my code now works)...

I will implement everything using fs_read_random ;-)..
Problem solved! hehe...

> Cool, so when do you actually do the reading of the blocks? Or do
> you just use the file_walk and inode_walk to find out if a string is
> in a file or out of the file without reading any blocks?
See below!... All actual reading is done if I find a read fragment
(Fragment is two non-sequential blocks).

> > The raw mode (The real mode) will use the 64kb blocks in a
> > "walking buffer" kind of way... Every time a new block is loaded,
> > the last xx (25) bytes of the old block will be prepended and also
> > indexed... That way no data will ever get missed...
>
> That's great. I also noticed (I only have the current version, which
> is on the web site - without all the bells and whistles) that the
> buffer is user settable, so if seeking proves to be too much of a
> problem, users can just set the rolling buffer to be really large.
Indeed ;-))...

The default will be larger if that proves to be better!...
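
To make the "walking buffer" idea above concrete, here is a minimal sketch in C. It is not the indexer's actual code: the function name, the in-memory "stream", and the choice to count matches rather than index them are all assumptions made for illustration. It shows only the technique described, namely that the tail of the previous block is kept in front of the newly loaded block so that a string straddling a block boundary is still seen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of fstools or the indexer): counts the
 * occurrences of `needle` in `data`, consuming `data` in blocks of
 * `block_size` bytes. Before each new block is scanned, the last
 * `overlap` bytes of the previous window are moved to the front of the
 * window, so a match that crosses a block boundary is not missed.
 * Returns (size_t)-1 if memory cannot be allocated.
 */
size_t count_matches_walking(const unsigned char *data, size_t data_len,
                             size_t block_size, size_t overlap,
                             const char *needle)
{
    size_t nlen = strlen(needle);
    assert(nlen > 0 && block_size > 0);
    /* A straddling match has at most nlen-1 bytes in the old block. */
    assert(overlap + 1 >= nlen);

    unsigned char *win = malloc(overlap + block_size);
    if (win == NULL)
        return (size_t)-1;

    size_t carried = 0; /* bytes at the front of win kept from last round */
    size_t count = 0;
    size_t pos = 0;

    while (pos < data_len) {
        size_t n = data_len - pos < block_size ? data_len - pos : block_size;

        /* "Load" the next block right after the carried-over tail. */
        memcpy(win + carried, data + pos, n);
        size_t win_len = carried + n;

        /* Count only matches that end inside the new bytes; matches that
         * lie entirely in the carried tail were counted a round earlier. */
        for (size_t i = 0; i + nlen <= win_len; i++) {
            if (i + nlen > carried && memcmp(win + i, needle, nlen) == 0)
                count++;
        }

        /* Keep the last `overlap` bytes for the next round. */
        carried = win_len < overlap ? win_len : overlap;
        memmove(win, win + win_len - carried, carried);
        pos += n;
    }

    free(win);
    return count;
}
```

With a 64kb block and a 25 byte overlap, this scheme would catch any string of up to 26 bytes regardless of where the block boundaries fall. A larger block only reduces how often the carry-over step runs; it does not change what is found.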

> Cool that sounds very promising. I am looking forward to=20
> seeing the next=20
> release, in the meantime I shall play with the current=20
> release. Are there=20
> many large changes in the new release?

The largest change is the support for fragmented strings (strings
located on two non-sequential blocks). Furthermore, the internal
format has changed so almost twice the amount of "data" can be stored
in memory... On disk there is only a small 15% increase (storage-wise),
but because more can be stored in memory, fewer redundant parts are
stored, so the total saving on disk storage can be around 33%..

Storage has also changed in the way that you only have to specify a
directory and the rest will be handled (so no more specifying a config
file and an index file (or multiple index files))...

Version support to recognize older Index files and not blindly use them.

A few handy tools for checking index files and such....

Owh.. And I'm currently busy changing my code to use the fstools
fs_read_random() function and not use libc's fread() ;-)
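
As a rough illustration of the difference between a stream read and a random-access read, the sketch below wraps POSIX pread() in a helper that takes the offset as an argument. It is only a stand-in: read_random() and read_random_demo() are hypothetical names, and nothing here is the fstools fs_read_random() signature or implementation. The point is just that each call says where to read, so the caller keeps no stream position between calls.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a random-access read (NOT the fstools API):
 * reads up to `len` bytes from descriptor `fd` starting at byte offset
 * `offs`. Returns the number of bytes read (fewer than `len` only at
 * end of file), or -1 on error.
 */
ssize_t read_random(int fd, void *buf, size_t len, off_t offs)
{
    assert(buf != NULL);
    size_t done = 0;

    while (done < len) {
        ssize_t n = pread(fd, (char *)buf + done, len - done,
                          offs + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, try again */
            return -1;
        }
        if (n == 0)
            break;              /* end of file */
        done += (size_t)n;
    }
    return (ssize_t)done;
}

/* Small self-check on a temporary file; returns 0 on success. */
int read_random_demo(void)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;
    if (fputs("0123456789", fp) == EOF || fflush(fp) != 0) {
        fclose(fp);
        return -1;
    }

    char buf[4];
    int ok = read_random(fileno(fp), buf, sizeof buf, 3) == 4
             && memcmp(buf, "3456", 4) == 0;
    fclose(fp);
    return ok ? 0 : -1;
}
```

With fread() the same access needs an fseek() first, and the stream's buffering sits between the caller and the image; an offset-based call leaves that to the layer underneath.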

Paul Bakker