Re: [sleuthkit-developers] Re: IO Subsystem patch for fstools
From: Michael C. <mic...@ne...> - 2004-02-23 09:42:13
Hi Paul,

> Well I do see advantages.... I already wanted to ask this...
>
> The problem with the current code is that it is not possible to
> "read_random" an image efficiently because it cannot check the current
> offset in the image.. This results in unnecessary seeks.. And seeks are
> very expensive if they come in millions....

I agree - this is particularly bad if the underlying image is a compressed format like encase or sgzip, because then each seek/read corresponds to a decompression of at least one block.

> For Indexed Searching it would be very handy if there would come either: a
> generic fs_read_random() function.
>
> If this function would check for the current offset in the image and thus
> not seek if the reads were all in succession, this would be great...

This really depends on the specific subsystem. For example, when reading an encase file you need to decompress at least one chunk for each seek, so if you read lots of little runs of data all over the file it's going to run slow.

The solution to this problem, I think, is to implement some kind of caching in memory. A cache can handle this very efficiently, particularly for the case where you make lots of small reads very close together (i.e. no seeks). A simple cache (with a simple policy) can be implemented quite easily, I think, and will be effective for the scenario you are describing.

What kind of IO do you do for indexing? Is it very localised? If you were to cache a block in memory, what would be the optimal size of the block (say 1 MB, or more like 32 KB)? If you were to cache 1 MB in memory, how many reads would you get out of it on average?

Michael
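
As a rough illustration of the simple cache described above, here is a minimal C sketch of a single-block read-through cache. Every name in it (blk_cache, backend_read_fn, mem_image, mem_read) is hypothetical and not part of fstools or the IO subsystem patch. mem_read is only an in-memory stand-in for an expensive backend such as a compressed image reader, and the policy is the simplest one possible: keep only the most recently read block.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical backend read: copy up to len bytes starting at off into buf,
 * return the number of bytes actually read (0 at end of image). */
typedef size_t (*backend_read_fn)(void *ctx, uint64_t off, void *buf, size_t len);

/* Assumed stand-in for an expensive subsystem: a plain memory buffer that
 * counts how often it is asked to read. */
typedef struct {
    const unsigned char *data;
    size_t size;
    unsigned long calls;
} mem_image;

size_t mem_read(void *ctx, uint64_t off, void *buf, size_t len)
{
    mem_image *img = ctx;
    size_t n;
    img->calls++;
    if (off >= img->size)
        return 0;
    n = img->size - (size_t)off;
    if (n > len)
        n = len;
    memcpy(buf, img->data + off, n);
    return n;
}

/* Single-block cache: remembers the last block fetched from the backend. */
typedef struct {
    backend_read_fn read;   /* underlying (expensive) read */
    void *ctx;              /* opaque backend state */
    size_t block_size;      /* e.g. 32 KB or 1 MB; tune by measurement */
    uint64_t block_no;      /* index of the cached block */
    size_t valid;           /* valid bytes in buf, 0 means empty */
    unsigned char *buf;
    unsigned long hits, misses;
} blk_cache;

int blk_cache_init(blk_cache *c, backend_read_fn fn, void *ctx, size_t block_size)
{
    if (block_size == 0)
        return -1;
    c->read = fn;
    c->ctx = ctx;
    c->block_size = block_size;
    c->block_no = 0;
    c->valid = 0;
    c->hits = c->misses = 0;
    c->buf = malloc(block_size);
    return c->buf ? 0 : -1;
}

void blk_cache_free(blk_cache *c)
{
    free(c->buf);
    c->buf = NULL;
}

/* Read len bytes at off through the cache; returns bytes copied to out.
 * The backend is only called when the wanted block is not the cached one. */
size_t blk_cache_read(blk_cache *c, uint64_t off, void *out, size_t len)
{
    unsigned char *dst = out;
    size_t done = 0;
    while (done < len) {
        uint64_t pos = off + done;
        uint64_t bno = pos / c->block_size;
        size_t boff = (size_t)(pos % c->block_size);
        size_t n;
        if (c->valid == 0 || bno != c->block_no) {
            c->misses++;
            c->valid = c->read(c->ctx, bno * c->block_size, c->buf, c->block_size);
            c->block_no = bno;
        } else {
            c->hits++;
        }
        if (boff >= c->valid)
            break;                      /* past the end of the image */
        n = c->valid - boff;
        if (n > len - done)
            n = len - done;
        memcpy(dst + done, c->buf + boff, n);
        done += n;
    }
    return done;
}
```

With a 16-byte block, two 4-byte reads at offsets 0 and 4 cost one backend read instead of two. Logging the hits and misses counters during a real indexing run would be one way to answer the block size question empirically.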