Re: [sleuthkit-users] Slow Add Image Process Cause
From: Brian C. <ca...@sl...> - 2014-05-02 02:24:16
Thanks for the tests. I wonder if it has to do with an incorrect sequence number. NTFS increments the sequence number each time a file is reallocated, so deleted orphan files could be the ones getting misses. I'll add some logging on my system and see what kind of misses I get.

brian

On May 1, 2014, at 8:39 PM, Luís Filipe Nassif <lfc...@gm...> wrote:

> Ok, tests 1 and 3 are done. I do not have the sleuthkit code in an IDE, so I did not use breakpoints. Instead, I changed TskDbSqlite::findParObjId() to return the parent_meta_addr when it is not found in the cache map and to return 1 when it is found.
>
> Querying the generated SQLite database, I found 19,558 cache misses for an image with 3 NTFS partitions and 127,408 files. I confirmed that many of the parent_meta_addr values that missed the cache (now stored in tsk_objects.par_obj_id) do appear in tsk_files.meta_addr. The full paths corresponding to these meta_addr values are the parent folders of the files whose lookups missed the cache.
>
> Other tests resulted in:
> 182,256 cache misses from 433,321 files (NTFS)
> 892,359 misses from 1,811,393 files (NTFS)
> 169,819 misses from 3,177,917 files (HFS)
>
> Luis Nassif
>
> On 2014-05-01 16:14 GMT-03:00, Luís Filipe Nassif <lfc...@gm...> wrote:
> Forgot to mention: we are using sleuthkit 4.1.3.
>
> On 2014-05-01 16:09, "Luís Filipe Nassif" <lfc...@gm...> wrote:
>
> Hi Brian,
>
> The 3 cases above were NTFS. I also tested with HFS and canceled loaddb after 1 day. The modified version finished after 8 hours and added about 3 million entries. We will try the tests you suggested.
>
> On 2014-05-01 15:48, "Brian Carrier" <ca...@sl...> wrote:
> Hi Luis,
>
> What kind of file system was it? A little while ago I fixed a bug in that code for HFS file systems that resulted in a lot of cache misses.
>
> In theory, everything should be cached. It sounds like a bug if you are getting so many misses.
>
> The basic idea of this code is that everything in the DB gets assigned a unique object ID, and we associate each file with its parent folder's unique ID.
>
> Since you seem to be comfortable with a debugger in the code, can you set a breakpoint for when the miss happens and:
> 1) Determine the path of the file that was being added to the DB and the parent address that the code was trying to find.
> 2) Use the 'ffind' TSK tool to map that parent address to a path. Is it a subset of the path from #1?
> 3) Open the DB in a SQLite tool and run something like this:
>
>     SELECT * FROM tsk_files WHERE meta_addr == META_ADDR_FROM_ABOVE
>
> Is it in the DB?
>
> Thanks!
>
> brian
>
> On May 1, 2014, at 11:58 AM, Luís Filipe Nassif <lfc...@gm...> wrote:
>
> > Hi,
> >
> > We have investigated why the add image process is so slow in some cases. The add image time seems to grow quadratically with the number of files in the image.
> >
> > We found that the function TskDbSqlite::findParObjId(), in db_sqlite.cpp, does not find the parent_meta_addr -> parent_file_id mapping in the local cache for many files, which makes it search for the mapping in the database (perhaps with a non-indexed search?).
> >
> > For testing purposes, we added a "return 1;" line right after the cache lookup, disabling the database lookup, and this resulted in large speedups:
> >
> > number of files   default load_db time   patched load_db time
> > ~80,000           20 min                 2 min
> > ~300,000          3 h                    7 min
> > ~700,000          48 h                   27 min
> >
> > We wonder whether it is possible to store all par_meta_addr -> par_id mappings in the local cache (better) or to do an improved (indexed?) search for the mapping in the database. We think that someone with more knowledge of the load_db code could help a lot here.
> >
> > _______________________________________________
> > sleuthkit-users mailing list
> > https://lists.sourceforge.net/lists/listinfo/sleuthkit-users
> > http://www.sleuthkit.org
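Brian's sequence-number hypothesis and Luis's observation that load time grows roughly quadratically can be illustrated together with a small model. This is a minimal sketch under stated assumptions, not the TskDbSqlite code: FileRow, ParentIdCache and simulate_all_misses are hypothetical names, the cache is assumed to be keyed by (metadata address, sequence number), and the database fallback is modeled as a linear scan, i.e. the unindexed search Luis suspected rather than anything confirmed in the thread.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a row of the files table: the metadata address
// (MFT entry on NTFS), its sequence number, and the unique object ID that
// was assigned when the row was inserted.
struct FileRow {
    uint64_t meta_addr;
    uint32_t seq;
    int64_t obj_id;
};

// Illustrative parent-ID cache keyed by (meta_addr, seq). Not TSK code.
class ParentIdCache {
public:
    // Record an inserted folder in both the cache and the "database".
    void add(uint64_t meta_addr, uint32_t seq, int64_t obj_id) {
        cache_[{meta_addr, seq}] = obj_id;
        rows_.push_back({meta_addr, seq, obj_id});
    }

    // Resolve a parent reference to an object ID. A miss falls back to a
    // linear scan over every stored row, modelling an unindexed query.
    std::optional<int64_t> find(uint64_t par_meta_addr, uint32_t par_seq) {
        auto it = cache_.find({par_meta_addr, par_seq});
        if (it != cache_.end()) {
            ++hits_;
            return it->second;
        }
        ++misses_;
        for (const FileRow& row : rows_) {
            ++rows_scanned_;
            // The fallback matches on the address only, so a stale
            // reference can resolve to a folder that reused the entry.
            if (row.meta_addr == par_meta_addr) {
                return row.obj_id;
            }
        }
        return std::nullopt;
    }

    uint64_t hits() const { return hits_; }
    uint64_t misses() const { return misses_; }
    uint64_t rows_scanned() const { return rows_scanned_; }

private:
    std::map<std::pair<uint64_t, uint32_t>, int64_t> cache_;
    std::vector<FileRow> rows_;
    uint64_t hits_ = 0;
    uint64_t misses_ = 0;
    uint64_t rows_scanned_ = 0;
};

// Worst case: each new folder is immediately looked up with a stale sequence
// number, so lookup i misses and scans i rows; n*(n+1)/2 rows in total.
uint64_t simulate_all_misses(uint64_t n) {
    ParentIdCache cache;
    for (uint64_t i = 1; i <= n; ++i) {
        cache.add(i, 1, static_cast<int64_t>(i));
        cache.find(i, 0);  // sequence 0 never matches the cached key (i, 1)
    }
    assert(cache.misses() == n);
    return cache.rows_scanned();
}
```

In this model a child that references its parent with the current sequence number hits the cache, while a deleted orphan still carrying an older sequence number misses and pays for a scan. When every lookup misses, simulate_all_misses(1000) scans 500,500 rows and simulate_all_misses(2000) scans 2,001,000, roughly four times the work for twice the files. An index on the address column, or a cache that resolves these references, would avoid that per-miss scan, which is in line with (though not proof of) the speedups Luis measured with the fallback disabled.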