Re: [sleuthkit-users] Slow Add Image Process Cause
From: Brian C. <ca...@sl...> - 2014-05-01 18:48:57
Hi Luis,

What kind of file system was it? I fixed a bug a little while ago in that code for HFS file systems that resulted in a lot of cache misses. In theory, everything should be cached. It sounds like a bug if you are getting so many misses.

The basic idea of this code is that everything in the DB gets assigned a unique object ID and we make associations between files and their parent folder's unique ID.

Since you seem to be comfortable with a debugger in the code, can you set a breakpoint for when the miss happens and:

1) Determine the path of the file that was being added to the DB and the parent address it was trying to find.
2) Use the 'ffind' TSK tool to then map that parent address to a path. Is it a subset of the path from #1?
3) Open the DB in a SQLite tool and do something like this:

   SELECT * from tsk_files where meta_addr == META_ADDR_FROM_ABOVE

   Is it in the DB?

Thanks!
brian

On May 1, 2014, at 11:58 AM, Luís Filipe Nassif <lfc...@gm...> wrote:

> Hi,
>
> We have investigated a bit why the add image process is too slow in some cases. The add image process time seems to be quadratic with the number of files in the image.
>
> We detected that the function TskDbSqlite::findParObjId(), in db_sqlite.cpp, is not finding the parent_meta_addr -> parent_file_id mapping in the local cache for a lot of files, causing it to search for the mapping in the database (not sure if it is a non-indexed search?)
>
> For testing purposes, we added a "return 1;" line right after the cache lookup, disabling the database lookup, and this resulted in great speed ups:
>
> number of files / default load_db time / patched load_db time
> ~80.000 / 20min / 2min
> ~300.000 / 3h / 7min
> ~700.000 / 48h / 27min
>
> We wonder if it is possible to store all par_meta_addr -> par_id mappings into local cache (better) or doing an improved (indexed?) search for the mapping in the database. We think that someone with more knowledge of load_db code could help a lot here.
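
For readers following the thread, the cache-then-database lookup discussed above can be illustrated with a minimal C++17 sketch. This is not the code in db_sqlite.cpp: the class name ParentIdCache, the callback type DbLookup, and the choice to key only on the parent meta address are all assumptions made for illustration. It shows a cache that stores every mapping as files are added, falls back to a caller-supplied database lookup on a miss, and counts misses so that the situation Luis describes can be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <unordered_map>
#include <utility>

// Hypothetical illustration only; not the TSK implementation.
// Maps a parent meta address to the object ID assigned in the DB.
class ParentIdCache {
public:
    // Assumed fallback: returns the object ID for a meta address by
    // querying the database, or std::nullopt if no row exists.
    using DbLookup = std::function<std::optional<int64_t>(uint64_t)>;

    explicit ParentIdCache(DbLookup dbLookup)
        : m_dbLookup(std::move(dbLookup)) {
        assert(m_dbLookup && "a database fallback must be supplied");
    }

    // Record the mapping when a directory is added to the DB.
    void store(uint64_t metaAddr, int64_t objId) {
        m_cache[metaAddr] = objId;
    }

    // Look in the cache first; only on a miss pay for the DB query.
    std::optional<int64_t> find(uint64_t parMetaAddr) {
        auto it = m_cache.find(parMetaAddr);
        if (it != m_cache.end()) {
            return it->second;
        }
        ++m_misses;  // every miss costs one database query
        std::optional<int64_t> fromDb = m_dbLookup(parMetaAddr);
        if (fromDb) {
            // Remember the answer so the same parent never misses twice.
            m_cache[parMetaAddr] = *fromDb;
        }
        return fromDb;
    }

    std::size_t misses() const { return m_misses; }

private:
    std::unordered_map<uint64_t, int64_t> m_cache;
    DbLookup m_dbLookup;
    std::size_t m_misses = 0;
};
```

If each miss triggers an unindexed scan of a table that grows with every file added, total work grows roughly with the square of the file count, which would be consistent with the timings reported above. Logging misses() together with the failing parent address would give the data requested in steps 1 to 3.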