Re: [sleuthkit-users] Slow Add Image Process Cause
From: Luís F. N. <lfc...@gm...> - 2014-05-01 19:14:22
Forgot to mention: we are using sleuthkit 4.1.3

On 01/05/2014 16:09, "Luís Filipe Nassif" <lfc...@gm...> wrote:

> Hi Brian,
>
> The 3 cases above were ntfs. I also tested with hfs and canceled loaddb
> after 1 day. The modified version finished after 8 hours and added about 3
> million entries. We will try to do the tests you have suggested.
>
> On 01/05/2014 15:48, "Brian Carrier" <ca...@sl...> wrote:
>
>> Hi Luis,
>>
>> What kind of file system was it? I fixed a bug a little while ago in that
>> code for HFS file systems that resulted in a lot of cache misses.
>>
>> In theory, everything should be cached. It sounds like a bug if you are
>> getting so many misses. The basic idea of this code is that everything in
>> the DB gets assigned a unique object ID and we make associations between
>> files and their parent folder's unique ID.
>>
>> Since you seem to be comfortable with a debugger in the code, can you set
>> a breakpoint for when the miss happens and:
>> 1) Determine the path of the file that was being added to the DB and the
>> parent address that was trying to be found.
>> 2) Use the 'ffind' TSK tool to then map that parent address to a path.
>> Is it a subset of the path from #1?
>> 3) Open the DB in a SQLite tool and do something like this:
>>
>> SELECT * from tsk_files where meta_addr == META_ADDR_FROM_ABOVE
>>
>> Is it in the DB?
>>
>> Thanks!
>>
>> brian
>>
>>
>> On May 1, 2014, at 11:58 AM, Luís Filipe Nassif <lfc...@gm...>
>> wrote:
>>
>> > Hi,
>> >
>> > We have investigated a bit why the add image process is too slow in
>> > some cases. The add image process time seems to be quadratic with the
>> > number of files in the image.
>> >
>> > We detected that the function TskDbSqlite::findParObjId(), in
>> > db_sqlite.cpp, is not finding the parent_meta_addr -> parent_file_id
>> > mapping in the local cache for a lot of files, causing it to search for
>> > the mapping in the database (not sure if it is a non-indexed search?)
>> >
>> > For testing purposes, we added a "return 1;" line right after the cache
>> > lookup, disabling the database lookup, and this resulted in great
>> > speedups:
>> >
>> > number of files / default load_db time / patched load_db time
>> > ~80,000 / 20min / 2min
>> > ~300,000 / 3h / 7min
>> > ~700,000 / 48h / 27min
>> >
>> > We wonder if it is possible to store all par_meta_addr -> par_id
>> > mappings in the local cache (better) or to do an improved (indexed?)
>> > search for the mapping in the database. We think that someone with more
>> > knowledge of the load_db code could help a lot here.
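
The lookup discussed in this thread (check a local cache of parent meta address -> parent object ID first, and fall back to the database on a miss) can be sketched roughly as follows. This is a minimal Python illustration, not the TSK code: the table is a simplified, hypothetical stand-in for tsk_files, and the names ParentCache, find_par_obj_id and meta_addr_idx are made up for the example. It also shows one way to check whether a fallback query on meta_addr is a full table scan or an indexed search, using SQLite's EXPLAIN QUERY PLAN.

```python
import sqlite3

# Hypothetical, simplified stand-in for tsk_files (the real table has more columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tsk_files ("
    "obj_id INTEGER PRIMARY KEY, meta_addr INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO tsk_files VALUES (?, ?, ?)",
    [(i, 1000 + i, f"dir{i}") for i in range(1, 1001)],
)

QUERY = "SELECT obj_id FROM tsk_files WHERE meta_addr = ?"


def query_plan(connection):
    """Return SQLite's plan for QUERY as text (the last column is the detail)."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, (1500,)).fetchall()
    return " ".join(str(row[-1]) for row in rows)


# Without an index on meta_addr, the fallback query scans the whole table.
plan_before = query_plan(conn)

# Assumed index name, for illustration only.
conn.execute("CREATE INDEX meta_addr_idx ON tsk_files (meta_addr)")
plan_after = query_plan(conn)


class ParentCache:
    """Cache of parent meta address -> parent object ID, with a DB fallback."""

    def __init__(self, connection):
        self.connection = connection
        self.cache = {}
        self.db_lookups = 0  # counts cache misses that went to the database

    def find_par_obj_id(self, par_meta_addr):
        if par_meta_addr in self.cache:
            return self.cache[par_meta_addr]
        # Cache miss: fall back to the database and remember the answer.
        self.db_lookups += 1
        row = self.connection.execute(QUERY, (par_meta_addr,)).fetchone()
        if row is None:
            return None
        self.cache[par_meta_addr] = row[0]
        return row[0]


parents = ParentCache(conn)
print("plan without index:", plan_before)
print("plan with index:   ", plan_after)
```

If every file insert misses the cache and the fallback is a full scan, the total work grows roughly with the square of the number of files, which would be consistent with the timings reported above. Counting misses (db_lookups here) while loading an image is one way to confirm whether that is what is happening.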