sleuthkit-users Mailing List for The Sleuth Kit (Page 43)
From: Luís F. N. <lfc...@gm...> - 2014-05-02 16:38:04
Correcting my last email: that test was run with the indexes AND Brian's fix. I then removed the index patch, and loadDb took the same 1 hour to finish with only Brian's fix. So the index patch did not help the database lookup for parent_id after all. Sorry for the mistake,

Nassif

2014-05-02 10:54 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>: [...]
From: Luís F. N. <lfc...@gm...> - 2014-05-02 13:54:34
I tested loadDb with a patch that creates indexes on meta_addr and fs_obj_id. The image with 433,321 files, which previously took 2h45min to load, now finishes loadDb in 1h with the indexes. That is a good speedup, but with the database parent_id lookup completely disabled it takes only 7min. Is there anything else we can do to improve the parent_id database lookup?

Regards,
Nassif

2014-05-02 9:35 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>: [...]
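The indexes discussed in this exchange (which, per the follow-up above, did not end up helping the parent_id lookup) can be added to an existing case database for testing. A minimal sketch, assuming a stock TSK schema; the database file name and index names are illustrative:

    # Hedged sketch: create the indexes discussed above on an existing
    # TSK-generated SQLite database. "case.db" and the index names are
    # placeholders; the tsk_files columns are the ones named in this thread.
    sqlite3 case.db "CREATE INDEX IF NOT EXISTS idx_files_meta_addr ON tsk_files(meta_addr);"
    sqlite3 case.db "CREATE INDEX IF NOT EXISTS idx_files_fs_obj_id ON tsk_files(fs_obj_id);"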
From: Conley, T. <tom...@us...> - 2014-05-02 12:50:44
Is there a way to search through the undeleted items for keywords in file names? My problem is that there are over 100,000 of them and I can only see 10k. Apologies if this is a noob question.
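One command-line way around a GUI display limit (a hedged suggestion, not from a reply on this page) is to list every name with fls and filter with grep. The image name, partition offset, and keyword below are placeholders:

    # Hedged sketch: recursively list all file entries from a partition
    # and grep the names for a keyword; add -d to restrict the listing
    # to deleted entries. "image.dd", offset 63, and "invoice" are
    # illustrative values.
    fls -r -o 63 image.dd | grep -i 'invoice'
    fls -rd -o 63 image.dd | grep -i 'invoice'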
From: Luís F. N. <lfc...@gm...> - 2014-05-02 12:35:21
OK, tested on 2 images. The fix resolved a lot of misses:

ntfs image w/ 127,408 files: from 19,558 to 6,511 misses
ntfs image w/ 433,321 files: from 182,256 to 19,908 misses

I also think creating indexes on tsk_files(meta_addr) and tsk_files(fs_obj_id) could help the database lookup for those deleted files not found in the local cache; what do you think? The database lookup seems too slow, as described in my first email.

Thank you for taking a look so quickly.
Nassif

2014-05-01 23:47 GMT-03:00 Brian Carrier <ca...@sl...>: [...]
From: Kalin K. <me....@gm...> - 2014-05-02 10:20:29
On May 2, 2014 6:56 PM, <al...@ma...> wrote:
> "Cannot open log: autopsy.log at /usr/share/autopsy/lib//Print.pm at line 383."

Most probably a permission problem, most probably in /tmp. What is the output of `ls -lsd /tmp`?

Note to devs: the message needs to be changed to include the full path, IMHO.

Kalin.
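For context, /tmp normally has mode 1777 (world-writable with the sticky bit, shown by ls as drwxrwxrwt). A hedged sketch of the check Kalin asks for and the usual fix, run as root, assuming the permissions really are the problem:

    # Hedged sketch: inspect /tmp and restore the standard permissions
    # if they are wrong. 1777 = rwxrwxrwt (sticky, world-writable).
    ls -lsd /tmp
    chmod 1777 /tmp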
From: <al...@ma...> - 2014-05-02 09:54:50
Dear users, I need some assistance please.

I installed Autopsy and SK for the first time yesterday to look at the disk on my laptop. Both were downloaded from the Ubuntu software repository and installed correctly, in the order SK then Autopsy.

I ran Autopsy from the terminal and got the message about opening an HTML session, but then I immediately got the error message:

"Cannot open log: autopsy.log at /usr/share/autopsy/lib//Print.pm at line 383."

(Note that the number of '/' characters is exactly as in the error message.) No mention is made of which module is failing. I am running the latest Autopsy and SK as per the PPA repository, on Ubuntu 12.04 LTS. Anyone able to assist here?

Many thanks,
Alan Brown
From: Brian C. <ca...@sl...> - 2014-05-02 02:47:26
Well, that was an easy and embarrassing fix:

    if (TSK_FS_TYPE_ISNTFS(fs_file->fs_info->ftype)) {
    -    seq = fs_file->name->meta_seq;
    +    seq = fs_file->name->par_seq;
    }

Turns out we've been having a lot of cache misses because of this stupid bug. Can you replace that line and see if it helps? It certainly did on my test image.

thanks,
brian

On May 1, 2014, at 10:24 PM, Brian Carrier <ca...@sl...> wrote: [...]
From: Brian C. <ca...@sl...> - 2014-05-02 02:24:16
Thanks for the tests. I wonder if it has to do with an incorrect sequence number: NTFS increments the sequence number each time a file record is re-allocated, so deleted orphan files could be getting misses. I'll add some logging on my system and see what kind of misses I get.

brian

On May 1, 2014, at 8:39 PM, Luís Filipe Nassif <lfc...@gm...> wrote: [...]
From: Luís F. N. <lfc...@gm...> - 2014-05-02 00:39:32
OK, tests 1 and 3 done. I do not have the sleuthkit code inside an IDE, so I did not use breakpoints. Instead, I changed TskDbSqlite::findParObjId() to return the parent_meta_addr when it is not found in the cache map, and to return 1 when it is found.

Running queries on the generated SQLite database, there were 19,558 cache misses from an image with 3 NTFS partitions and 127,408 files. I confirmed that many of the parent_meta_addr values missed by the cache (now stored in tsk_objects.par_obj_id) are present in tsk_files.meta_addr, and that the complete paths corresponding to these meta_addr are parents of the files whose processing failed to find them in the cache.

Other tests resulted in:
182,256 cache misses from 433,321 files (ntfs)
892,359 misses from 1,811,393 files (ntfs)
169,819 misses from 3,177,917 files (hfs)

Luis Nassif

2014-05-01 16:14 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>: [...]
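With that instrumentation, the miss counts can be pulled straight from the database. A hedged sketch: the sentinel convention below follows the description above (par_obj_id holds the parent meta_addr on a miss, 1 on a hit), and "case.db" is an illustrative file name:

    # Hedged sketch: count cache misses recorded by the instrumented
    # findParObjId(), then count how many missed parent addresses do
    # in fact exist in tsk_files.meta_addr.
    sqlite3 case.db "SELECT COUNT(*) FROM tsk_objects WHERE par_obj_id > 1;"
    sqlite3 case.db "SELECT COUNT(*) FROM tsk_objects o JOIN tsk_files f ON f.meta_addr = o.par_obj_id WHERE o.par_obj_id > 1;"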
From: Joel F. <Joe...@is...> - 2014-05-01 20:26:50
How about calculating entropy (https://github.com/sleuthkit/c_EntropyModule)? Would that work for you?

On Thu, May 1, 2014 at 3:22 PM, Brandon Lashmet <bla...@gm...> wrote: [...]
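The idea behind the linked module: encrypted or compressed data has near-maximal byte entropy (close to 8 bits/byte), while most plaintext and executables score noticeably lower. A minimal shell sketch of that measurement (an illustration, not the c_EntropyModule code; the file name is a placeholder):

    # Hedged sketch: Shannon entropy of a file in bits/byte. Values
    # near 8.0 suggest encrypted or compressed content.
    od -An -v -tu1 suspect.bin | tr -s ' ' '\n' |
    awk 'NF { c[$1]++; n++ }
         END { for (b in c) { p = c[b] / n; H -= p * log(p) / log(2) }
               printf "%.3f bits/byte\n", H }'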
From: Brandon L. <bla...@gm...> - 2014-05-01 19:23:02
Is it possible to use SleuthKit to detect unencrypted data? What would the process be?

Thanks,
-Brandon
From: Brandon L. <bla...@gm...> - 2014-05-01 19:22:06
Can someone tell me how to install SleuthKit patches? I want to install the one mentioned in the article at https://www.utica.edu/academic/institutes/ecii/publications/articles/A0B3DC9E-F145-4A89-36F7462B629759FE.pdf (search for "Appendix A: FRSS-Sleuthkit Patch"). It says it's for version 1.69, so will it still work with the latest version of SleuthKit?

Thanks for the help.
-Brandon
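The mechanics of applying a source patch follow the standard workflow; a hedged sketch, with a placeholder patch file name. Note that a patch written against TSK 1.69 will almost certainly need manual porting to a modern release, which is a separate question:

    # Hedged sketch: apply a unified diff from the top of the unpacked
    # sleuthkit source tree, then rebuild.
    patch -p1 --dry-run < frss-sleuthkit.patch   # test without changing files
    patch -p1 < frss-sleuthkit.patch             # apply for real
    ./configure && make && sudo make install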
From: Luís F. N. <lfc...@gm...> - 2014-05-01 19:14:22
Forgot to mention: we are using sleuthkit 4.1.3.

On 01/05/2014 16:09, "Luís Filipe Nassif" <lfc...@gm...> wrote: [...]
From: Luís F. N. <lfc...@gm...> - 2014-05-01 19:09:52
Hi Brian,

The 3 cases above were NTFS. I also tested with HFS and canceled loaddb after 1 day; the modified version finished after 8 hours and added about 3 million entries. We will try to run the tests you have suggested.

On 01/05/2014 15:48, "Brian Carrier" <ca...@sl...> wrote: [...]
From: Brian C. <ca...@sl...> - 2014-05-01 18:48:57
Hi Luis,

What kind of file system was it? I fixed a bug a little while ago in that code for HFS file systems that resulted in a lot of cache misses.

In theory, everything should be cached; it sounds like a bug if you are getting so many misses. The basic idea of this code is that everything in the DB gets assigned a unique object ID, and we make associations between files and their parent folder's unique ID.

Since you seem to be comfortable with a debugger in the code, can you set a breakpoint for when the miss happens and:
1) Determine the path of the file that was being added to the DB and the parent address that was being looked up.
2) Use the 'ffind' TSK tool to map that parent address to a path. Is it a subset of the path from #1?
3) Open the DB in a SQLite tool and run something like this:

    SELECT * FROM tsk_files WHERE meta_addr == META_ADDR_FROM_ABOVE

Is it in the DB?

Thanks!

brian

On May 1, 2014, at 11:58 AM, Luís Filipe Nassif <lfc...@gm...> wrote: [...]
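Steps 2 and 3 can be run from a shell; a hedged sketch, where the image name, partition offset, database name, and meta address are all placeholders:

    # Hedged sketch of the checks described above. "image.dd", offset
    # 63, "case.db", and address 12345 are illustrative values.
    ffind -f ntfs -o 63 image.dd 12345    # map meta address -> file name
    sqlite3 case.db "SELECT * FROM tsk_files WHERE meta_addr = 12345;"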
From: Luís F. N. <lfc...@gm...> - 2014-05-01 15:58:45
Hi,

We have investigated a bit why the add image process is so slow in some cases. The add image process time seems to be quadratic in the number of files in the image.

We found that the function TskDbSqlite::findParObjId(), in db_sqlite.cpp, fails to find the parent_meta_addr -> parent_file_id mapping in the local cache for a lot of files, causing it to search for the mapping in the database (we are not sure whether that is a non-indexed search).

For testing purposes, we added a "return 1;" line right after the cache lookup, disabling the database lookup, and this resulted in great speedups:

number of files / default load_db time / patched load_db time
~80,000 / 20min / 2min
~300,000 / 3h / 7min
~700,000 / 48h / 27min

We wonder if it is possible to store all par_meta_addr -> par_id mappings in the local cache (better), or to do an improved (indexed?) search for the mapping in the database. We think that someone with more knowledge of the load_db code could help a lot here.
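Whether the fallback query is indexed can be checked directly with SQLite's query planner. A hedged sketch: the query paraphrases the lookup described above rather than quoting db_sqlite.cpp, and "case.db" is a placeholder:

    # Hedged sketch: ask SQLite how it would execute a lookup by
    # meta_addr. Without an index the plan reports a full table scan
    # of tsk_files; with one it reports an index search.
    sqlite3 case.db "EXPLAIN QUERY PLAN SELECT obj_id FROM tsk_files WHERE meta_addr = 12345;"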
From: Simson G. <si...@ac...> - 2014-04-30 14:06:44
Given the significant interest in Bitcoin, we will probably incorporate this into the bulk_extractor mainline in version 1.5.

On Apr 30, 2014, at 8:26 AM, Joel Fernandez <Joe...@is...> wrote: [...]
From: Joel F. <Joe...@is...> - 2014-04-30 12:26:49
A student in my Spring 2013 course wrote bulk_extractor feature extractors for Bitcoin and other e-currency. Since bulk_extractor doesn't care about the file system, it works on RAM, mobile, iOS, Windows, etc. It works pretty well.

Colt VanWinkle's project, with open-source code: http://cyfor.isis.poly.edu/43-spring_2013_digital_forensics_final_project_page.html
Video presentation: http://youtu.be/24rSn6S4cnM
Synopsis: http://cyfor.isis.poly.edu/assets/uploads/pages/spring_2013_digital_forensics_final_project_page/colt/vanwinkle_final.pdf

Cheers.

On Wed, Apr 30, 2014 at 3:25 AM, Enkidu Mo Shiri <vol...@gm...> wrote: [...]
From: Enkidu Mo S. <vol...@gm...> - 2014-04-30 07:25:34
|
Hello there,
I'm working on a project to create an investigator tool for mobile phones (Android, Windows, iOS) that finds evidence of any bitcoin wallet activity on the phone, including RAM and hard disk. I want to know if anyone has done such a project before. I googled it and searched journals such as IEEE, but so far I have only found IEF, which does not focus on bitcoin and offers it only as a function. Is there any tool for finding evidence left by a bitcoin wallet on a mobile phone (hard disk and RAM)?
Thank you

Ehsan Moshiri (Enkidu)
Digital Forensic Student
H/P: +96164953954, +961124249769
Linkedin: http://my.linkedin.com/pub/enkidu-moshiri/59/baa/90b/
Facebook: Enkidu Mo Shi Ri
wechat: Enkidu-Moshiri
Line: Enkidu.Moshiri
|
From: Andreas E. <and...@ne...> - 2014-04-29 12:48:55
|
Hello,

I have been trying to detect files that have been overwritten in a FAT32 image by looking at the properties on TSK_FS_META. I tried to get a hint by opening the image in Autopsy, but it seems that the GUI only displays them as "Deleted" and there is no way of telling whether a file has been overwritten. If I open the same image in EnCase, it correctly shows the files as "Is Overwritten". Is there any other flag or value I can look at in order to detect whether a file has been deleted and overwritten?

Regards,
Andreas Eriksson

Andreas Eriksson
Project Manager R&D
NetClean Technologies Sweden AB
Första Långgatan 30 - SE-413 27 Göteborg - Sweden
Phone: +46 31 719 08 00 - Fax: +46 31 13 89 50
Direct: +46 31 719 08 16 - Mobile: +46 739 07 41 79
and...@ne... - www.netclean.com

The information transmitted is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited. If you received this in error, please contact the sender and delete the material from any computer.
|
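TSK does not expose an "Is Overwritten" flag, but one plausible reading of what EnCase reports (an assumption on my part, not documented EnCase behavior) is: the directory entry is deleted, yet the clusters it points at are allocated again to live content. Here is a minimal sketch of that heuristic against the libtsk C API; tsk_fs_dir_walk, tsk_fs_file_walk, and the flag constants are standard TSK 4.x, while the "any reallocated block" rule is a choice you may want to refine.

// Heuristic sketch: walk all names, and for each deleted entry walk
// the blocks TSK attributes to it. If any such block is currently
// allocated, another file has likely overwritten the content.
#include <tsk/libtsk.h>
#include <cstdio>

static TSK_WALK_RET_ENUM block_cb(TSK_FS_FILE *, TSK_OFF_T, TSK_DADDR_T,
                                  char *, size_t,
                                  TSK_FS_BLOCK_FLAG_ENUM flags, void *ptr) {
    if (flags & TSK_FS_BLOCK_FLAG_ALLOC) {  // cluster reused elsewhere
        *static_cast<bool *>(ptr) = true;
        return TSK_WALK_STOP;
    }
    return TSK_WALK_CONT;
}

static TSK_WALK_RET_ENUM name_cb(TSK_FS_FILE *fs_file, const char *path, void *) {
    if (fs_file->name && fs_file->meta &&
        (fs_file->name->flags & TSK_FS_NAME_FLAG_UNALLOC)) {
        bool reused = false;
        // AONLY: we only need block addresses and flags, not content.
        tsk_fs_file_walk(fs_file, TSK_FS_FILE_WALK_FLAG_AONLY,
                         block_cb, &reused);
        if (reused)
            std::printf("possibly overwritten: %s%s\n",
                        path, fs_file->name->name);
    }
    return TSK_WALK_CONT;
}

int main(int argc, char **argv) {
    if (argc < 2) return 1;
    TSK_IMG_INFO *img = tsk_img_open_sing(argv[1], TSK_IMG_TYPE_DETECT, 0);
    if (!img) return 1;
    TSK_FS_INFO *fs = tsk_fs_open_img(img, 0, TSK_FS_TYPE_DETECT);
    if (!fs) { tsk_img_close(img); return 1; }
    tsk_fs_dir_walk(fs, fs->root_inum,
                    (TSK_FS_DIR_WALK_FLAG_ENUM)(TSK_FS_DIR_WALK_FLAG_RECURSE |
                                                TSK_FS_DIR_WALK_FLAG_ALLOC |
                                                TSK_FS_DIR_WALK_FLAG_UNALLOC),
                    name_cb, nullptr);
    tsk_fs_close(fs);
    tsk_img_close(img);
    return 0;
}

On FAT the recovered cluster chain is itself a guess (only the start cluster and size survive deletion), so expect some false positives either way.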
From: Derrick K. <dk...@gm...> - 2014-04-27 04:58:31
|
Hello. If you are compiling Sleuthkit to use standalone on the command line, the Java/JNI bindings won't matter. They are primarily used for Autopsy 3, so if you plan on compiling Autopsy 3 against TSK later on, you'll want them. My suspicion is that you have not installed Ant on your system and that's why the Java/JNI features are not included. Can you confirm whether you have Ant installed?

Derrick

On Sat, Apr 26, 2014 at 3:41 PM, Fsheathiii <fsh...@ne...> wrote:
> I am building Sleuth Kit on a Suse Linux system. I see that Java/JNI is not supported. I have determined that there is a Sleuth Kit Java package for this. How do I get this feature enabled? Does it matter?
>
> heath@suse64:~/Autopsy/sleuthkit-4.1.3> sh configure
> .
> checking for javac... javac
> checking if javac works... yes
> checking for javac... /usr/bin/javac
> checking symlink for /usr/bin/javac... /etc/alternatives/javac
> checking symlink for /etc/alternatives/javac... /usr/lib64/jvm/java-1.7.0-openjdk/bin/javac
> .
> .
> checking for java... java
> .
> .
>
> Features:
> Java/JNI support: no
>
> Thank You
>
> Frank S. Heath
|
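If Ant is indeed the missing piece, the fix is just to install it and re-run configure. A sketch for openSUSE (the package name "ant" is an assumption; adjust for your distribution):

# Assumes openSUSE's zypper and a package named "ant".
sudo zypper install ant
cd ~/Autopsy/sleuthkit-4.1.3
./configure    # the Features summary should now say: Java/JNI support: yes
make
sudo make install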
From: Fsheathiii <fsh...@ne...> - 2014-04-26 21:41:32
|
I am building Sleuth Kit on a Suse Linux system. I see that Java/JNI is not supported. I have determined that there is a Sleuth Kit Java package for this. How do I get this feature enabled? Does it matter?

heath@suse64:~/Autopsy/sleuthkit-4.1.3> sh configure
.
checking for javac... javac
checking if javac works... yes
checking for javac... /usr/bin/javac
checking symlink for /usr/bin/javac... /etc/alternatives/javac
checking symlink for /etc/alternatives/javac... /usr/lib64/jvm/java-1.7.0-openjdk/bin/javac
.
.
checking for java... java
.
.

Features:
Java/JNI support: no

Thank You

Frank S. Heath
|
From: Brandon L. <bla...@gm...> - 2014-04-26 05:54:57
|
Hi all,

I'm wondering if it's possible to use SleuthKit as an encryption-detection program. Specifically, can it be used to scan a large directory of files and report the ones that are unencrypted? There is a patch referenced on page 12 of this article (https://www.utica.edu/academic/institutes/ecii/publications/articles/A0B3DC9E-F145-4A89-36F7462B629759FE.pdf), but I'm not sure if it can be used for the purposes mentioned.

Thanks for your help,
-Brandon
|
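For what it's worth, the usual building block behind such a patch is a per-file entropy test: encrypted data looks uniformly random (close to 8 bits of Shannon entropy per byte), while plaintext scores much lower. A minimal sketch of that measurement follows; the 7.5 threshold mentioned in the comment is an assumption, and note that compressed formats such as zip or JPEG also score high, so entropy alone cannot separate "encrypted" from "compressed".

// Sketch: Shannon entropy in bits per byte, range 0.0 .. 8.0.
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

double shannon_entropy(const std::vector<uint8_t> &data) {
    if (data.empty()) return 0.0;
    size_t counts[256] = {0};
    for (uint8_t b : data) counts[b]++;
    double h = 0.0;
    for (size_t c : counts) {
        if (c == 0) continue;
        double p = static_cast<double>(c) / data.size();
        h -= p * std::log2(p);
    }
    return h;
}

int main() {
    std::vector<uint8_t> buf(4096, 'A');  // degenerate plaintext
    std::printf("constant buffer: %.3f bits/byte\n", shannon_entropy(buf));
    // A scanner built on TSK would read each file's bytes (e.g. with
    // tsk_fs_file_read) and report files scoring below ~7.5 as
    // "probably not encrypted".
    return 0;
}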
From: Brian C. <ca...@sl...> - 2014-04-22 18:48:49
|
The 5th Annual Open Source Digital Forensics Conference (OSDFCon) will be held on November 5, 2014 in Herndon, VA. All users and developers are invited to submit a presentation or workshop topic by June 1st.

This is a unique opportunity to present your work and experiences to over 400 people. The conference will be attended by both digital forensic investigators and developers. This event is a great opportunity to make investigators aware of your tools, get feedback from users, meet fellow developers and users, and help direct the future of open source digital forensics software.

To receive updates about the conference, sign up for e-mail updates (http://www.osdfcon.org) or watch #osdfcon on Twitter.

TOPICS

We are looking for 35-minute talks on a variety of topics about using open source tools, including:

* New tools and analysis techniques
* New features for mature tools
* Open, plug-in analysis framework designs and experiences
* Automated analysis
* Hard drive analysis and triage
* Memory and network forensics
* Mobile device forensics
* Analyzing application-level artifacts
* Cyber incident response
* User experiences
* Case studies

We also have openings for half-day workshops on the day before the conference (November 4, 2014). The workshops should teach people how to use or develop open source tools by providing hands-on guidance.

SUBMISSION INSTRUCTIONS

Topics can be submitted using an online form: http://www.basistech.com/osdfcon/cfp/

Submissions are due June 1, 2014. Our plan this year is to do an initial pass of the submissions and then use crowd-sourcing to choose the final set of topics.
|
From: Stefan K. <sk...@bf...> - 2014-04-22 07:29:58
|
Brian,

> I added a check though to give an error if you supply multiple '-i' values to make it more obvious that you can give only one.

Great. Thanks for the swift reply!

Cheers,
Stefan.

--
Stefan Kelm <sk...@bf...>
BFK edv-consulting GmbH    http://www.bfk.de/
Kriegsstrasse 100          Tel: +49-721-96201-1
D-76133 Karlsruhe          Fax: +49-721-96201-99
|