sleuthkit-users Mailing List for The Sleuth Kit (Page 42)
From: Luís F. N. <lfc...@gm...> - 2014-05-15 18:07:40
Hi Brian,

Pulled the latest version of develop. Tested with the same 3 NTFS images and 1 HFS with 3 million entries. No misses found and loadDb finished very fast.

Thanks,
Luis Nassif

2014-05-14 16:20 GMT-03:00 Brian Carrier <ca...@sl...>:
> Hi Luis,
>
> Pull the latest version of develop. I'm not seeing any cache misses on
> any images and the regression tests are all good.
From: Brian C. <ca...@sl...> - 2014-05-14 19:24:25
Sure, you can pass in the path to the device for the local drive: either "/dev/X" on Unix-like systems or \\.\PhysicalDriveX on Windows systems.

On May 14, 2014, at 3:07 PM, Mike Goldstein <do...@li...> wrote:
> So I want to know - is there a way I can just access the usb drive (for
> example) in the API using just the path (such as /dev/sdc) like I would
> in the command line? I mean, if I want to analyze a drive, do I have to
> make an ISO image of the file and then access it with the above code
> every time?
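For readers with the same question, here is a minimal sketch of what Brian's answer looks like using the same C++ wrapper calls from Mike's message below. It assumes TSK 4.1.3 headers and a Linux device node at /dev/sdc (a placeholder; reading a raw device normally requires root):

    #include <iostream>
    #include "tsk/libtsk.h"

    int main() {
        // Same open() calls as for an image file, but with a device path.
        // TSK reads the raw device like a raw (dd) image, so no ISO or
        // E01 copy is needed first.
        TskImgInfo *img_info = new TskImgInfo();
        if (img_info->open("/dev/sdc", TSK_IMG_TYPE_DETECT, 0) != 0) {
            std::cerr << "cannot open device (root needed?)" << std::endl;
            delete img_info;
            return 1;
        }

        // Offset 0 assumes the file system starts at byte 0 of the device
        // (typical for a USB stick without a partition table); for a
        // partitioned disk, pass the partition's byte offset instead.
        TskFsInfo *fs_info = new TskFsInfo();
        if (fs_info->open(img_info, 0, TSK_FS_TYPE_DETECT) != 0) {
            std::cerr << "cannot open file system" << std::endl;
            delete fs_info;
            delete img_info;
            return 1;
        }

        std::cout << "file system opened from live device" << std::endl;

        delete fs_info;   // the wrapper destructors release the TSK handles
        delete img_info;
        return 0;
    }

On Windows the same code should work with "\\\\.\\PhysicalDrive0" as the path (the backslashes doubled inside a C string literal).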
From: Brian C. <ca...@sl...> - 2014-05-14 19:23:31
Hi Guthrie,

As I just mentioned in the last e-mail on this topic, we weren't missing data. It's just that we had long cycle times to insert rows into the database: the cache (which is supposed to improve performance) had some bugs, so the potential performance benefits were lost.

thanks,
brian

On May 14, 2014, at 2:16 PM, Guthrie Quill <qu...@po...> wrote:
> I have followed the thread about the unusually long process times (which
> I experienced myself). Can you give some guidance on whether TSK/Autopsy
> is reliable for case work? I am concerned about potentially missing
> large numbers of files.
From: Brian C. <ca...@sl...> - 2014-05-14 19:20:14
Hi Luis,

Pull the latest version of develop. I'm not seeing any cache misses on any images and the regression tests are all good.

For reference on what has gone on in this thread:

- NTFS has a sequence value that it increments each time that an MFT entry is re-used.
- TSK (and other tools) used to ignore this value, but that changed about a year ago when a law enforcement person showed me some data where half of the tools showed an important file in one folder and the other half showed it in a different folder. The difference was because some tools were ignoring the sequence and putting the file in a less than accurate place.
- The tsk_loaddb tool (which is what Autopsy basically uses to make the SQLite database) has a caching mechanism to find information about previously analyzed files so that it doesn't have to query the database each time, which is slow.
- This exercise brought to light two bugs. One was that the wrong ID was being used in the cache lookup, so it never hit. The other is that even after that was fixed, some deleted files were still not being found in the cache because the wrong sequence value was being used.
- In both cases, we weren't getting wrong results. We were simply being inefficient, and that is why the ingest took longer.

thanks,
brian

On May 13, 2014, at 6:08 PM, Brian Carrier <ca...@sl...> wrote:

> Hi Luis,
>
> Yea, a couple of my regression tests found some strange results with that fix. I pushed a new one up that is less strict on the filtering of the deleted files. There is still one issue I need to work out though.
>
> thanks,
> brian
>
> On May 13, 2014, at 5:37 PM, Luís Filipe Nassif <lfc...@gm...> wrote:
>
>> Forgot to mention, now loaddb is adding a different number of entries compared to the previous version. Is it ok?
>>
>> Thank you
>> Nassif
>>
>> Em 13/05/2014 18:19, "Luís Filipe Nassif" <lfc...@gm...> escreveu:
>> Excellent, Brian! All misses were resolved. Tested on the 3 NTFS images above and 1 HFS with 3 million entries.
>>
>> Also, images taking hours or days to finish now take minutes with both patches!
>>
>> Thank you very much for addressing this issue.
>> Nassif
>>
>> Nope, this one from earlier today:
>>
>> https://github.com/sleuthkit/sleuthkit/commit/f0672805c18a634ffffeff1f39f793501ddb7702
>>
>> On May 12, 2014, at 3:23 PM, Luís Filipe Nassif <lfc...@gm...> wrote:
>>
>>> Hi Brian,
>>>
>>> Do you mean this fix? https://github.com/sleuthkit/sleuthkit/commit/7b257b6c8252f9e9a7202990710e3a0ef31bf6b7
>>>
>>> It resolved a lot of the misses, as I described before (05/02), but I am still getting thousands of misses. So I took a look at 2 generated sqlites and discovered that the remaining misses were from deleted folders. I think that deleted folders are not being stored in the local cache...
>>>
>>> 2014-05-12 13:55 GMT-03:00 Brian Carrier <ca...@sl...>:
>>> Hi Luis,
>>>
>>> The develop branch on github has a fix that removed the remaining misses on my test image, but I had far fewer than you did. Can you pull it and try that one?
>>>
>>> thanks,
>>> brian
>>>
>>> On May 8, 2014, at 9:57 AM, Luís Filipe Nassif <lfc...@gm...> wrote:
>>>
>>>> Hi Brian and Simson,
>>>>
>>>> I have done a number of tests since yesterday. I have not restarted the computer between the tests, because I think it is better to use the OS IO cache to focus on CPU processing time, without IO interference, except for the first run. I used another computer, faster than before. Results below:
>>>>
>>>> ntfs image w/ 127.408 files                                 | 1        | 2        | 3        | 4
>>>> no patch (only Brian's fix)                                 | 3min 27s | 3min 3s  | 3min 3s  | 3min 2s
>>>> disabling database parent_id look up                        | 35s      | 11s      | 11s      | 12s
>>>> index on meta_addr,fs_obj_id and commit for each 5000 files | 4min 11s | 3min 48s | 3min 48s | 3min 47s
>>>>
>>>> ntfs image w/ 216.693 files                                 | 1        | 2        | 3        | 4
>>>> no patch (only Brian's fix)                                 | 5min 53s | 4min 59s | 4min 58s | 5min 2s
>>>> disabling database parent_id look up                        | 2min 8s  | 1min 21s | 1min 21s | 1min 21s
>>>> index on meta_addr,fs_obj_id and commit for each 5000 files | 6min 38s | 5min 46s | 5min 43s | 5min 43s
>>>>
>>>> ntfs image w/ 433.321 files                                 | 1         | 2         | 3         | 4
>>>> no patch (only Brian's fix)                                 | 21min 38s | 21min 40s | 21min 10s | 21min 10s
>>>> disabling database parent_id look up                        | 3min 59s  | 2min 47s  | 2min 47s  | 2min 47s
>>>> index on meta_addr,fs_obj_id and commit for each 5000 files | (not run based on previous results)
>>>>
>>>> So, Brian was right, the commits increased processing time. And as you can see, it would be great if it were possible to eliminate the misses and remove the database parent_id look up.
>>>>
>>>> With that in mind, I took a look at one of the sqlites, and I think I discovered the cause of a lot of the misses (maybe all of them). The misses happened with deleted files inside deleted folders. I think deleted folders are not being stored in the local cache.
>>>>
>>>> Regards,
>>>> Nassif
>>>>
>>>> 2014-05-07 10:53 GMT-03:00 Brian Carrier <ca...@sl...>:
>>>> Hi Nassif,
>>>>
>>>> As Simson mentioned, the current setup was intended to be the fastest. Doing frequent commits takes longer, and more indexes make commits take longer. This is the only process that we know about that does this type of lookup and would use those indexes.
>>>>
>>>> The bigger question for me is why we are getting these cache misses, and I need to spend some more time with some more images to find out. The lookup is to find the ID of the parent, and we process from the root directory down. So, in theory, we have already processed the parent folder before the children and it should be in the cache. We need to figure out why the parent isn't in the cache...
>>>>
>>>> brian
>>>>
>>>> On May 7, 2014, at 8:24 AM, Luís Filipe Nassif <lfc...@gm...> wrote:
>>>>
>>>>> I have done one last test, because it was very strange to me that indexing meta_addr and fs_obj_id had not improved the parent_id lookup. We suspected that the indexes were not being used by sqlite, maybe because the whole data is not committed before the add image process finishes (I am not a sqlite expert, is it possible?). So we inserted a commit for each 5.000 files added to the database. The add image process time decreased from 1 hour to 30 min, so we think that the indexes were not being used.
>>>>>
>>>>> Why does the add image process not commit the data while it is being added to the database?
>>>>>
>>>>> Nassif
>>>>>
>>>>> 2014-05-02 13:37 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
>>>>> Fixing my last email, the test was run with the indexes AND Brian's fix. Then I removed the index patch and loadDb took the same 1 hour to finish with only Brian's fix. So the index patch did not help improve the database look up for parent_id.
>>>>>
>>>>> Sorry for the mistake,
>>>>> Nassif
>>>>>
>>>>> 2014-05-02 10:54 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
>>>>>
>>>>> I tested loadDb with a patch that creates indexes on meta_addr and fs_obj_id. The image with 433.321 files, previously taking 2h45min to load, now takes 1h to finish loadDb with the indexes. That is a good speed up, but completely disabling the database parent_id look up, it takes only 7min to finish. Is there anything else we can do to improve the parent_id database look up?
>>>>>
>>>>> Regards,
>>>>> Nassif
>>>>>
>>>>> 2014-05-02 9:35 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
>>>>>
>>>>> Ok, tested on 2 images. The fix resolved a lot of misses:
>>>>>
>>>>> ntfs image w/ 127.408 files: from 19.558 to 6.511 misses
>>>>> ntfs image w/ 433.321 files: from 182.256 to 19.908 misses
>>>>>
>>>>> I also think creating an index on tsk_files(meta_addr) and tsk_files(fs_obj_id) could help improve the database look up for those deleted files not found in the local cache, what do you think? The database look up seems too slow, as described in my first email.
>>>>>
>>>>> Thank you for taking a look so quickly.
>>>>> Nassif
>>>>>
>>>>> 2014-05-01 23:47 GMT-03:00 Brian Carrier <ca...@sl...>:
>>>>>
>>>>> Well that was an easy and embarrassing fix:
>>>>>
>>>>>     if (TSK_FS_TYPE_ISNTFS(fs_file->fs_info->ftype)) {
>>>>>     -   seq = fs_file->name->meta_seq;
>>>>>     +   seq = fs_file->name->par_seq;
>>>>>     }
>>>>>
>>>>> Turns out we've been having a lot of cache misses because of this stupid bug. Can you replace that line and see if it helps? It certainly did on my test image.
>>>>>
>>>>> thanks,
>>>>> brian
>>>>>
>>>>> On May 1, 2014, at 10:24 PM, Brian Carrier <ca...@sl...> wrote:
>>>>>
>>>>>> Thanks for the tests. I wonder if it has to do with an incorrect sequence number. NTFS increments the sequence number each time a file is re-allocated. Deleted orphan files could be getting misses. I'll add some logging on my system and see what kind of misses I get.
>>>>>>
>>>>>> brian
>>>>>>
>>>>>> On May 1, 2014, at 8:39 PM, Luís Filipe Nassif <lfc...@gm...> wrote:
>>>>>>
>>>>>>> Ok, tests 1 and 3 done. I do not have the sleuthkit code inside an IDE, so I did not use breakpoints. Instead, I changed TskDbSqlite::findParObjId() to return the parent_meta_addr when it is not found and return 1 when it is found in the cache map.
>>>>>>>
>>>>>>> Performing queries on the generated sqlite, there were 19.558 cache misses from an image with 3 ntfs partitions and 127.408 files. I confirmed that many parent_meta_addr missed from the cache (now stored in tsk_objects.par_obj_id) are in tsk_files.meta_addr. The complete paths corresponding to these meta_addr are parents of those files whose processing did not find them in the cache.
>>>>>>>
>>>>>>> Other tests resulted in:
>>>>>>> 182.256 cache misses from 433.321 files (ntfs)
>>>>>>> 892.359 misses from 1.811.393 files (ntfs)
>>>>>>> 169.819 misses from 3.177.917 files (hfs)
>>>>>>>
>>>>>>> Luis Nassif
>>>>>>>
>>>>>>> 2014-05-01 16:14 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
>>>>>>> Forgot to mention: we are using sleuthkit 4.1.3
>>>>>>>
>>>>>>> Em 01/05/2014 16:09, "Luís Filipe Nassif" <lfc...@gm...> escreveu:
>>>>>>>
>>>>>>> Hi Brian,
>>>>>>>
>>>>>>> The 3 cases above were ntfs. I also tested with hfs and canceled loaddb after 1 day. The modified version finished after 8 hours and added about 3 million entries. We will try to do the tests you have suggested.
>>>>>>>
>>>>>>> Em 01/05/2014 15:48, "Brian Carrier" <ca...@sl...> escreveu:
>>>>>>> Hi Luis,
>>>>>>>
>>>>>>> What kind of file system was it? I fixed a bug a little while ago in that code for HFS file systems that resulted in a lot of cache misses.
>>>>>>>
>>>>>>> In theory, everything should be cached. It sounds like a bug if you are getting so many misses. The basic idea of this code is that everything in the DB gets assigned a unique object ID and we make associations between files and their parent folder's unique ID.
>>>>>>>
>>>>>>> Since you seem to be comfortable with a debugger in the code, can you set a breakpoint for when the miss happens and:
>>>>>>> 1) Determine the path of the file that was being added to the DB and the parent address that was trying to be found.
>>>>>>> 2) Use the 'ffind' TSK tool to then map that parent address to a path. Is it a subset of the path from #1?
>>>>>>> 3) Open the DB in a SQLite tool and do something like this:
>>>>>>>
>>>>>>> SELECT * FROM tsk_files WHERE meta_addr == META_ADDR_FROM_ABOVE
>>>>>>>
>>>>>>> Is it in the DB?
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> brian
>>>>>>>
>>>>>>> On May 1, 2014, at 11:58 AM, Luís Filipe Nassif <lfc...@gm...> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> We have investigated a bit why the add image process is too slow in some cases. The add image process time seems to be quadratic in the number of files in the image.
>>>>>>>>
>>>>>>>> We detected that the function TskDbSqlite::findParObjId(), in db_sqlite.cpp, is not finding the parent_meta_addr -> parent_file_id mapping in the local cache for a lot of files, causing it to search for the mapping in the database (not sure if it is a non-indexed search?).
>>>>>>>>
>>>>>>>> For testing purposes, we added a "return 1;" line right after the cache look up, disabling the database look up, and this resulted in great speed ups:
>>>>>>>>
>>>>>>>> number of files / default load_db time / patched load_db time
>>>>>>>> ~80.000 / 20min / 2min
>>>>>>>> ~300.000 / 3h / 7min
>>>>>>>> ~700.000 / 48h / 27min
>>>>>>>>
>>>>>>>> We wonder if it is possible to store all par_meta_addr -> par_id mappings in the local cache (better) or to do an improved (indexed?) search for the mapping in the database. We think that someone with more knowledge of the load_db code could help a lot here.
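To make the cache mechanism described above concrete, here is a simplified, hypothetical sketch of a parent-object lookup keyed by (meta address, sequence number) with a SQLite fallback, in the spirit of the TskDbSqlite::findParObjId() discussed in this thread. The class name, map layout, and error handling are illustrative assumptions, not TSK's actual implementation; only the function's role, the tsk_files/meta_addr query, and the sequence-key detail come from the messages above.

    #include <cstdint>
    #include <map>
    #include <utility>
    #include <sqlite3.h>

    // Hypothetical illustration of the lookup discussed in this thread.
    // On NTFS the sequence number is incremented each time an MFT entry
    // is re-used, so keying the cache on the wrong sequence member
    // (meta_seq instead of par_seq) misses for deleted files -- the bug
    // fixed above.
    class ParentIdCache {
    public:
        explicit ParentIdCache(sqlite3 *db) : m_db(db) {}

        // Called as each folder is added to the database.
        void add(uint64_t metaAddr, uint32_t seq, int64_t objId) {
            m_map[std::make_pair(metaAddr, seq)] = objId;
        }

        // Returns the parent's object ID, or -1 if it is not found anywhere.
        int64_t find(uint64_t metaAddr, uint32_t seq) {
            auto it = m_map.find(std::make_pair(metaAddr, seq));
            if (it != m_map.end())
                return it->second;   // cache hit: no database round trip

            // Cache miss: fall back to the database. Without an index on
            // tsk_files(meta_addr) this scans the table, which is why
            // frequent misses made ingest time grow quadratically.
            sqlite3_stmt *stmt = nullptr;
            int64_t objId = -1;
            if (sqlite3_prepare_v2(m_db,
                    "SELECT obj_id FROM tsk_files WHERE meta_addr = ?;",
                    -1, &stmt, nullptr) == SQLITE_OK) {
                sqlite3_bind_int64(stmt, 1, (sqlite3_int64)metaAddr);
                if (sqlite3_step(stmt) == SQLITE_ROW)
                    objId = sqlite3_column_int64(stmt, 0);
                sqlite3_finalize(stmt);
            }
            return objId;
        }

    private:
        sqlite3 *m_db;
        std::map<std::pair<uint64_t, uint32_t>, int64_t> m_map;
    };

With the two fixes in develop, folders (including deleted ones) are added to the map as they are inserted, so the fallback query should essentially never run and neither extra indexes nor commit batching should be needed.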
From: Mike G. <do...@li...> - 2014-05-14 19:07:36
|
Hi there,

I am new to Sleuthkit and I have been researching how to use it through its C++ API. The documentation on http://fossies.org/dox/sleuthkit-4.1.3/ has been helpful, but I have one question: the documentation indicates that one always needs to be analyzing an image (like a .iso file) of the drive. Is there any way that I can just insert a USB stick and analyze the device itself? Let me make myself clearer: I find that I have to declare

TskImgInfo *img_info = new TskImgInfo();

and then open the file as follows:

img_info->open("/home/Desktop/Image.iso", TSK_IMG_TYPE_DETECT, 0);

followed by another declaration:

TskFsInfo *fs_info = new TskFsInfo();

followed by another open call:

fs_info->open(img_info, 0, TSK_FS_TYPE_DETECT);

So I want to know: is there a way I can just access the USB drive (for example) in the API using just the path (such as /dev/sdc), like I would on the command line? I mean, if I want to analyze a drive, do I have to make an ISO image of it and then access it with the above code every time?

I tried to ask this question before, but it seems like I wasn't so clear so nobody answered. ☹ Thanks to anyone who responds.

Mike Goldstein
|
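For the /dev/sdc question: TSK's raw image layer generally treats a block device like a raw image file, so the same calls can usually be pointed at the device node directly, with no intermediate .iso. A minimal sketch, assuming the 4.1.3 C++ wrappers used above and read permission on the device:

#include <tsk/libtsk.h>
#include <cstdio>

int main()
{
    // A block device node can be opened like an image file: the raw
    // image layer reads /dev/sdc directly (run with sufficient
    // privileges, e.g. via sudo).
    TskImgInfo img_info;
    if (img_info.open("/dev/sdc", TSK_IMG_TYPE_DETECT, 0) != 0) {
        tsk_error_print(stderr);   // print TSK's own error message
        return 1;
    }

    TskFsInfo fs_info;
    // Offset 0 assumes the file system starts at the beginning of the
    // device; for a partitioned disk, pass the partition's offset.
    if (fs_info.open(&img_info, 0, TSK_FS_TYPE_DETECT) != 0) {
        tsk_error_print(stderr);
        return 1;
    }

    printf("File system on /dev/sdc opened successfully\n");
    return 0;
}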
From: Guthrie Q. <qu...@po...> - 2014-05-14 18:43:30
|
Brian, Jason et al,

I have followed the thread about the unusually long processing times (which I experienced myself). Can you give some guidance on whether TSK/Autopsy is reliable for case work? I am concerned about potentially missing large numbers of files. I am a big advocate for FOSS in general (and Autopsy in particular) in my shop. I want to be able to defend its efficacy.

Sent from my iPhone

> On May 13, 2014, at 18:08, Brian Carrier <ca...@sl...> wrote:
>
> Hi Luis,
>
> Yea, a couple of my regression tests found some strange results with that fix. I pushed a new one up that is less strict on the filtering of the deleted files. There is still one issue I need to work out though.
>
> thanks,
> brian
|
From: Brian C. <ca...@sl...> - 2014-05-13 22:08:46
|
Hi Luis,

Yea, a couple of my regression tests found some strange results with that fix. I pushed a new one up that is less strict on the filtering of the deleted files. There is still one issue I need to work out though.

thanks,
brian

On May 13, 2014, at 5:37 PM, Luís Filipe Nassif <lfc...@gm...> wrote:

> Forgot to mention, now loaddb is adding a different number of entries compared to the previous version. Is that ok?
>
> Thank you
> Nassif
>
> Em 13/05/2014 18:19, "Luís Filipe Nassif" <lfc...@gm...> escreveu:
> Excellent, Brian! All misses were resolved. Tested on the 3 ntfs images above and 1 hfs with 3 million entries.
>
> Also, images taking hours or days to finish now take minutes with both patches!
>
> Thank you very much for addressing this issue.
> Nassif
>
> Nope, this one from earlier today:
>
> https://github.com/sleuthkit/sleuthkit/commit/f0672805c18a634ffffeff1f39f793501ddb7702
>
> On May 12, 2014, at 3:23 PM, Luís Filipe Nassif <lfc...@gm...> wrote:
>
> > Hi Brian,
> >
> > Do you mean this fix? https://github.com/sleuthkit/sleuthkit/commit/7b257b6c8252f9e9a7202990710e3a0ef31bf6b7
> >
> > It resolved a lot of the misses, as I described before (05/02). But I am still getting thousands of misses. So I took a look at 2 generated sqlites and discovered that the remaining misses were from deleted folders. I think that deleted folders are not being stored in the local cache...
> >
> > 2014-05-12 13:55 GMT-03:00 Brian Carrier <ca...@sl...>:
> > Hi Luis,
> >
> > The develop branch on github has a fix that removed the remaining misses on my test image, but I had far fewer than you did. Can you pull it and try that one?
> >
> > thanks,
> > brian
> >
> > On May 8, 2014, at 9:57 AM, Luís Filipe Nassif <lfc...@gm...> wrote:
> >
> > > Hi Brian and Simson,
> > >
> > > I have done a number of tests since yesterday. I did not restart the computer between the tests, because I think it is better to use the OS IO cache to focus on CPU processing time, without IO interference, except for the first run. I used another computer, faster than before. Results below (four runs each):
> > >
> > > ntfs image w/ 127.408 files:
> > > no patch (only Brian's fix): 3min 27s / 3min 3s / 3min 3s / 3min 2s
> > > disabling database parent_id look up: 35s / 11s / 11s / 12s
> > > index on meta_addr,fs_obj_id and commit for each 5000 files: 4min 11s / 3min 48s / 3min 48s / 3min 47s
> > >
> > > ntfs image w/ 216.693 files:
> > > no patch (only Brian's fix): 5min 53s / 4min 59s / 4min 58s / 5min 2s
> > > disabling database parent_id look up: 2min 8s / 1min 21s / 1min 21s / 1min 21s
> > > index on meta_addr,fs_obj_id and commit for each 5000 files: 6min 38s / 5min 46s / 5min 43s / 5min 43s
> > >
> > > ntfs image w/ 433.321 files:
> > > no patch (only Brian's fix): 21min 38s / 21min 40s / 21min 10s / 21min 10s
> > > disabling database parent_id look up: 3min 59s / 2min 47s / 2min 47s / 2min 47s
> > > index on meta_addr,fs_obj_id and commit for each 5000 files: (not run based on previous results)
> > >
> > > So, Brian was right, the commits increased processing time. And as you can see, it would be great if it were possible to eliminate the misses and remove the database parent_id look up.
> > >
> > > With that in mind, I took a look at one of the sqlites, and I think I discovered the cause of a lot of the misses (maybe all of them). The misses happened with deleted files inside deleted folders. I think deleted folders are not being stored in the local cache.
> > >
> > > Regards,
> > > Nassif
> > >
> > > 2014-05-07 10:53 GMT-03:00 Brian Carrier <ca...@sl...>:
> > > Hi Nassif,
> > >
> > > As Simson mentioned, the current setup was intended to be the fastest. Doing frequent commits takes longer and more indexes make commits take longer. This is the only process that we know about that does this type of lookup and would use those indexes.
> > >
> > > The bigger question for me is why we are getting these cache misses and I need to spend some more time with some more images to find out. The lookup is to find the ID of the parent and we process from the root directory down. So, in theory, we have already processed the parent folder before the children and it should be in the cache. We need to figure out why the parent isn't in the cache...
> > >
> > > brian
> > >
> > > On May 7, 2014, at 8:24 AM, Luís Filipe Nassif <lfc...@gm...> wrote:
> > >
> > > > I have done one last test, because it was very strange to me that indexing meta_addr and fs_obj_id had not improved the parent_id lookup. We suspected that the indexes were not being used by sqlite, maybe because the data is not committed before the add image process finishes (I am not a sqlite expert, is that possible?). So we inserted a commit for each 5.000 files added to the database. The add image process time decreased from 1 hour to 30 min, so we think that the indexes were not being used.
> > > >
> > > > Why does the add image process not commit the data while it is being added to the database?
> > > >
> > > > Nassif
> > > >
> > > > 2014-05-02 13:37 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
> > > > Fixing my last email, the test was run with the indexes AND Brian's fix. Then I removed the index patch and loadDb took the same 1 hour to finish with only Brian's fix. So the index patch did not help improve the database look up for parent_id.
> > > >
> > > > Sorry for the mistake,
> > > > Nassif
> > > >
> > > > 2014-05-02 10:54 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
> > > >
> > > > I tested loadDb with a patch that creates an index on meta_addr and fs_obj_id. The image with 433.321 files, previously taking 2h45min to load, now takes 1h to finish loadDb with the indexes. That is a good speed up, but completely disabling the database parent_id look up, it takes only 7min to finish. Is there anything else we can do to improve the parent_id database look up?
> > > >
> > > > Regards,
> > > > Nassif
> > > >
> > > > 2014-05-02 9:35 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
> > > >
> > > > Ok, tested on 2 images. The fix resolved a lot of misses:
> > > >
> > > > ntfs image w/ 127.408 files: from 19.558 to 6.511 misses
> > > > ntfs image w/ 433.321 files: from 182.256 to 19.908 misses
> > > >
> > > > I also think creating an index on tsk_files(meta_addr) and tsk_files(fs_obj_id) could help improve the database look up for those deleted files not found in the local cache, what do you think? The database look up seems too slow, as described in my first email.
> > > >
> > > > Thank you for taking a look so quickly.
> > > > Nassif
> > > >
> > > > 2014-05-01 23:47 GMT-03:00 Brian Carrier <ca...@sl...>:
> > > >
> > > > Well that was an easy and embarrassing fix:
> > > >
> > > > if (TSK_FS_TYPE_ISNTFS(fs_file->fs_info->ftype)) {
> > > > -    seq = fs_file->name->meta_seq;
> > > > +    seq = fs_file->name->par_seq;
> > > > }
> > > >
> > > > Turns out we've been having a lot of cache misses because of this stupid bug. Can you replace that line and see if it helps? It certainly did on my test image.
> > > >
> > > > thanks,
> > > > brian
> > > >
> > > > On May 1, 2014, at 10:24 PM, Brian Carrier <ca...@sl...> wrote:
> > > >
> > > > > Thanks for the tests. I wonder if it has to do with an incorrect sequence number. NTFS increments the sequence number each time a file is re-allocated. Deleted orphan files could be getting misses. I'll add some logging on my system and see what kind of misses I get.
> > > > >
> > > > > brian
> > > > >
> > > > > On May 1, 2014, at 8:39 PM, Luís Filipe Nassif <lfc...@gm...> wrote:
> > > > >
> > > > >> Ok, tests 1 and 3 done. I do not have the sleuthkit code inside an IDE, so I did not use breakpoints. Instead, I changed TskDbSqlite::findParObjId() to return the parent_meta_addr when it is not found and return 1 when it is found in the cache map.
> > > > >>
> > > > >> Performing queries on the generated sqlite, there were 19.558 cache misses from an image with 3 ntfs partitions and 127.408 files. I confirmed that many parent_meta_addr missed from the cache (now stored in tsk_objects.par_obj_id) are in tsk_files.meta_addr. The complete paths corresponding to these meta_addr are parents of those files whose processing did not find them in the cache.
> > > > >>
> > > > >> Other tests resulted in:
> > > > >> 182.256 cache misses from 433.321 files (ntfs)
> > > > >> 892.359 misses from 1.811.393 files (ntfs)
> > > > >> 169.819 misses from 3.177.917 files (hfs)
> > > > >>
> > > > >> Luis Nassif
> > > > >>
> > > > >> 2014-05-01 16:14 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
> > > > >> Forgot to mention: we are using sleuthkit 4.1.3
> > > > >>
> > > > >> Em 01/05/2014 16:09, "Luís Filipe Nassif" <lfc...@gm...> escreveu:
> > > > >>
> > > > >> Hi Brian,
> > > > >>
> > > > >> The 3 cases above were ntfs. I also tested with hfs and canceled loaddb after 1 day. The modified version finished after 8 hours and added about 3 million entries. We will try to do the tests you have suggested.
> > > > >>
> > > > >> Em 01/05/2014 15:48, "Brian Carrier" <ca...@sl...> escreveu:
> > > > >> Hi Luis,
> > > > >>
> > > > >> What kind of file system was it? I fixed a bug a little while ago in that code for HFS file systems that resulted in a lot of cache misses.
> > > > >>
> > > > >> In theory, everything should be cached. It sounds like a bug if you are getting so many misses. The basic idea of this code is that everything in the DB gets assigned a unique object ID and we make associations between files and their parent folder's unique ID.
> > > > >>
> > > > >> Since you seem to be comfortable with a debugger in the code, can you set a breakpoint for when the miss happens and:
> > > > >> 1) Determine the path of the file that was being added to the DB and the parent address that was trying to be found.
> > > > >> 2) Use the 'ffind' TSK tool to then map that parent address to a path. Is it a subset of the path from #1?
> > > > >> 3) Open the DB in a SQLite tool and do something like this:
> > > > >>
> > > > >> SELECT * from tsk_files where meta_addr == META_ADDR_FROM_ABOVE
> > > > >>
> > > > >> Is it in the DB?
> > > > >>
> > > > >> Thanks!
> > > > >>
> > > > >> brian
> > > > >>
> > > > >> On May 1, 2014, at 11:58 AM, Luís Filipe Nassif <lfc...@gm...> wrote:
> > > > >>
> > > > >>> Hi,
> > > > >>>
> > > > >>> We have investigated a bit why the add image process is so slow in some cases. The add image process time seems to be quadratic in the number of files in the image.
> > > > >>>
> > > > >>> We detected that the function TskDbSqlite::findParObjId(), in db_sqlite.cpp, is not finding the parent_meta_addr -> parent_file_id mapping in the local cache for a lot of files, causing it to search for the mapping in the database (not sure if it is a non-indexed search?).
> > > > >>>
> > > > >>> For testing purposes, we added a "return 1;" line right after the cache lookup, disabling the database lookup, and this resulted in great speedups:
> > > > >>>
> > > > >>> number of files / default load_db time / patched load_db time
> > > > >>> ~80.000 / 20min / 2min
> > > > >>> ~300.000 / 3h / 7min
> > > > >>> ~700.000 / 48h / 27min
> > > > >>>
> > > > >>> We wonder if it is possible to store all par_meta_addr -> par_id mappings in the local cache (better) or to do an improved (indexed?) search for the mapping in the database. We think that someone with more knowledge of the load_db code could help a lot here.
|
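The index experiment described in the quoted messages can be sketched with the SQLite C API. This is illustrative only (the index names are invented here), and as the thread concluded, eliminating the cache misses mattered far more than indexing the fallback lookup:

#include <sqlite3.h>
#include <cstdio>

// Sketch: create the two indexes Luís experimented with so the
// fallback parent lookup is no longer a full-table scan. Note that
// indexes also slow down the bulk inserts, which is the trade-off
// Brian describes above.
static int addParentLookupIndexes(sqlite3 *db)
{
    const char *sql =
        "CREATE INDEX IF NOT EXISTS idx_tsk_files_meta_addr ON tsk_files(meta_addr);"
        "CREATE INDEX IF NOT EXISTS idx_tsk_files_fs_obj_id ON tsk_files(fs_obj_id);";
    char *errmsg = NULL;
    if (sqlite3_exec(db, sql, NULL, NULL, &errmsg) != SQLITE_OK) {
        fprintf(stderr, "CREATE INDEX failed: %s\n", errmsg);
        sqlite3_free(errmsg);
        return 1;
    }
    return 0;
}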
From: Luís F. N. <lfc...@gm...> - 2014-05-13 21:37:13
|
Forgot to mention, now loaddb is adding a different number of entries compared to the previous version. Is that ok?

Thank you
Nassif

Em 13/05/2014 18:19, "Luís Filipe Nassif" <lfc...@gm...> escreveu:

> Excellent, Brian! All misses were resolved. Tested on the 3 ntfs images above and 1 hfs with 3 million entries.
>
> Also, images taking hours or days to finish now take minutes with both patches!
>
> Thank you very much for addressing this issue.
> Nassif
>
> Nope, this one from earlier today:
>
> https://github.com/sleuthkit/sleuthkit/commit/f0672805c18a634ffffeff1f39f793501ddb7702
|
From: Luís F. N. <lfc...@gm...> - 2014-05-13 21:19:09
|
Excelent, Brian! All misses were resolved. Tested in the 3 ntfs images above and 1 hfs with 3 million entries. Also, images taking hours or days to finish now takes minutes with both patches! Thank you very much for addressing this issue. Nassif Nope, this one from earlier today: https://github.com/sleuthkit/sleuthkit/commit/f0672805c18a634ffffeff1f39f793501ddb7702 On May 12, 2014, at 3:23 PM, Luís Filipe Nassif <lfc...@gm...> wrote: > Hi Brian, > > Do you mean this fix? https://github.com/sleuthkit/sleuthkit/commit/7b257b6c8252f9e9a7202990710e3a0ef31bf6b7 > > It resolved a lot of the misses as i described before (05/02). But i am still getting thousands of misses. So I took a look at 2 generated sqlites and discovered that the remaining misses were from deleted folders. I think that deleted folders are not being stored into local cache... > > > > > 2014-05-12 13:55 GMT-03:00 Brian Carrier <ca...@sl...>: > Hi Luis, > > The develop branch on github has a fix that removed the remaining misses on my test image, but I had far fewer than you did. Can you pull it and try that one? > > thanks, > brian > > > > > On May 8, 2014, at 9:57 AM, Luís Filipe Nassif <lfc...@gm...> wrote: > > > Hi Brian and Simson, > > > > I have done a number of tests since yesterday. I have not restarted the computer between the tests, because i think it would be better to use the OS IO cache to focus on CPU processing time, without io interference, except for the first run. I used another computer, faster than before. Results below: > > > > ntfs image w/ 127.408 files 1 2 3 4 > > no patch (only Brian's fix) 3min 27s 3min 3s 3min 3s 3min 2s > > disabling database parent_id look up 35s 11s 11s 12s > > index on meta_addr,fs_obj_id and commit for each 5000 files 4min 11s 3min 48s 3min 48s 3min 47s > > > > > > ntfs image w/ 216.693 files 1 2 3 4 > > no patch (only Brian's fix) 5min 53s 4min 59s 4min 58s 5min 2s > > disabling database parent_id look up 2min 8s 1min 21s 1min 21s 1min 21s > > index on meta_addr,fs_obj_id and commit for each 5000 files 6min 38s 5min 46s 5min 43s 5min 43s > > > > > > ntfs image w/ 433.321 files 1 2 3 4 > > no patch (only Brian's fix) 21min 38s 21min 40s 21min 10s 21min 10s > > disabling database parent_id look up 3min 59s 2min 47s 2min 47s 2min 47s > > index on meta_addr,fs_obj_id and commit for each 5000 files (not run based on previous results) > > > > So, Brian was right, the commits increased processing time. And as you can see, it would be great if it is possible to eliminate the misses and remove the database parent_id look up. > > > > With that in mind, I took a look at one of the sqlites, and I think I discovered the cause of a lot of the misses (maybe all of them). The misses happened with deleted files inside deleted folders. I think deleted folders can not being stored into local cache. > > > > Regards, > > Nassif > > > > > > 2014-05-07 10:53 GMT-03:00 Brian Carrier <ca...@sl...>: > > Hi Nassif, > > > > As Simson mentioned, the current setup was intended to be the fastest. Doing frequent commits takes longer and more indexes makes commits take longer. This is the only process that we know about that does this type of lookup and would use those indexes. > > > > The bigger question for me is why we are getting these cache misses and I need to spend some more time with some more images to find out. The lookup is to find the ID of the parent and we process from the root directory down. 
So, in theory, we have already processed the parent folder before the children and it should be in the cache. We need to figure out why the parent isn't in the cache... > > > > brian > > > > > > > > On May 7, 2014, at 8:24 AM, Luís Filipe Nassif <lfc...@gm...> wrote: > > > > > I have done one last test, because it was very strange to me that indexing meta_addr and fs_obj_id had not improved the parent_id lookup. We suspected that the indexes were not being used by sqlite, maybe because the whole data is not commited before add image process finishes (i am not a sqlite expert, is it possible?). So we inserted a commit for each 5.000 files added to database. The add image process time decreased from 1hour to 30min, so we think that the indexes were not being used. > > > > > > Why add image process do not commit the data while it is being added to database? > > > > > > Nassif > > > > > > > > > 2014-05-02 13:37 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>: > > > Fixing my last email, the test was run with the indexes AND Brian's fix. Then I removed the index patch and loadDb took the same 1 hour to finish with only Brian's fix. So the index patch did not help improving database look up for parent_id. > > > > > > Sorry for mistake, > > > Nassif > > > > > > > > > 2014-05-02 10:54 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>: > > > > > > I tested loadDb with a create index on meta_addr and fs_obj_id patch. The image with 433.321 files, previously taking 2h45min to load, now takes 1h to finish loadDb with the indexes. That is a good speed up, but completely disabling the database parent_id look up, it only takes 7min to finish. Is there another thing we can do to improve the parent_id database look up? > > > > > > Regards, > > > Nassif > > > > > > > > > 2014-05-02 9:35 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>: > > > > > > Ok, tested in 2 images. Fix resolved a lot of misses: > > > > > > ntfs image w/ 127.408 files: from 19.558 to 6.511 misses > > > ntfs image w/ 433.321 files: from 182.256 to 19.908 misses > > > > > > I also think creating an index on tsk_files(meta_addr) and tsk_files(fs_obj_id) could help improving the database look up for those deleted files not found in local cache, what do you think? The database look up seems too slow, as described in my first email. > > > > > > Thank you for taking a look so quickly. > > > Nassif > > > > > > > > > 2014-05-01 23:47 GMT-03:00 Brian Carrier <ca...@sl...>: > > > > > > Well that was an easy and embarrassing fix: > > > > > > if (TSK_FS_TYPE_ISNTFS(fs_file->fs_info->ftype)) { > > > - seq = fs_file->name->meta_seq; > > > + seq = fs_file->name->par_seq; > > > } > > > > > > Turns out we've been having a lot of cache misses because of this stupid bug. Can you replace that line and see if it helps. It certainly did on my test image. > > > > > > thanks, > > > brian > > > > > > > > > On May 1, 2014, at 10:24 PM, Brian Carrier <ca...@sl...> wrote: > > > > > > > Thanks for the tests. I wonder if it has to do with an incorrect sequence number. NTFS increments the sequence number each time a file is re-allocated. Deleted orphan files could be getting misses. I'll add some logging on my system and see what kind of misses I get. > > > > > > > > brian > > > > > > > > On May 1, 2014, at 8:39 PM, Luís Filipe Nassif <lfc...@gm...> wrote: > > > > > > > >> Ok, tests 1 and 3 done. I do not have sleuthkit code inside an ide, so did not use breakpoints. 
Instead, I changed TskDbSqlite::findParObjId() to return the parent_meta_addr when it is not found and return 1 when it is found in the cache map. > > > >> > > > >> Performing queries on the generated sqlite, there were 19.558 cache misses from an image with 3 ntfs partitions and 127.408 files. I confirmed that many parent_meta_addr missed from cache (now stored in tsk_objects.par_obj_id) are into tsk_files.meta_addr. The complete paths corresponding to these meta_addr are parents of those files whose processing have not found them in cache. > > > >> > > > >> Other tests resulted in: > > > >> 182.256 cache misses from 433.321 files (ntfs) > > > >> 892.359 misses from 1.811.393 files (ntfs) > > > >> 169.819 misses from 3.177.917 files (hfs) > > > >> > > > >> Luis Nassif > > > >> > > > >> > > > >> > > > >> 2014-05-01 16:14 GMT-03:00 Luís Filipe Nassif <lfc...@gm... >: > > > >> Forgot to mention: we are using sleuthkit 4.1.3 > > > >> > > > >> Em 01/05/2014 16:09, "Luís Filipe Nassif" <lfc...@gm...> escreveu: > > > >> > > > >> Hi Brian, > > > >> > > > >> The 3 cases above were ntfs. I also tested with hfs and canceled loaddb after 1 day. The modified version finished after 8hours and added about 3 million entries. We will try to do the tests you have suggested. > > > >> > > > >> Em 01/05/2014 15:48, "Brian Carrier" <ca...@sl...> escreveu: > > > >> Hi Luis, > > > >> > > > >> What kind of file system was it? I fixed a bug a little while ago in that code for HFS file systems that resulted in a lot of cache misses. > > > >> > > > >> In theory, everything should be cached. It sounds like a bug if you are getting so many misses. The basic idea of this code is that everything in the DB gets assigned a unique object ID and we make associations between files and their parent folder's unique ID. > > > >> > > > >> Since you seem to be comfortable with a debugger in the code, can you set a breakpoint for when the miss happens and: > > > >> 1) Determine the path of the file that was being added to the DB and the parent address that was trying to be found. > > > >> 2) Use the 'ffind' TSK tool to then map that parent address to a path. Is it a subset of the path from #1? > > > >> 3) Open the DB in a SQLite tool and do something like this: > > > >> > > > >> SELECT * from tsk_files where meta_addr == META_ADDR_FROM_ABOVE > > > >> > > > >> Is it in the DB? > > > >> > > > >> Thanks! > > > >> > > > >> brian > > > >> > > > >> > > > >> On May 1, 2014, at 11:58 AM, Luís Filipe Nassif < lfc...@gm...> wrote: > > > >> > > > >>> Hi, > > > >>> > > > >>> We have investigated a bit why the add image process is too slow in some cases. The add image process time seems to be quadratic with the number of files in the image. > > > >>> > > > >>> We detected that the function TskDbSqlite::findParObjId(), in db_sqlite.cpp, is not finding the parent_meta_addr -> parent_file_id mapping in the local cache for a lot of files, causing it to search for the mapping in the database (not sure if it is an non-indexed search?) > > > >>> > > > >>> For testing purposes, we added a "return 1;" line right after the cache look up, disabling the database look up, and this resulted in great speed ups: > > > >>> > > > >>> number of files / default load_db time / patched load_db time > > > >>> ~80.000 / 20min / 2min > > > >>> ~300.000 / 3h / 7min > > > >>> ~700.000 / 48h / 27min > > > >>> > > > >>> We wonder if it is possible to store all par_meta_addr -> par_id mappings into local cache (better) or doing an improved (indexed?) 
search for the mapping in the database. We think that someone with more knowledge of load_db code could help a lot here. |
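The cache being discussed here can be pictured as a nested map from (file system, parent meta address, parent sequence) to the parent's object ID in the database. The sketch below is only an illustration, not the actual TskDbSqlite code; every name in it is made up. The one detail taken directly from the thread is that the NTFS lookup has to key on the parent's sequence number (fs_file->name->par_seq) rather than the file's own meta_seq:

// Illustrative sketch of a findParObjId()-style cache (hypothetical names).
#include <cstdint>
#include <map>

typedef std::map<uint64_t, int64_t> Seq2ObjId;   // parent seq -> object ID
typedef std::map<uint64_t, Seq2ObjId> Meta2Seq;  // parent meta_addr -> seq map
typedef std::map<int64_t, Meta2Seq> Fs2Meta;     // fs_obj_id -> meta_addr map

static Fs2Meta parentIdCache;

// Must be called for every object added to the DB -- including deleted
// folders, which the thread identifies as the entries that were being
// skipped -- or their children will fall back to a slow SQL query.
void storeObjId(int64_t fsObjId, uint64_t metaAddr, uint64_t seq, int64_t objId)
{
    parentIdCache[fsObjId][metaAddr][seq] = objId;
}

// Returns the parent's object ID, or -1 on a cache miss (the real code
// then queries tsk_files, which is the slow path measured in this thread).
int64_t lookupParObjId(int64_t fsObjId, uint64_t parMetaAddr, uint64_t parSeq)
{
    Fs2Meta::const_iterator f = parentIdCache.find(fsObjId);
    if (f == parentIdCache.end()) return -1;
    Meta2Seq::const_iterator m = f->second.find(parMetaAddr);
    if (m == f->second.end()) return -1;
    Seq2ObjId::const_iterator s = m->second.find(parSeq);  // par_seq, not meta_seq
    if (s == m->second.end()) return -1;
    return s->second;
}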
From: Mike G. <do...@li...> - 2014-05-13 21:15:50
|
Hi there, I am very new to TSK and I have been trying to teach myself how to program in C++ using the TSK library. I have found https://digital-forensics.sans.org/community/papers/gcfa/forensic-investigation-usb-flashdrive-image-cc-terminals_188 to be helpful. However, I am looking for more examples of this code. Can anyone direct me to something similar? Additionally, I have been trying my own hand at it. I created the following small program:

#include <iostream>
#include <string>
#include <stdio.h>
#include <stdlib.h>
#include <tsk/libtsk.h>

using namespace std;

int main(int argc, char **argv)
{
    TskImgInfo *img_info = new TskImgInfo();
    TSK_TCHAR **temp = (TSK_TCHAR **) argv;

    printf("Opening directory %s \n", temp[1]);
    if (img_info->open(argv[1], TSK_IMG_TYPE_DETECT, 0) == 0) {
        printf("Directory opened successfully\n");
    } else {
        printf("Error opening directory %s \n", temp[1]);
        exit(1);
    }
    return 0;
}

But I'm not sure what I'm even doing: What is TskImgInfo? Is that a disk image? Also, why does it only work on specific files and not on directories? And if I want to work with /dev/sdc for example, what must I replace TskImgInfo with? Thanks in advance, Mike Goldstein |
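To put the questions above in code terms: TskImgInfo is the C++ wrapper around TSK_IMG_INFO, which represents a disk image or raw device (a dd or E01 file, or a device node such as /dev/sdc), not a directory, which is why open() fails on folders. Below is a minimal, untested sketch of the same steps using the underlying TSK 4.x C API; the offset-0 file system open is an assumption that holds for an unpartitioned volume (a partitioned disk needs the offset reported by mmls):

#include <tsk/libtsk.h>
#include <cstdio>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <image-or-device>\n", argv[0]);
        return 1;
    }
    // Auto-detects the image format (raw, E01, ...). A raw device such as
    // /dev/sdc is handled the same way as a raw image file.
    TSK_IMG_INFO *img = tsk_img_open_utf8_sing(argv[1], TSK_IMG_TYPE_DETECT, 0);
    if (img == NULL) {
        tsk_error_print(stderr);
        return 1;
    }
    // Open the file system inside the image (offset 0: unpartitioned volume).
    TSK_FS_INFO *fs = tsk_fs_open_img(img, 0, TSK_FS_TYPE_DETECT);
    if (fs == NULL) {
        tsk_error_print(stderr);
        tsk_img_close(img);
        return 1;
    }
    printf("fs type: %s, block size: %u\n",
           tsk_fs_type_toname(fs->ftype), fs->block_size);
    tsk_fs_close(fs);
    tsk_img_close(img);
    return 0;
}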
From: Brian C. <ca...@sl...> - 2014-05-13 14:06:22
|
Basis Technology is hosting another Autopsy module writing competition around OSDFCon. Submissions can be entirely new analysis techniques or wrappers around existing tools. A primary goal of Autopsy 3 was to be a digital forensics platform so that users could use a single tool to perform their investigations and not waste time moving data around between various stand-alone tools. We've built the infrastructure and now we need developers to write modules so that we can all achieve the original goal from OSDFCon 2010. Last year, the winners developed registry parsing and fuzzy hashing modules. This year, we're looking for even better submissions. OSDFCon attendees will vote on who gets the cash prizes. Submissions are due Oct 20, 2014. Rules are available on the website: http://www.basistech.com/osdfcon-contest/ If you are looking for ideas, the above site has a link to a set of feature requests that have been submitted. Note that we also sponsored a student-based competition this year too. This competition is different. It has bigger prizes and is timed based on OSDFCon and not semesters (http://www.basistech.com/digital-forensics/autopsy/autopsy-for-educators/student-development-contest/). thanks, brian |
From: Brian C. <ca...@sl...> - 2014-05-12 16:55:18
|
Hi Luis, The develop branch on github has a fix that removed the remaining misses on my test image, but I had far fewer than you did. Can you pull it and try that one? thanks, brian On May 8, 2014, at 9:57 AM, Luís Filipe Nassif <lfc...@gm...> wrote: > Hi Brian and Simson, > > I have done a number of tests since yesterday. I have not restarted the computer between the tests, because i think it would be better to use the OS IO cache to focus on CPU processing time, without io interference, except for the first run. I used another computer, faster than before. Results below: > > ntfs image w/ 127.408 files 1 2 3 4 > no patch (only Brian's fix) 3min 27s 3min 3s 3min 3s 3min 2s > disabling database parent_id look up 35s 11s 11s 12s > index on meta_addr,fs_obj_id and commit for each 5000 files 4min 11s 3min 48s 3min 48s 3min 47s > > > ntfs image w/ 216.693 files 1 2 3 4 > no patch (only Brian's fix) 5min 53s 4min 59s 4min 58s 5min 2s > disabling database parent_id look up 2min 8s 1min 21s 1min 21s 1min 21s > index on meta_addr,fs_obj_id and commit for each 5000 files 6min 38s 5min 46s 5min 43s 5min 43s > > > ntfs image w/ 433.321 files 1 2 3 4 > no patch (only Brian's fix) 21min 38s 21min 40s 21min 10s 21min 10s > disabling database parent_id look up 3min 59s 2min 47s 2min 47s 2min 47s > index on meta_addr,fs_obj_id and commit for each 5000 files (not run based on previous results) > > So, Brian was right, the commits increased processing time. And as you can see, it would be great if it is possible to eliminate the misses and remove the database parent_id look up. > > With that in mind, I took a look at one of the sqlites, and I think I discovered the cause of a lot of the misses (maybe all of them). The misses happened with deleted files inside deleted folders. I think deleted folders can not being stored into local cache. > > Regards, > Nassif > > > 2014-05-07 10:53 GMT-03:00 Brian Carrier <ca...@sl...>: > Hi Nassif, > > As Simson mentioned, the current setup was intended to be the fastest. Doing frequent commits takes longer and more indexes makes commits take longer. This is the only process that we know about that does this type of lookup and would use those indexes. > > The bigger question for me is why we are getting these cache misses and I need to spend some more time with some more images to find out. The lookup is to find the ID of the parent and we process from the root directory down. So, in theory, we have already processed the parent folder before the children and it should be in the cache. We need to figure out why the parent isn't in the cache... > > brian > > > > On May 7, 2014, at 8:24 AM, Luís Filipe Nassif <lfc...@gm...> wrote: > > > I have done one last test, because it was very strange to me that indexing meta_addr and fs_obj_id had not improved the parent_id lookup. We suspected that the indexes were not being used by sqlite, maybe because the whole data is not commited before add image process finishes (i am not a sqlite expert, is it possible?). So we inserted a commit for each 5.000 files added to database. The add image process time decreased from 1hour to 30min, so we think that the indexes were not being used. > > > > Why add image process do not commit the data while it is being added to database? > > > > Nassif > > > > > > 2014-05-02 13:37 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>: > > Fixing my last email, the test was run with the indexes AND Brian's fix. Then I removed the index patch and loadDb took the same 1 hour to finish with only Brian's fix. 
So the index patch did not help improving database look up for parent_id. > > > > Sorry for mistake, > > Nassif > > > > > > 2014-05-02 10:54 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>: > > > > I tested loadDb with a create index on meta_addr and fs_obj_id patch. The image with 433.321 files, previously taking 2h45min to load, now takes 1h to finish loadDb with the indexes. That is a good speed up, but completely disabling the database parent_id look up, it only takes 7min to finish. Is there another thing we can do to improve the parent_id database look up? > > > > Regards, > > Nassif > > > > > > 2014-05-02 9:35 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>: > > > > Ok, tested in 2 images. Fix resolved a lot of misses: > > > > ntfs image w/ 127.408 files: from 19.558 to 6.511 misses > > ntfs image w/ 433.321 files: from 182.256 to 19.908 misses > > > > I also think creating an index on tsk_files(meta_addr) and tsk_files(fs_obj_id) could help improving the database look up for those deleted files not found in local cache, what do you think? The database look up seems too slow, as described in my first email. > > > > Thank you for taking a look so quickly. > > Nassif > > > > > > 2014-05-01 23:47 GMT-03:00 Brian Carrier <ca...@sl...>: > > > > Well that was an easy and embarrassing fix: > > > > if (TSK_FS_TYPE_ISNTFS(fs_file->fs_info->ftype)) { > > - seq = fs_file->name->meta_seq; > > + seq = fs_file->name->par_seq; > > } > > > > Turns out we've been having a lot of cache misses because of this stupid bug. Can you replace that line and see if it helps. It certainly did on my test image. > > > > thanks, > > brian > > > > > > On May 1, 2014, at 10:24 PM, Brian Carrier <ca...@sl...> wrote: > > > > > Thanks for the tests. I wonder if it has to do with an incorrect sequence number. NTFS increments the sequence number each time a file is re-allocated. Deleted orphan files could be getting misses. I'll add some logging on my system and see what kind of misses I get. > > > > > > brian > > > > > > On May 1, 2014, at 8:39 PM, Luís Filipe Nassif <lfc...@gm...> wrote: > > > > > >> Ok, tests 1 and 3 done. I do not have sleuthkit code inside an ide, so did not use breakpoints. Instead, I changed TskDbSqlite::findParObjId() to return the parent_meta_addr when it is not found and return 1 when it is found in the cache map. > > >> > > >> Performing queries on the generated sqlite, there were 19.558 cache misses from an image with 3 ntfs partitions and 127.408 files. I confirmed that many parent_meta_addr missed from cache (now stored in tsk_objects.par_obj_id) are into tsk_files.meta_addr. The complete paths corresponding to these meta_addr are parents of those files whose processing have not found them in cache. > > >> > > >> Other tests resulted in: > > >> 182.256 cache misses from 433.321 files (ntfs) > > >> 892.359 misses from 1.811.393 files (ntfs) > > >> 169.819 misses from 3.177.917 files (hfs) > > >> > > >> Luis Nassif > > >> > > >> > > >> > > >> 2014-05-01 16:14 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>: > > >> Forgot to mention: we are using sleuthkit 4.1.3 > > >> > > >> Em 01/05/2014 16:09, "Luís Filipe Nassif" <lfc...@gm...> escreveu: > > >> > > >> Hi Brian, > > >> > > >> The 3 cases above were ntfs. I also tested with hfs and canceled loaddb after 1 day. The modified version finished after 8hours and added about 3 million entries. We will try to do the tests you have suggested. 
> > >>
> > >> On 01/05/2014 15:48, "Brian Carrier" <ca...@sl...> wrote:
> > >> Hi Luis,
> > >>
> > >> What kind of file system was it? I fixed a bug a little while ago in that code for HFS file systems that resulted in a lot of cache misses.
> > >>
> > >> In theory, everything should be cached. It sounds like a bug if you are getting so many misses. The basic idea of this code is that everything in the DB gets assigned a unique object ID and we make associations between files and their parent folder's unique ID.
> > >>
> > >> Since you seem to be comfortable with a debugger in the code, can you set a breakpoint for when the miss happens and:
> > >> 1) Determine the path of the file that was being added to the DB and the parent address that was trying to be found.
> > >> 2) Use the 'ffind' TSK tool to then map that parent address to a path. Is it a subset of the path from #1?
> > >> 3) Open the DB in a SQLite tool and do something like this:
> > >>
> > >> SELECT * from tsk_files where meta_addr == META_ADDR_FROM_ABOVE
> > >>
> > >> Is it in the DB?
> > >>
> > >> Thanks!
> > >>
> > >> brian
> > >>
> > >> On May 1, 2014, at 11:58 AM, Luís Filipe Nassif <lfc...@gm...> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> We have investigated a bit why the add image process is too slow in some cases. The add image process time seems to be quadratic with the number of files in the image.
> > >>>
> > >>> We detected that the function TskDbSqlite::findParObjId(), in db_sqlite.cpp, is not finding the parent_meta_addr -> parent_file_id mapping in the local cache for a lot of files, causing it to search for the mapping in the database (not sure if it is a non-indexed search?)
> > >>>
> > >>> For testing purposes, we added a "return 1;" line right after the cache look up, disabling the database look up, and this resulted in great speed ups:
> > >>>
> > >>> number of files / default load_db time / patched load_db time
> > >>> ~80.000 / 20min / 2min
> > >>> ~300.000 / 3h / 7min
> > >>> ~700.000 / 48h / 27min
> > >>>
> > >>> We wonder if it is possible to store all par_meta_addr -> par_id mappings into local cache (better) or doing an improved (indexed?) search for the mapping in the database. We think that someone with more knowledge of load_db code could help a lot here.
|
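Luís's indexing suggestion from the quoted thread, sketched as plain sqlite3 calls. The index name is invented and the exact fallback query inside TskDbSqlite::findParObjId() may filter on different columns, so this illustrates the approach rather than the patch that was actually benchmarked:

#include <sqlite3.h>
#include <cstdio>

// Hypothetical helper: cover the columns the parent-ID fallback lookup
// filters on (fs_obj_id and meta_addr, per the thread) with one index.
int createParentLookupIndex(sqlite3 *db)
{
    char *errMsg = NULL;
    int rc = sqlite3_exec(db,
        "CREATE INDEX IF NOT EXISTS idx_files_par_lookup"
        " ON tsk_files(fs_obj_id, meta_addr);",
        NULL, NULL, &errMsg);
    if (rc != SQLITE_OK) {
        fprintf(stderr, "CREATE INDEX failed: %s\n", errMsg);
        sqlite3_free(errMsg);
    }
    return rc;
}

As the measurements later in the thread show, the index alone did not close the gap; the decisive fix was eliminating the cache misses so the database fallback is rarely taken at all.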
From: Jason L. <jle...@ba...> - 2014-05-12 13:17:20
|
Hi Enkidu - You can develop a module for Autopsy, which will save you a lot of effort from starting an entirely new tool. This page should get you started: http://wiki.sleuthkit.org/index.php?title=Autopsy_Developer%27s_Guide Also, we're running a student development contest for Autopsy modules to try and get some more community involvement and encourage more module additions to the open source community. You can find more information about that here: http://www.basistech.com/digital-forensics/autopsy/autopsy-for-educators/student-development-contest/ Jason ------------------------------------------------ Jason Letourneau Product Manager, Digital Forensics Basis Technology jle...@ba... 617-386-2000 ext. 152 On May 12, 2014, at 3:34 AM, Enkidu Mo Shiri <vol...@gm...> wrote: > Hi, > for my project, i have to create a tool to investigate and find evidences of bitcoin,litecoin,dogecoin wallets on windows,android and ios. is it better and easier if i add this function to autopsy source code, or i create a new tool? > thank you > Ehsan Moshiri (Enkidu) > Digital Forensic Student > H/P:+96164953954 , +961124249769 > Linkedin: http://my.linkedin.com/pub/enkidu-moshiri/59/baa/90b/ > Facebook: Enkidu Mo Shi Ri > wechat: Enkidu-Moshiri > Line: Enkidu.Moshiri |
From: Enkidu Mo S. <vol...@gm...> - 2014-05-12 07:34:07
|
Hi, for my project, I have to create a tool to investigate and find evidence of Bitcoin, Litecoin, and Dogecoin wallets on Windows, Android, and iOS. Is it better and easier if I add this function to the Autopsy source code, or should I create a new tool? Thank you *Ehsan Moshiri (Enkidu)* *Digital Forensic Student* *H/P:+96164953954 , +961124249769* *Linkedin: http://my.linkedin.com/pub/enkidu-moshiri/59/baa/90b/* *Facebook: Enkidu Mo Shi Ri* *wechat: Enkidu-Moshiri* *Line: Enkidu.Moshiri* |
From: Luís F. N. <lfc...@gm...> - 2014-05-08 13:57:54
|
Hi Brian and Simson,

I have done a number of tests since yesterday. I have not restarted the computer between the tests, because I think it would be better to use the OS IO cache to focus on CPU processing time, without IO interference, except for the first run. I used another computer, faster than before. Results below:

ntfs image w/ 127.408 files                                    run 1      run 2      run 3      run 4
no patch (only Brian's fix)                                    3min 27s   3min 3s    3min 3s    3min 2s
disabling database parent_id look up                           35s        11s        11s        12s
index on meta_addr,fs_obj_id and commit for each 5000 files    4min 11s   3min 48s   3min 48s   3min 47s

ntfs image w/ 216.693 files                                    run 1      run 2      run 3      run 4
no patch (only Brian's fix)                                    5min 53s   4min 59s   4min 58s   5min 2s
disabling database parent_id look up                           2min 8s    1min 21s   1min 21s   1min 21s
index on meta_addr,fs_obj_id and commit for each 5000 files    6min 38s   5min 46s   5min 43s   5min 43s

ntfs image w/ 433.321 files                                    run 1      run 2      run 3      run 4
no patch (only Brian's fix)                                    21min 38s  21min 40s  21min 10s  21min 10s
disabling database parent_id look up                           3min 59s   2min 47s   2min 47s   2min 47s
index on meta_addr,fs_obj_id and commit for each 5000 files    (not run based on previous results)

So, Brian was right, the commits increased processing time. And as you can see, it would be great if it is possible to eliminate the misses and remove the database parent_id look up.

With that in mind, I took a look at one of the sqlites, and I think I discovered the cause of a lot of the misses (maybe all of them). The misses happened with deleted files inside deleted folders. I think deleted folders are not being stored in the local cache.

Regards,
Nassif

2014-05-07 10:53 GMT-03:00 Brian Carrier <ca...@sl...>:
> Hi Nassif,
>
> As Simson mentioned, the current setup was intended to be the fastest. Doing frequent commits takes longer and more indexes makes commits take longer. This is the only process that we know about that does this type of lookup and would use those indexes.
>
> The bigger question for me is why we are getting these cache misses and I need to spend some more time with some more images to find out. The lookup is to find the ID of the parent and we process from the root directory down. So, in theory, we have already processed the parent folder before the children and it should be in the cache. We need to figure out why the parent isn't in the cache...
>
> brian
>
> On May 7, 2014, at 8:24 AM, Luís Filipe Nassif <lfc...@gm...> wrote:
>
> > I have done one last test, because it was very strange to me that indexing meta_addr and fs_obj_id had not improved the parent_id lookup. We suspected that the indexes were not being used by sqlite, maybe because the whole data is not committed before the add image process finishes (I am not an SQLite expert; is that possible?). So we inserted a commit for each 5.000 files added to the database. The add image process time decreased from 1 hour to 30 min, so we think that the indexes were not being used.
> >
> > Why does the add image process not commit the data while it is being added to the database?
> >
> > Nassif
> >
> > 2014-05-02 13:37 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
> > Fixing my last email, the test was run with the indexes AND Brian's fix. Then I removed the index patch and loadDb took the same 1 hour to finish with only Brian's fix. So the index patch did not help improving database look up for parent_id.
> >
> > Sorry for mistake,
> > Nassif
> >
> > 2014-05-02 10:54 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
> >
> > I tested loadDb with a create index on meta_addr and fs_obj_id patch. The image with 433.321 files, previously taking 2h45min to load, now takes 1h to finish loadDb with the indexes. That is a good speed up, but completely disabling the database parent_id look up, it only takes 7min to finish. Is there another thing we can do to improve the parent_id database look up?
> >
> > Regards,
> > Nassif
> >
> > 2014-05-02 9:35 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
> >
> > Ok, tested in 2 images. Fix resolved a lot of misses:
> >
> > ntfs image w/ 127.408 files: from 19.558 to 6.511 misses
> > ntfs image w/ 433.321 files: from 182.256 to 19.908 misses
> >
> > I also think creating an index on tsk_files(meta_addr) and tsk_files(fs_obj_id) could help improving the database look up for those deleted files not found in local cache, what do you think? The database look up seems too slow, as described in my first email.
> >
> > Thank you for taking a look so quickly.
> > Nassif
> >
> > 2014-05-01 23:47 GMT-03:00 Brian Carrier <ca...@sl...>:
> >
> > Well that was an easy and embarrassing fix:
> >
> > if (TSK_FS_TYPE_ISNTFS(fs_file->fs_info->ftype)) {
> > -    seq = fs_file->name->meta_seq;
> > +    seq = fs_file->name->par_seq;
> > }
> >
> > Turns out we've been having a lot of cache misses because of this stupid bug. Can you replace that line and see if it helps. It certainly did on my test image.
> >
> > thanks,
> > brian
> >
> > On May 1, 2014, at 10:24 PM, Brian Carrier <ca...@sl...> wrote:
> >
> > > Thanks for the tests. I wonder if it has to do with an incorrect sequence number. NTFS increments the sequence number each time a file is re-allocated. Deleted orphan files could be getting misses. I'll add some logging on my system and see what kind of misses I get.
> > >
> > > brian
> > >
> > > On May 1, 2014, at 8:39 PM, Luís Filipe Nassif <lfc...@gm...> wrote:
> > >
> > >> Ok, tests 1 and 3 done. I do not have sleuthkit code inside an ide, so did not use breakpoints. Instead, I changed TskDbSqlite::findParObjId() to return the parent_meta_addr when it is not found and return 1 when it is found in the cache map.
> > >>
> > >> Performing queries on the generated sqlite, there were 19.558 cache misses from an image with 3 ntfs partitions and 127.408 files. I confirmed that many parent_meta_addr missed from cache (now stored in tsk_objects.par_obj_id) are in tsk_files.meta_addr. The complete paths corresponding to these meta_addr are parents of those files whose processing have not found them in cache.
> > >>
> > >> Other tests resulted in:
> > >> 182.256 cache misses from 433.321 files (ntfs)
> > >> 892.359 misses from 1.811.393 files (ntfs)
> > >> 169.819 misses from 3.177.917 files (hfs)
> > >>
> > >> Luis Nassif
> > >>
> > >> 2014-05-01 16:14 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
> > >> Forgot to mention: we are using sleuthkit 4.1.3
> > >>
> > >> On 01/05/2014 16:09, "Luís Filipe Nassif" <lfc...@gm...> wrote:
> > >>
> > >> Hi Brian,
> > >>
> > >> The 3 cases above were ntfs. I also tested with hfs and canceled loaddb after 1 day. The modified version finished after 8 hours and added about 3 million entries. We will try to do the tests you have suggested.
> > >>
> > >> On 01/05/2014 15:48, "Brian Carrier" <ca...@sl...> wrote:
> > >> Hi Luis,
> > >>
> > >> What kind of file system was it? I fixed a bug a little while ago in that code for HFS file systems that resulted in a lot of cache misses.
> > >>
> > >> In theory, everything should be cached. It sounds like a bug if you are getting so many misses. The basic idea of this code is that everything in the DB gets assigned a unique object ID and we make associations between files and their parent folder's unique ID.
> > >>
> > >> Since you seem to be comfortable with a debugger in the code, can you set a breakpoint for when the miss happens and:
> > >> 1) Determine the path of the file that was being added to the DB and the parent address that was trying to be found.
> > >> 2) Use the 'ffind' TSK tool to then map that parent address to a path. Is it a subset of the path from #1?
> > >> 3) Open the DB in a SQLite tool and do something like this:
> > >>
> > >> SELECT * from tsk_files where meta_addr == META_ADDR_FROM_ABOVE
> > >>
> > >> Is it in the DB?
> > >>
> > >> Thanks!
> > >>
> > >> brian
> > >>
> > >> On May 1, 2014, at 11:58 AM, Luís Filipe Nassif <lfc...@gm...> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> We have investigated a bit why the add image process is too slow in some cases. The add image process time seems to be quadratic with the number of files in the image.
> > >>>
> > >>> We detected that the function TskDbSqlite::findParObjId(), in db_sqlite.cpp, is not finding the parent_meta_addr -> parent_file_id mapping in the local cache for a lot of files, causing it to search for the mapping in the database (not sure if it is a non-indexed search?)
> > >>>
> > >>> For testing purposes, we added a "return 1;" line right after the cache look up, disabling the database look up, and this resulted in great speed ups:
> > >>>
> > >>> number of files / default load_db time / patched load_db time
> > >>> ~80.000 / 20min / 2min
> > >>> ~300.000 / 3h / 7min
> > >>> ~700.000 / 48h / 27min
> > >>>
> > >>> We wonder if it is possible to store all par_meta_addr -> par_id mappings into local cache (better) or doing an improved (indexed?) search for the mapping in the database. We think that someone with more knowledge of load_db code could help a lot here.
|
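The "commit for each 5.000 files" experiment described above can be sketched with plain sqlite3 transaction calls. The function and constant names are illustrative, not the actual tsk_loaddb code:

#include <sqlite3.h>

static const int kBatchSize = 5000;  // the thread batched 5.000 files per commit

// Hypothetical bulk-insert loop: commit every kBatchSize files instead of
// holding one transaction open for the entire add-image process.
void addFilesBatched(sqlite3 *db, int nFiles)
{
    sqlite3_exec(db, "BEGIN TRANSACTION;", NULL, NULL, NULL);
    for (int i = 0; i < nFiles; i++) {
        // ... INSERT INTO tsk_objects / tsk_files for file i ...
        if ((i + 1) % kBatchSize == 0) {
            sqlite3_exec(db, "COMMIT;", NULL, NULL, NULL);
            sqlite3_exec(db, "BEGIN TRANSACTION;", NULL, NULL, NULL);
        }
    }
    sqlite3_exec(db, "COMMIT;", NULL, NULL, NULL);
}

Consistent with Brian's and Simson's expectation, the timings above show the batched commits were slower than the default single transaction once the cache fix was in place; one commit at the end remains the fastest bulk-load strategy here.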
From: Kalin K. <me....@gm...> - 2014-05-07 14:39:23
|
On Wed, May 7, 2014 at 10:53 PM, Brian Carrier <ca...@sl...> wrote: > The bigger question for me is why we are getting these cache misses and I need to spend some more time with some more images to find out. The lookup is to find the ID of the parent and we process from the root directory down. So, in theory, we have already processed the parent folder before the children and it should be in the cache. We need to figure out why the parent isn't in the cache... > Just a wild guess... NTFS "oddities" like reparse points, etc.? Kalin. |
From: Brian C. <ca...@sl...> - 2014-05-07 13:53:40
|
Hi Nassif, As Simson mentioned, the current setup was intended to be the fastest. Doing frequent commits takes longer and more indexes makes commits take longer. This is the only process that we know about that does this type of lookup and would use those indexes. The bigger question for me is why we are getting these cache misses and I need to spend some more time with some more images to find out. The lookup is to find the ID of the parent and we process from the root directory down. So, in theory, we have already processed the parent folder before the children and it should be in the cache. We need to figure out why the parent isn't in the cache... brian On May 7, 2014, at 8:24 AM, Luís Filipe Nassif <lfc...@gm...> wrote: > I have done one last test, because it was very strange to me that indexing meta_addr and fs_obj_id had not improved the parent_id lookup. We suspected that the indexes were not being used by sqlite, maybe because the whole data is not commited before add image process finishes (i am not a sqlite expert, is it possible?). So we inserted a commit for each 5.000 files added to database. The add image process time decreased from 1hour to 30min, so we think that the indexes were not being used. > > Why add image process do not commit the data while it is being added to database? > > Nassif > > > 2014-05-02 13:37 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>: > Fixing my last email, the test was run with the indexes AND Brian's fix. Then I removed the index patch and loadDb took the same 1 hour to finish with only Brian's fix. So the index patch did not help improving database look up for parent_id. > > Sorry for mistake, > Nassif > > > 2014-05-02 10:54 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>: > > I tested loadDb with a create index on meta_addr and fs_obj_id patch. The image with 433.321 files, previously taking 2h45min to load, now takes 1h to finish loadDb with the indexes. That is a good speed up, but completely disabling the database parent_id look up, it only takes 7min to finish. Is there another thing we can do to improve the parent_id database look up? > > Regards, > Nassif > > > 2014-05-02 9:35 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>: > > Ok, tested in 2 images. Fix resolved a lot of misses: > > ntfs image w/ 127.408 files: from 19.558 to 6.511 misses > ntfs image w/ 433.321 files: from 182.256 to 19.908 misses > > I also think creating an index on tsk_files(meta_addr) and tsk_files(fs_obj_id) could help improving the database look up for those deleted files not found in local cache, what do you think? The database look up seems too slow, as described in my first email. > > Thank you for taking a look so quickly. > Nassif > > > 2014-05-01 23:47 GMT-03:00 Brian Carrier <ca...@sl...>: > > Well that was an easy and embarrassing fix: > > if (TSK_FS_TYPE_ISNTFS(fs_file->fs_info->ftype)) { > - seq = fs_file->name->meta_seq; > + seq = fs_file->name->par_seq; > } > > Turns out we've been having a lot of cache misses because of this stupid bug. Can you replace that line and see if it helps. It certainly did on my test image. > > thanks, > brian > > > On May 1, 2014, at 10:24 PM, Brian Carrier <ca...@sl...> wrote: > > > Thanks for the tests. I wonder if it has to do with an incorrect sequence number. NTFS increments the sequence number each time a file is re-allocated. Deleted orphan files could be getting misses. I'll add some logging on my system and see what kind of misses I get. 
> > > > brian > > > > On May 1, 2014, at 8:39 PM, Luís Filipe Nassif <lfc...@gm...> wrote: > > > >> Ok, tests 1 and 3 done. I do not have sleuthkit code inside an ide, so did not use breakpoints. Instead, I changed TskDbSqlite::findParObjId() to return the parent_meta_addr when it is not found and return 1 when it is found in the cache map. > >> > >> Performing queries on the generated sqlite, there were 19.558 cache misses from an image with 3 ntfs partitions and 127.408 files. I confirmed that many parent_meta_addr missed from cache (now stored in tsk_objects.par_obj_id) are into tsk_files.meta_addr. The complete paths corresponding to these meta_addr are parents of those files whose processing have not found them in cache. > >> > >> Other tests resulted in: > >> 182.256 cache misses from 433.321 files (ntfs) > >> 892.359 misses from 1.811.393 files (ntfs) > >> 169.819 misses from 3.177.917 files (hfs) > >> > >> Luis Nassif > >> > >> > >> > >> 2014-05-01 16:14 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>: > >> Forgot to mention: we are using sleuthkit 4.1.3 > >> > >> Em 01/05/2014 16:09, "Luís Filipe Nassif" <lfc...@gm...> escreveu: > >> > >> Hi Brian, > >> > >> The 3 cases above were ntfs. I also tested with hfs and canceled loaddb after 1 day. The modified version finished after 8hours and added about 3 million entries. We will try to do the tests you have suggested. > >> > >> Em 01/05/2014 15:48, "Brian Carrier" <ca...@sl...> escreveu: > >> Hi Luis, > >> > >> What kind of file system was it? I fixed a bug a little while ago in that code for HFS file systems that resulted in a lot of cache misses. > >> > >> In theory, everything should be cached. It sounds like a bug if you are getting so many misses. The basic idea of this code is that everything in the DB gets assigned a unique object ID and we make associations between files and their parent folder's unique ID. > >> > >> Since you seem to be comfortable with a debugger in the code, can you set a breakpoint for when the miss happens and: > >> 1) Determine the path of the file that was being added to the DB and the parent address that was trying to be found. > >> 2) Use the 'ffind' TSK tool to then map that parent address to a path. Is it a subset of the path from #1? > >> 3) Open the DB in a SQLite tool and do something like this: > >> > >> SELECT * from tsk_files where meta_addr == META_ADDR_FROM_ABOVE > >> > >> Is it in the DB? > >> > >> Thanks! > >> > >> brian > >> > >> > >> On May 1, 2014, at 11:58 AM, Luís Filipe Nassif <lfc...@gm...> wrote: > >> > >>> Hi, > >>> > >>> We have investigated a bit why the add image process is too slow in some cases. The add image process time seems to be quadratic with the number of files in the image. > >>> > >>> We detected that the function TskDbSqlite::findParObjId(), in db_sqlite.cpp, is not finding the parent_meta_addr -> parent_file_id mapping in the local cache for a lot of files, causing it to search for the mapping in the database (not sure if it is an non-indexed search?) > >>> > >>> For testing purposes, we added a "return 1;" line right after the cache look up, disabling the database look up, and this resulted in great speed ups: > >>> > >>> number of files / default load_db time / patched load_db time > >>> ~80.000 / 20min / 2min > >>> ~300.000 / 3h / 7min > >>> ~700.000 / 48h / 27min > >>> > >>> We wonder if it is possible to store all par_meta_addr -> par_id mappings into local cache (better) or doing an improved (indexed?) search for the mapping in the database. 
We think that someone with more knowledge of load_db code could help a lot here. |
From: Simson G. <si...@ac...> - 2014-05-07 13:11:52
|
Hi, Nassif. Could you repeat your experiment a few times and on different media? Please be sure to reboot your computer between each test. All of the data are inserted in a single commit because the developers believed that this was the fastest way to add the data. If this is not the case, we would like to know. Thanks! Simson On May 7, 2014, at 8:24 AM, Luís Filipe Nassif <lfc...@gm...> wrote: > I have done one last test, because it was very strange to me that indexing meta_addr and fs_obj_id had not improved the parent_id lookup. We suspected that the indexes were not being used by sqlite, maybe because the whole data is not commited before add image process finishes (i am not a sqlite expert, is it possible?). So we inserted a commit for each 5.000 files added to database. The add image process time decreased from 1hour to 30min, so we think that the indexes were not being used. > > Why add image process do not commit the data while it is being added to database? > > Nassif |
From: Luís F. N. <lfc...@gm...> - 2014-05-07 12:24:10
|
I have done one last test, because it was very strange to me that indexing meta_addr and fs_obj_id had not improved the parent_id lookup. We suspected that the indexes were not being used by sqlite, maybe because the whole data is not committed before the add image process finishes (I am not an SQLite expert; is that possible?). So we inserted a commit for each 5.000 files added to the database. The add image process time decreased from 1 hour to 30 min, so we think that the indexes were not being used.

Why does the add image process not commit the data while it is being added to the database?

Nassif

2014-05-02 13:37 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
> Fixing my last email, the test was run with the indexes AND Brian's fix. Then I removed the index patch and loadDb took the same 1 hour to finish with only Brian's fix. So the index patch did not help improving database look up for parent_id.
>
> Sorry for mistake,
> Nassif
>
> 2014-05-02 10:54 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
>
>> I tested loadDb with a create index on meta_addr and fs_obj_id patch. The image with 433.321 files, previously taking 2h45min to load, now takes 1h to finish loadDb with the indexes. That is a good speed up, but completely disabling the database parent_id look up, it only takes 7min to finish. Is there another thing we can do to improve the parent_id database look up?
>>
>> Regards,
>> Nassif
>>
>> 2014-05-02 9:35 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
>>
>>> Ok, tested in 2 images. Fix resolved a lot of misses:
>>>
>>> ntfs image w/ 127.408 files: from 19.558 to 6.511 misses
>>> ntfs image w/ 433.321 files: from 182.256 to 19.908 misses
>>>
>>> I also think creating an index on tsk_files(meta_addr) and tsk_files(fs_obj_id) could help improving the database look up for those deleted files not found in local cache, what do you think? The database look up seems too slow, as described in my first email.
>>>
>>> Thank you for taking a look so quickly.
>>> Nassif
>>>
>>> 2014-05-01 23:47 GMT-03:00 Brian Carrier <ca...@sl...>:
>>>
>>>> Well that was an easy and embarrassing fix:
>>>>
>>>> if (TSK_FS_TYPE_ISNTFS(fs_file->fs_info->ftype)) {
>>>> -    seq = fs_file->name->meta_seq;
>>>> +    seq = fs_file->name->par_seq;
>>>> }
>>>>
>>>> Turns out we've been having a lot of cache misses because of this stupid bug. Can you replace that line and see if it helps. It certainly did on my test image.
>>>>
>>>> thanks,
>>>> brian
>>>>
>>>> On May 1, 2014, at 10:24 PM, Brian Carrier <ca...@sl...> wrote:
>>>>
>>>> > Thanks for the tests. I wonder if it has to do with an incorrect sequence number. NTFS increments the sequence number each time a file is re-allocated. Deleted orphan files could be getting misses. I'll add some logging on my system and see what kind of misses I get.
>>>> >
>>>> > brian
>>>> >
>>>> > On May 1, 2014, at 8:39 PM, Luís Filipe Nassif <lfc...@gm...> wrote:
>>>> >
>>>> >> Ok, tests 1 and 3 done. I do not have sleuthkit code inside an ide, so did not use breakpoints. Instead, I changed TskDbSqlite::findParObjId() to return the parent_meta_addr when it is not found and return 1 when it is found in the cache map.
>>>> >>
>>>> >> Performing queries on the generated sqlite, there were 19.558 cache misses from an image with 3 ntfs partitions and 127.408 files. I confirmed that many parent_meta_addr missed from cache (now stored in tsk_objects.par_obj_id) are in tsk_files.meta_addr. The complete paths corresponding to these meta_addr are parents of those files whose processing have not found them in cache.
>>>> >>
>>>> >> Other tests resulted in:
>>>> >> 182.256 cache misses from 433.321 files (ntfs)
>>>> >> 892.359 misses from 1.811.393 files (ntfs)
>>>> >> 169.819 misses from 3.177.917 files (hfs)
>>>> >>
>>>> >> Luis Nassif
>>>> >>
>>>> >> 2014-05-01 16:14 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
>>>> >> Forgot to mention: we are using sleuthkit 4.1.3
>>>> >>
>>>> >> On 01/05/2014 16:09, "Luís Filipe Nassif" <lfc...@gm...> wrote:
>>>> >>
>>>> >> Hi Brian,
>>>> >>
>>>> >> The 3 cases above were ntfs. I also tested with hfs and canceled loaddb after 1 day. The modified version finished after 8 hours and added about 3 million entries. We will try to do the tests you have suggested.
>>>> >>
>>>> >> On 01/05/2014 15:48, "Brian Carrier" <ca...@sl...> wrote:
>>>> >> Hi Luis,
>>>> >>
>>>> >> What kind of file system was it? I fixed a bug a little while ago in that code for HFS file systems that resulted in a lot of cache misses.
>>>> >>
>>>> >> In theory, everything should be cached. It sounds like a bug if you are getting so many misses. The basic idea of this code is that everything in the DB gets assigned a unique object ID and we make associations between files and their parent folder's unique ID.
>>>> >>
>>>> >> Since you seem to be comfortable with a debugger in the code, can you set a breakpoint for when the miss happens and:
>>>> >> 1) Determine the path of the file that was being added to the DB and the parent address that was trying to be found.
>>>> >> 2) Use the 'ffind' TSK tool to then map that parent address to a path. Is it a subset of the path from #1?
>>>> >> 3) Open the DB in a SQLite tool and do something like this:
>>>> >>
>>>> >> SELECT * from tsk_files where meta_addr == META_ADDR_FROM_ABOVE
>>>> >>
>>>> >> Is it in the DB?
>>>> >>
>>>> >> Thanks!
>>>> >>
>>>> >> brian
>>>> >>
>>>> >> On May 1, 2014, at 11:58 AM, Luís Filipe Nassif <lfc...@gm...> wrote:
>>>> >>
>>>> >>> Hi,
>>>> >>>
>>>> >>> We have investigated a bit why the add image process is too slow in some cases. The add image process time seems to be quadratic with the number of files in the image.
>>>> >>>
>>>> >>> We detected that the function TskDbSqlite::findParObjId(), in db_sqlite.cpp, is not finding the parent_meta_addr -> parent_file_id mapping in the local cache for a lot of files, causing it to search for the mapping in the database (not sure if it is a non-indexed search?)
>>>> >>>
>>>> >>> For testing purposes, we added a "return 1;" line right after the cache look up, disabling the database look up, and this resulted in great speed ups:
>>>> >>>
>>>> >>> number of files / default load_db time / patched load_db time
>>>> >>> ~80.000 / 20min / 2min
>>>> >>> ~300.000 / 3h / 7min
>>>> >>> ~700.000 / 48h / 27min
>>>> >>>
>>>> >>> We wonder if it is possible to store all par_meta_addr -> par_id mappings into local cache (better) or doing an improved (indexed?) search for the mapping in the database. We think that someone with more knowledge of load_db code could help a lot here.
|
From: Brian C. <ca...@sl...> - 2014-05-06 00:44:10
|
Hi Anton, github issues and pull requests are the easiest at this point. I should close the sourceforge trackers to new posts and redirect to github. We've merged several of your pull requests over the past months. Thanks for those. I saw that there was a bug reported a little while ago about long LFN. Like many open source projects, the developers who contribute to TSK/Autopsy are either volunteering their time or working as part of a project that uses them. The reality of that is that random bug reports will eventually get looked at, but the time that it takes to look at it depends on many circumstances (and depends on the frequency and severity of the issue). I usually review the list of bugs and such before we do a release to see if there is something that should be addressed. We're due for a 4.2 release in the next couple of weeks and I was going to merge in a bunch of the patches out there. In terms of how long it takes for pull requests to get merged, the easier the patch, the faster it gets reviewed and merged. The ones that hang out the longest are because they have a bunch of changes that require more thought. thanks, brian On May 5, 2014, at 4:40 AM, Anton Kukoba <ak...@ad...> wrote: > We're actively using the Sleuth Kit source code and tools, so we're finding issues quite often. Sometimes we know how to fix the problem and we have a patch for the source code. Sometimes we don't, but we want some feedback about the bug we found, to know if it will be fixed by the Sleuth Kit team, or if we should try to investigate on our own. > I've tried creating the tickets at sourceforge, and it seems like it's no good. Also I've tried to create pull requests when I had a patch; this is also unreliable, because sometimes pull requests are hanging for weeks. > So, I've decided to ask here... > > -- > Anton Kukoba - Technical Leader > ADF Solutions, Inc. |
From: Brad S. <bra...@gm...> - 2014-05-05 19:02:19
|
Version: sleuthkit-4.1.3-win32 OS: Windows 7 x64 SP1 I am collecting slack via blkls -s; however, I would like to split the slack up by either the inode or the block where it was collected. Going through the source code, I can tell that this would be a relatively trivial task since the blocks are collected anyway, but I am not well versed in C++. Any help would be appreciated. |
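One way to attribute slack to the file it came from, without modifying blkls, is to read each file's slack through the library instead. The sketch below uses the TSK C API and is untested and simplified: offset 0 assumes an unpartitioned image, and rounding the file size up to a whole block ignores sparse and compressed files, so treat it as a starting point rather than a finished tool. (From the command line, icat's -s flag similarly includes slack when extracting a single inode.)

#include <tsk/libtsk.h>
#include <cstdio>
#include <cstdlib>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <image> <inum>\n", argv[0]);
        return 1;
    }
    TSK_IMG_INFO *img = tsk_img_open_utf8_sing(argv[1], TSK_IMG_TYPE_DETECT, 0);
    TSK_FS_INFO *fs = img ? tsk_fs_open_img(img, 0, TSK_FS_TYPE_DETECT) : NULL;
    if (fs == NULL) {
        tsk_error_print(stderr);
        return 1;
    }
    TSK_FS_FILE *file = tsk_fs_file_open_meta(fs, NULL, strtoull(argv[2], NULL, 10));
    if (file && file->meta) {
        TSK_OFF_T size = file->meta->size;
        // Slack assumed to be the tail of the last allocated block, past EOF.
        TSK_OFF_T alloc = ((size + fs->block_size - 1) / fs->block_size) * fs->block_size;
        if (alloc > size) {
            char *buf = (char *) malloc(alloc - size);
            // The SLACK flag lets the read continue past the logical file size.
            ssize_t n = tsk_fs_file_read(file, size, buf, (size_t) (alloc - size),
                                         TSK_FS_FILE_READ_FLAG_SLACK);
            if (n > 0)
                fwrite(buf, 1, n, stdout);  // slack for exactly this inode
            free(buf);
        }
        tsk_fs_file_close(file);
    }
    tsk_fs_close(fs);
    tsk_img_close(img);
    return 0;
}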
From: L. G. M. 'P. <po...@lg...> - 2014-05-05 10:47:29
|
Hi, It's not easy to run iOS on a PC. You can try running a Mac with OS X plus Xcode with its iOS simulator. But still you cannot run regular apps there - only the stock ones, and the ones you develop yourself. But still, you may find it useful. Best regards Pope > On 05/05/2014, at 12:05, Enkidu Mo Shiri <vol...@gm...> wrote: > > Hi everyone, > for my project, which is about Bitcoin forensics, I should work on 3 platforms: Android, Windows, and iOS. I am creating a tool for handphones, so I am working with the handphone versions of these operating systems. I did some research on how to install the iPhone OS on a PC virtual machine, and this link is the best I found: http://maconwindows.blogspot.com/ > which involves installing Lion OS first and then using Xcode. > Does anybody have a better idea of how to run iPhone iOS 7 in a virtual machine on a PC? It would be very appreciated. > Thank you > Ehsan Moshiri (Enkidu) > Digital Forensic Student > H/P:+96164953954 , +961124249769 > Linkedin: http://my.linkedin.com/pub/enkidu-moshiri/59/baa/90b/ > Facebook: Enkidu Mo Shi Ri > wechat: Enkidu-Moshiri > Line: Enkidu.Moshiri |
From: Enkidu Mo S. <vol...@gm...> - 2014-05-05 10:05:16
|
Hi everyone, for my project, which is about Bitcoin forensics, I should work on 3 platforms: Android, Windows, and iOS. I am creating a tool for handphones, so I am working with the handphone versions of these operating systems. I did some research on how to install the iPhone OS on a PC virtual machine, and this link is the best I found: http://maconwindows.blogspot.com/ which involves installing Lion OS first and then using Xcode. Does anybody have a better idea of how to run iPhone iOS 7 in a virtual machine on a PC? It would be very appreciated. Thank you *Ehsan Moshiri (Enkidu)* *Digital Forensic Student* *H/P:+96164953954 , +961124249769* *Linkedin: http://my.linkedin.com/pub/enkidu-moshiri/59/baa/90b/* *Facebook: Enkidu Mo Shi Ri* *wechat: Enkidu-Moshiri* *Line: Enkidu.Moshiri* |
From: Anton K. <ak...@ad...> - 2014-05-05 09:41:08
|
We're actively using the Sleuth Kit source code and tools, so we're finding issues quite often. Sometimes we know how to fix the problem and we have a patch for the source code. Sometimes we don't, but we want some feedback about the bug we found, to know if it will be fixed by the Sleuth Kit team, or if we should try to investigate on our own. I've tried creating the tickets at sourceforge, and it seems like it's no good. Also I've tried to create pull requests when I had a patch; this is also unreliable, because sometimes pull requests are hanging for weeks. So, I've decided to ask here... -- Anton Kukoba - Technical Leader ADF Solutions, Inc. |
From: <al...@ma...> - 2014-05-02 17:06:19
|
Output is: 12 drwxrwxrwt root root ... > On May 2, 2014 6:56 PM, <al...@ma...> wrote: >> Both were downloaded from the Ubuntu software repository and installed >> correctly, in the order SK then autopsy. >> >> I ran Autopsy from the Terminal, and got the message about opening an >> HTML >> session but then I got the error message (immediately): >> >> "Cannot open log: autopsy.log at /usr/share/autopsy/lib//Print.pm at >> line >> 383." >> > Most probably permission problem, most probably in /tmp ... > > What is the output of `ls -lsd /tmp` ? > > Note to devs: The message need to be changed to include the full path > IMHO. > > Kalin. > |