sleuthkit-users Mailing List for The Sleuth Kit (Page 42)
From: Luís F. N. <lfc...@gm...> - 2014-05-15 18:07:40
Hi Brian,

Pulled the latest version of develop. Tested with the same 3 NTFS images and 1 HFS with 3 million entries. No misses found and loadDb finished very fast.

Thanks,
Luis Nassif

2014-05-14 16:20 GMT-03:00 Brian Carrier <ca...@sl...>:
> Hi Luis,
>
> Pull the latest version of develop. I'm not seeing any cache misses on
> any images and the regression tests are all good.
From: Brian C. <ca...@sl...> - 2014-05-14 19:24:25
Sure, you can pass in the path to the device for the local drive: either "/dev/X" on Unix-like systems or \\.\PhysicalDriveX on Windows systems.

On May 14, 2014, at 3:07 PM, Mike Goldstein <do...@li...> wrote:
> So I want to know - is there a way I can just access the usb drive (for
> example) in the API using just the path (such as /dev/sdc) like I would
> in the command line? I mean, if I want to analyze a drive, do I have to
> make an ISO image of the file and then access it with the above code
> every time?
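For readers with the same question, here is a minimal sketch of what Brian's answer looks like using the same C++ wrapper calls from Mike's message below. It assumes TSK 4.1.3 headers and a Linux device node at /dev/sdc (a placeholder; reading a raw device normally requires root):

    #include <iostream>
    #include "tsk/libtsk.h"

    int main() {
        // Same open() calls as for an image file, but with a device path.
        // TSK reads the raw device like a raw (dd) image, so no ISO or
        // E01 copy is needed first.
        TskImgInfo *img_info = new TskImgInfo();
        if (img_info->open("/dev/sdc", TSK_IMG_TYPE_DETECT, 0) != 0) {
            std::cerr << "cannot open device (root needed?)" << std::endl;
            delete img_info;
            return 1;
        }

        // Offset 0 assumes the file system starts at byte 0 of the device
        // (typical for a USB stick without a partition table); for a
        // partitioned disk, pass the partition's byte offset instead.
        TskFsInfo *fs_info = new TskFsInfo();
        if (fs_info->open(img_info, 0, TSK_FS_TYPE_DETECT) != 0) {
            std::cerr << "cannot open file system" << std::endl;
            delete fs_info;
            delete img_info;
            return 1;
        }

        std::cout << "file system opened from live device" << std::endl;

        delete fs_info;   // the wrapper destructors release the TSK handles
        delete img_info;
        return 0;
    }

On Windows the same code should work with "\\\\.\\PhysicalDrive0" as the path (the backslashes doubled inside a C string literal).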
From: Brian C. <ca...@sl...> - 2014-05-14 19:23:31
Hi Guthrie,

As I just mentioned in the last e-mail on this topic, we weren't missing data. It's just that we had long cycle times to insert rows into the database: the cache (which is supposed to improve performance) had some bugs, so the potential performance benefits were lost.

thanks,
brian

On May 14, 2014, at 2:16 PM, Guthrie Quill <qu...@po...> wrote:
> I have followed the thread about the unusually long process times (which
> I experienced myself). Can you give some guidance on whether TSK/Autopsy
> is reliable for case work? I am concerned about potentially missing
> large numbers of files.
From: Brian C. <ca...@sl...> - 2014-05-14 19:20:14
Hi Luis,

Pull the latest version of develop. I'm not seeing any cache misses on any images and the regression tests are all good.

For reference on what has gone on in this thread:

- NTFS has a sequence value that it increments each time that an MFT entry is re-used.
- TSK (and other tools) used to ignore this value, but that changed about a year ago when a law enforcement person showed me some data where half of the tools showed an important file in one folder and the other half showed it in a different folder. The difference was because some tools were ignoring the sequence and putting the file in a less than accurate place.
- The tsk_loaddb tool (which is what Autopsy basically uses to make the SQLite database) has a caching mechanism to find information about previously analyzed files so that it doesn't have to query the database each time, which is slow.
- This exercise brought to light two bugs. One was that the wrong ID was being used in the cache lookup, so it never hit. The other is that even after that was fixed, some deleted files were still not being found in the cache because the wrong sequence value was being used.
- In both cases, we weren't getting wrong results. We were simply being inefficient, and that is why the ingest took longer.

thanks,
brian

On May 13, 2014, at 6:08 PM, Brian Carrier <ca...@sl...> wrote:

> Hi Luis,
>
> Yea, a couple of my regression tests found some strange results with that fix. I pushed a new one up that is less strict on the filtering of the deleted files. There is still one issue I need to work out though.
>
> thanks,
> brian
>
> On May 13, 2014, at 5:37 PM, Luís Filipe Nassif <lfc...@gm...> wrote:
>
>> Forgot to mention, now loaddb is adding a different number of entries compared to the previous version. Is it ok?
>>
>> Thank you
>> Nassif
>>
>> Em 13/05/2014 18:19, "Luís Filipe Nassif" <lfc...@gm...> escreveu:
>> Excellent, Brian! All misses were resolved. Tested on the 3 NTFS images above and 1 HFS with 3 million entries.
>>
>> Also, images taking hours or days to finish now take minutes with both patches!
>>
>> Thank you very much for addressing this issue.
>> Nassif
>>
>> Nope, this one from earlier today:
>>
>> https://github.com/sleuthkit/sleuthkit/commit/f0672805c18a634ffffeff1f39f793501ddb7702
>>
>> On May 12, 2014, at 3:23 PM, Luís Filipe Nassif <lfc...@gm...> wrote:
>>
>>> Hi Brian,
>>>
>>> Do you mean this fix? https://github.com/sleuthkit/sleuthkit/commit/7b257b6c8252f9e9a7202990710e3a0ef31bf6b7
>>>
>>> It resolved a lot of the misses, as I described before (05/02), but I am still getting thousands of misses. So I took a look at 2 generated sqlites and discovered that the remaining misses were from deleted folders. I think that deleted folders are not being stored in the local cache...
>>>
>>> 2014-05-12 13:55 GMT-03:00 Brian Carrier <ca...@sl...>:
>>> Hi Luis,
>>>
>>> The develop branch on github has a fix that removed the remaining misses on my test image, but I had far fewer than you did. Can you pull it and try that one?
>>>
>>> thanks,
>>> brian
>>>
>>> On May 8, 2014, at 9:57 AM, Luís Filipe Nassif <lfc...@gm...> wrote:
>>>
>>>> Hi Brian and Simson,
>>>>
>>>> I have done a number of tests since yesterday. I have not restarted the computer between the tests, because I think it is better to use the OS IO cache to focus on CPU processing time, without IO interference, except for the first run. I used another computer, faster than before. Results below:
>>>>
>>>> ntfs image w/ 127.408 files                                 | 1        | 2        | 3        | 4
>>>> no patch (only Brian's fix)                                 | 3min 27s | 3min 3s  | 3min 3s  | 3min 2s
>>>> disabling database parent_id look up                        | 35s      | 11s      | 11s      | 12s
>>>> index on meta_addr,fs_obj_id and commit for each 5000 files | 4min 11s | 3min 48s | 3min 48s | 3min 47s
>>>>
>>>> ntfs image w/ 216.693 files                                 | 1        | 2        | 3        | 4
>>>> no patch (only Brian's fix)                                 | 5min 53s | 4min 59s | 4min 58s | 5min 2s
>>>> disabling database parent_id look up                        | 2min 8s  | 1min 21s | 1min 21s | 1min 21s
>>>> index on meta_addr,fs_obj_id and commit for each 5000 files | 6min 38s | 5min 46s | 5min 43s | 5min 43s
>>>>
>>>> ntfs image w/ 433.321 files                                 | 1         | 2         | 3         | 4
>>>> no patch (only Brian's fix)                                 | 21min 38s | 21min 40s | 21min 10s | 21min 10s
>>>> disabling database parent_id look up                        | 3min 59s  | 2min 47s  | 2min 47s  | 2min 47s
>>>> index on meta_addr,fs_obj_id and commit for each 5000 files | (not run based on previous results)
>>>>
>>>> So, Brian was right, the commits increased processing time. And as you can see, it would be great if it were possible to eliminate the misses and remove the database parent_id look up.
>>>>
>>>> With that in mind, I took a look at one of the sqlites, and I think I discovered the cause of a lot of the misses (maybe all of them). The misses happened with deleted files inside deleted folders. I think deleted folders are not being stored in the local cache.
>>>>
>>>> Regards,
>>>> Nassif
>>>>
>>>> 2014-05-07 10:53 GMT-03:00 Brian Carrier <ca...@sl...>:
>>>> Hi Nassif,
>>>>
>>>> As Simson mentioned, the current setup was intended to be the fastest. Doing frequent commits takes longer, and more indexes make commits take longer. This is the only process that we know about that does this type of lookup and would use those indexes.
>>>>
>>>> The bigger question for me is why we are getting these cache misses, and I need to spend some more time with some more images to find out. The lookup is to find the ID of the parent, and we process from the root directory down. So, in theory, we have already processed the parent folder before the children and it should be in the cache. We need to figure out why the parent isn't in the cache...
>>>>
>>>> brian
>>>>
>>>> On May 7, 2014, at 8:24 AM, Luís Filipe Nassif <lfc...@gm...> wrote:
>>>>
>>>>> I have done one last test, because it was very strange to me that indexing meta_addr and fs_obj_id had not improved the parent_id lookup. We suspected that the indexes were not being used by sqlite, maybe because the whole data is not committed before the add image process finishes (I am not a sqlite expert, is it possible?). So we inserted a commit for each 5.000 files added to the database. The add image process time decreased from 1 hour to 30 min, so we think that the indexes were not being used.
>>>>>
>>>>> Why does the add image process not commit the data while it is being added to the database?
>>>>>
>>>>> Nassif
>>>>>
>>>>> 2014-05-02 13:37 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
>>>>> Fixing my last email, the test was run with the indexes AND Brian's fix. Then I removed the index patch and loadDb took the same 1 hour to finish with only Brian's fix. So the index patch did not help improve the database look up for parent_id.
>>>>>
>>>>> Sorry for the mistake,
>>>>> Nassif
>>>>>
>>>>> 2014-05-02 10:54 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
>>>>>
>>>>> I tested loadDb with a patch that creates indexes on meta_addr and fs_obj_id. The image with 433.321 files, previously taking 2h45min to load, now takes 1h to finish loadDb with the indexes. That is a good speed up, but completely disabling the database parent_id look up, it takes only 7min to finish. Is there anything else we can do to improve the parent_id database look up?
>>>>>
>>>>> Regards,
>>>>> Nassif
>>>>>
>>>>> 2014-05-02 9:35 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
>>>>>
>>>>> Ok, tested on 2 images. The fix resolved a lot of misses:
>>>>>
>>>>> ntfs image w/ 127.408 files: from 19.558 to 6.511 misses
>>>>> ntfs image w/ 433.321 files: from 182.256 to 19.908 misses
>>>>>
>>>>> I also think creating an index on tsk_files(meta_addr) and tsk_files(fs_obj_id) could help improve the database look up for those deleted files not found in the local cache, what do you think? The database look up seems too slow, as described in my first email.
>>>>>
>>>>> Thank you for taking a look so quickly.
>>>>> Nassif
>>>>>
>>>>> 2014-05-01 23:47 GMT-03:00 Brian Carrier <ca...@sl...>:
>>>>>
>>>>> Well that was an easy and embarrassing fix:
>>>>>
>>>>>     if (TSK_FS_TYPE_ISNTFS(fs_file->fs_info->ftype)) {
>>>>>     -   seq = fs_file->name->meta_seq;
>>>>>     +   seq = fs_file->name->par_seq;
>>>>>     }
>>>>>
>>>>> Turns out we've been having a lot of cache misses because of this stupid bug. Can you replace that line and see if it helps? It certainly did on my test image.
>>>>>
>>>>> thanks,
>>>>> brian
>>>>>
>>>>> On May 1, 2014, at 10:24 PM, Brian Carrier <ca...@sl...> wrote:
>>>>>
>>>>>> Thanks for the tests. I wonder if it has to do with an incorrect sequence number. NTFS increments the sequence number each time a file is re-allocated. Deleted orphan files could be getting misses. I'll add some logging on my system and see what kind of misses I get.
>>>>>>
>>>>>> brian
>>>>>>
>>>>>> On May 1, 2014, at 8:39 PM, Luís Filipe Nassif <lfc...@gm...> wrote:
>>>>>>
>>>>>>> Ok, tests 1 and 3 done. I do not have the sleuthkit code inside an IDE, so I did not use breakpoints. Instead, I changed TskDbSqlite::findParObjId() to return the parent_meta_addr when it is not found and return 1 when it is found in the cache map.
>>>>>>>
>>>>>>> Performing queries on the generated sqlite, there were 19.558 cache misses from an image with 3 ntfs partitions and 127.408 files. I confirmed that many parent_meta_addr missed from the cache (now stored in tsk_objects.par_obj_id) are in tsk_files.meta_addr. The complete paths corresponding to these meta_addr are parents of those files whose processing did not find them in the cache.
>>>>>>>
>>>>>>> Other tests resulted in:
>>>>>>> 182.256 cache misses from 433.321 files (ntfs)
>>>>>>> 892.359 misses from 1.811.393 files (ntfs)
>>>>>>> 169.819 misses from 3.177.917 files (hfs)
>>>>>>>
>>>>>>> Luis Nassif
>>>>>>>
>>>>>>> 2014-05-01 16:14 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
>>>>>>> Forgot to mention: we are using sleuthkit 4.1.3
>>>>>>>
>>>>>>> Em 01/05/2014 16:09, "Luís Filipe Nassif" <lfc...@gm...> escreveu:
>>>>>>>
>>>>>>> Hi Brian,
>>>>>>>
>>>>>>> The 3 cases above were ntfs. I also tested with hfs and canceled loaddb after 1 day. The modified version finished after 8 hours and added about 3 million entries. We will try to do the tests you have suggested.
>>>>>>>
>>>>>>> Em 01/05/2014 15:48, "Brian Carrier" <ca...@sl...> escreveu:
>>>>>>> Hi Luis,
>>>>>>>
>>>>>>> What kind of file system was it? I fixed a bug a little while ago in that code for HFS file systems that resulted in a lot of cache misses.
>>>>>>>
>>>>>>> In theory, everything should be cached. It sounds like a bug if you are getting so many misses. The basic idea of this code is that everything in the DB gets assigned a unique object ID and we make associations between files and their parent folder's unique ID.
>>>>>>>
>>>>>>> Since you seem to be comfortable with a debugger in the code, can you set a breakpoint for when the miss happens and:
>>>>>>> 1) Determine the path of the file that was being added to the DB and the parent address that was trying to be found.
>>>>>>> 2) Use the 'ffind' TSK tool to then map that parent address to a path. Is it a subset of the path from #1?
>>>>>>> 3) Open the DB in a SQLite tool and do something like this:
>>>>>>>
>>>>>>> SELECT * FROM tsk_files WHERE meta_addr == META_ADDR_FROM_ABOVE
>>>>>>>
>>>>>>> Is it in the DB?
>>>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> brian
>>>>>>>
>>>>>>> On May 1, 2014, at 11:58 AM, Luís Filipe Nassif <lfc...@gm...> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> We have investigated a bit why the add image process is too slow in some cases. The add image process time seems to be quadratic in the number of files in the image.
>>>>>>>>
>>>>>>>> We detected that the function TskDbSqlite::findParObjId(), in db_sqlite.cpp, is not finding the parent_meta_addr -> parent_file_id mapping in the local cache for a lot of files, causing it to search for the mapping in the database (not sure if it is a non-indexed search?).
>>>>>>>>
>>>>>>>> For testing purposes, we added a "return 1;" line right after the cache look up, disabling the database look up, and this resulted in great speed ups:
>>>>>>>>
>>>>>>>> number of files / default load_db time / patched load_db time
>>>>>>>> ~80.000 / 20min / 2min
>>>>>>>> ~300.000 / 3h / 7min
>>>>>>>> ~700.000 / 48h / 27min
>>>>>>>>
>>>>>>>> We wonder if it is possible to store all par_meta_addr -> par_id mappings in the local cache (better) or to do an improved (indexed?) search for the mapping in the database. We think that someone with more knowledge of the load_db code could help a lot here.
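To make the cache mechanism described above concrete, here is a simplified, hypothetical sketch of a parent-object lookup keyed by (meta address, sequence number) with a SQLite fallback, in the spirit of the TskDbSqlite::findParObjId() discussed in this thread. The class name, map layout, and error handling are illustrative assumptions, not TSK's actual implementation; only the function's role, the tsk_files/meta_addr query, and the sequence-key detail come from the messages above.

    #include <cstdint>
    #include <map>
    #include <utility>
    #include <sqlite3.h>

    // Hypothetical illustration of the lookup discussed in this thread.
    // On NTFS the sequence number is incremented each time an MFT entry
    // is re-used, so keying the cache on the wrong sequence member
    // (meta_seq instead of par_seq) misses for deleted files -- the bug
    // fixed above.
    class ParentIdCache {
    public:
        explicit ParentIdCache(sqlite3 *db) : m_db(db) {}

        // Called as each folder is added to the database.
        void add(uint64_t metaAddr, uint32_t seq, int64_t objId) {
            m_map[std::make_pair(metaAddr, seq)] = objId;
        }

        // Returns the parent's object ID, or -1 if it is not found anywhere.
        int64_t find(uint64_t metaAddr, uint32_t seq) {
            auto it = m_map.find(std::make_pair(metaAddr, seq));
            if (it != m_map.end())
                return it->second;   // cache hit: no database round trip

            // Cache miss: fall back to the database. Without an index on
            // tsk_files(meta_addr) this scans the table, which is why
            // frequent misses made ingest time grow quadratically.
            sqlite3_stmt *stmt = nullptr;
            int64_t objId = -1;
            if (sqlite3_prepare_v2(m_db,
                    "SELECT obj_id FROM tsk_files WHERE meta_addr = ?;",
                    -1, &stmt, nullptr) == SQLITE_OK) {
                sqlite3_bind_int64(stmt, 1, (sqlite3_int64)metaAddr);
                if (sqlite3_step(stmt) == SQLITE_ROW)
                    objId = sqlite3_column_int64(stmt, 0);
                sqlite3_finalize(stmt);
            }
            return objId;
        }

    private:
        sqlite3 *m_db;
        std::map<std::pair<uint64_t, uint32_t>, int64_t> m_map;
    };

With the two fixes in develop, folders (including deleted ones) are added to the map as they are inserted, so the fallback query should essentially never run and neither extra indexes nor commit batching should be needed.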
From: Mike G. <do...@li...> - 2014-05-14 19:07:36
|
Hi there,

I am new to Sleuthkit and I have been researching how to use it through its C++ API. The documentation on http://fossies.org/dox/sleuthkit-4.1.3/ has been helpful, but I have one question: the documentation indicates that one always needs to be analyzing an image (like a .iso file) of the drive. Is there any way that I can just insert a USB stick and analyze the device itself? Let me make myself clearer: I find that I have to declare

TskImgInfo *img_info = new TskImgInfo();

and then open the file as follows:

img_info->open("/home/Desktop/Image.iso", TSK_IMG_TYPE_DETECT, 0);

followed by another declaration:

TskFsInfo *fs_info = new TskFsInfo();

followed by another open call:

fs_info->open(img_info, 0, TSK_FS_TYPE_DETECT);

So I want to know: is there a way I can just access the USB drive (for example) in the API using just the path (such as /dev/sdc), like I would on the command line? I mean, if I want to analyze a drive, do I have to make an ISO image of it and then access it with the above code every time?

I tried to ask this question before, but it seems like I wasn't so clear so nobody answered. ☹ Thanks to anyone who responds.

Mike Goldstein
|
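For the /dev/sdc question: TSK's raw image layer generally treats a block device like a raw image file, so the same calls can usually be pointed at the device node directly, with no intermediate .iso. A minimal sketch, assuming the 4.1.3 C++ wrappers used above and read permission on the device:

#include <tsk/libtsk.h>
#include <cstdio>

int main()
{
    // A block device node can be opened like an image file: the raw
    // image layer reads /dev/sdc directly (run with sufficient
    // privileges, e.g. via sudo).
    TskImgInfo img_info;
    if (img_info.open("/dev/sdc", TSK_IMG_TYPE_DETECT, 0) != 0) {
        tsk_error_print(stderr);   // print TSK's own error message
        return 1;
    }

    TskFsInfo fs_info;
    // Offset 0 assumes the file system starts at the beginning of the
    // device; for a partitioned disk, pass the partition's offset.
    if (fs_info.open(&img_info, 0, TSK_FS_TYPE_DETECT) != 0) {
        tsk_error_print(stderr);
        return 1;
    }

    printf("File system on /dev/sdc opened successfully\n");
    return 0;
}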
From: Guthrie Q. <qu...@po...> - 2014-05-14 18:43:30
|
Brian, Jason et al,

I have followed the thread about the unusually long processing times (which I experienced myself). Can you give some guidance on whether TSK/Autopsy is reliable for case work? I am concerned about potentially missing large numbers of files. I am a big advocate for FOSS in general (and Autopsy in particular) in my shop. I want to be able to defend its efficacy.

Sent from my iPhone

> On May 13, 2014, at 18:08, Brian Carrier <ca...@sl...> wrote:
>
> Hi Luis,
>
> Yea, a couple of my regression tests found some strange results with that fix. I pushed a new one up that is less strict on the filtering of the deleted files. There is still one issue I need to work out though.
>
> thanks,
> brian
|
From: Brian C. <ca...@sl...> - 2014-05-13 22:08:46
|
Hi Luis,

Yea, a couple of my regression tests found some strange results with that fix. I pushed a new one up that is less strict on the filtering of the deleted files. There is still one issue I need to work out though.

thanks,
brian

On May 13, 2014, at 5:37 PM, Luís Filipe Nassif <lfc...@gm...> wrote:

> Forgot to mention, now loaddb is adding a different number of entries compared to the previous version. Is that ok?
>
> Thank you
> Nassif
>
> Em 13/05/2014 18:19, "Luís Filipe Nassif" <lfc...@gm...> escreveu:
> Excellent, Brian! All misses were resolved. Tested on the 3 ntfs images above and 1 hfs with 3 million entries.
>
> Also, images taking hours or days to finish now take minutes with both patches!
>
> Thank you very much for addressing this issue.
> Nassif
>
> Nope, this one from earlier today:
>
> https://github.com/sleuthkit/sleuthkit/commit/f0672805c18a634ffffeff1f39f793501ddb7702
>
> On May 12, 2014, at 3:23 PM, Luís Filipe Nassif <lfc...@gm...> wrote:
>
> > Hi Brian,
> >
> > Do you mean this fix? https://github.com/sleuthkit/sleuthkit/commit/7b257b6c8252f9e9a7202990710e3a0ef31bf6b7
> >
> > It resolved a lot of the misses, as I described before (05/02). But I am still getting thousands of misses. So I took a look at 2 generated sqlites and discovered that the remaining misses were from deleted folders. I think that deleted folders are not being stored in the local cache...
> >
> > 2014-05-12 13:55 GMT-03:00 Brian Carrier <ca...@sl...>:
> > Hi Luis,
> >
> > The develop branch on github has a fix that removed the remaining misses on my test image, but I had far fewer than you did. Can you pull it and try that one?
> >
> > thanks,
> > brian
> >
> > On May 8, 2014, at 9:57 AM, Luís Filipe Nassif <lfc...@gm...> wrote:
> >
> > > Hi Brian and Simson,
> > >
> > > I have done a number of tests since yesterday. I did not restart the computer between the tests, because I think it is better to use the OS IO cache to focus on CPU processing time, without IO interference, except for the first run. I used another computer, faster than before. Results below (four runs each):
> > >
> > > ntfs image w/ 127.408 files:
> > > no patch (only Brian's fix): 3min 27s / 3min 3s / 3min 3s / 3min 2s
> > > disabling database parent_id look up: 35s / 11s / 11s / 12s
> > > index on meta_addr,fs_obj_id and commit for each 5000 files: 4min 11s / 3min 48s / 3min 48s / 3min 47s
> > >
> > > ntfs image w/ 216.693 files:
> > > no patch (only Brian's fix): 5min 53s / 4min 59s / 4min 58s / 5min 2s
> > > disabling database parent_id look up: 2min 8s / 1min 21s / 1min 21s / 1min 21s
> > > index on meta_addr,fs_obj_id and commit for each 5000 files: 6min 38s / 5min 46s / 5min 43s / 5min 43s
> > >
> > > ntfs image w/ 433.321 files:
> > > no patch (only Brian's fix): 21min 38s / 21min 40s / 21min 10s / 21min 10s
> > > disabling database parent_id look up: 3min 59s / 2min 47s / 2min 47s / 2min 47s
> > > index on meta_addr,fs_obj_id and commit for each 5000 files: (not run based on previous results)
> > >
> > > So, Brian was right, the commits increased processing time. And as you can see, it would be great if it were possible to eliminate the misses and remove the database parent_id look up.
> > >
> > > With that in mind, I took a look at one of the sqlites, and I think I discovered the cause of a lot of the misses (maybe all of them). The misses happened with deleted files inside deleted folders. I think deleted folders are not being stored in the local cache.
> > >
> > > Regards,
> > > Nassif
> > >
> > > 2014-05-07 10:53 GMT-03:00 Brian Carrier <ca...@sl...>:
> > > Hi Nassif,
> > >
> > > As Simson mentioned, the current setup was intended to be the fastest. Doing frequent commits takes longer and more indexes make commits take longer. This is the only process that we know about that does this type of lookup and would use those indexes.
> > >
> > > The bigger question for me is why we are getting these cache misses and I need to spend some more time with some more images to find out. The lookup is to find the ID of the parent and we process from the root directory down. So, in theory, we have already processed the parent folder before the children and it should be in the cache. We need to figure out why the parent isn't in the cache...
> > >
> > > brian
> > >
> > > On May 7, 2014, at 8:24 AM, Luís Filipe Nassif <lfc...@gm...> wrote:
> > >
> > > > I have done one last test, because it was very strange to me that indexing meta_addr and fs_obj_id had not improved the parent_id lookup. We suspected that the indexes were not being used by sqlite, maybe because the data is not committed before the add image process finishes (I am not a sqlite expert, is that possible?). So we inserted a commit for each 5.000 files added to the database. The add image process time decreased from 1 hour to 30 min, so we think that the indexes were not being used.
> > > >
> > > > Why does the add image process not commit the data while it is being added to the database?
> > > >
> > > > Nassif
> > > >
> > > > 2014-05-02 13:37 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
> > > > Fixing my last email, the test was run with the indexes AND Brian's fix. Then I removed the index patch and loadDb took the same 1 hour to finish with only Brian's fix. So the index patch did not help improve the database look up for parent_id.
> > > >
> > > > Sorry for the mistake,
> > > > Nassif
> > > >
> > > > 2014-05-02 10:54 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
> > > >
> > > > I tested loadDb with a patch that creates an index on meta_addr and fs_obj_id. The image with 433.321 files, previously taking 2h45min to load, now takes 1h to finish loadDb with the indexes. That is a good speed up, but completely disabling the database parent_id look up, it takes only 7min to finish. Is there anything else we can do to improve the parent_id database look up?
> > > >
> > > > Regards,
> > > > Nassif
> > > >
> > > > 2014-05-02 9:35 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
> > > >
> > > > Ok, tested on 2 images. The fix resolved a lot of misses:
> > > >
> > > > ntfs image w/ 127.408 files: from 19.558 to 6.511 misses
> > > > ntfs image w/ 433.321 files: from 182.256 to 19.908 misses
> > > >
> > > > I also think creating an index on tsk_files(meta_addr) and tsk_files(fs_obj_id) could help improve the database look up for those deleted files not found in the local cache, what do you think? The database look up seems too slow, as described in my first email.
> > > >
> > > > Thank you for taking a look so quickly.
> > > > Nassif
> > > >
> > > > 2014-05-01 23:47 GMT-03:00 Brian Carrier <ca...@sl...>:
> > > >
> > > > Well that was an easy and embarrassing fix:
> > > >
> > > > if (TSK_FS_TYPE_ISNTFS(fs_file->fs_info->ftype)) {
> > > > -    seq = fs_file->name->meta_seq;
> > > > +    seq = fs_file->name->par_seq;
> > > > }
> > > >
> > > > Turns out we've been having a lot of cache misses because of this stupid bug. Can you replace that line and see if it helps? It certainly did on my test image.
> > > >
> > > > thanks,
> > > > brian
> > > >
> > > > On May 1, 2014, at 10:24 PM, Brian Carrier <ca...@sl...> wrote:
> > > >
> > > > > Thanks for the tests. I wonder if it has to do with an incorrect sequence number. NTFS increments the sequence number each time a file is re-allocated. Deleted orphan files could be getting misses. I'll add some logging on my system and see what kind of misses I get.
> > > > >
> > > > > brian
> > > > >
> > > > > On May 1, 2014, at 8:39 PM, Luís Filipe Nassif <lfc...@gm...> wrote:
> > > > >
> > > > >> Ok, tests 1 and 3 done. I do not have the sleuthkit code inside an IDE, so I did not use breakpoints. Instead, I changed TskDbSqlite::findParObjId() to return the parent_meta_addr when it is not found and return 1 when it is found in the cache map.
> > > > >>
> > > > >> Performing queries on the generated sqlite, there were 19.558 cache misses from an image with 3 ntfs partitions and 127.408 files. I confirmed that many parent_meta_addr missed from the cache (now stored in tsk_objects.par_obj_id) are in tsk_files.meta_addr. The complete paths corresponding to these meta_addr are parents of those files whose processing did not find them in the cache.
> > > > >>
> > > > >> Other tests resulted in:
> > > > >> 182.256 cache misses from 433.321 files (ntfs)
> > > > >> 892.359 misses from 1.811.393 files (ntfs)
> > > > >> 169.819 misses from 3.177.917 files (hfs)
> > > > >>
> > > > >> Luis Nassif
> > > > >>
> > > > >> 2014-05-01 16:14 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
> > > > >> Forgot to mention: we are using sleuthkit 4.1.3
> > > > >>
> > > > >> Em 01/05/2014 16:09, "Luís Filipe Nassif" <lfc...@gm...> escreveu:
> > > > >>
> > > > >> Hi Brian,
> > > > >>
> > > > >> The 3 cases above were ntfs. I also tested with hfs and canceled loaddb after 1 day. The modified version finished after 8 hours and added about 3 million entries. We will try to do the tests you have suggested.
> > > > >>
> > > > >> Em 01/05/2014 15:48, "Brian Carrier" <ca...@sl...> escreveu:
> > > > >> Hi Luis,
> > > > >>
> > > > >> What kind of file system was it? I fixed a bug a little while ago in that code for HFS file systems that resulted in a lot of cache misses.
> > > > >>
> > > > >> In theory, everything should be cached. It sounds like a bug if you are getting so many misses. The basic idea of this code is that everything in the DB gets assigned a unique object ID and we make associations between files and their parent folder's unique ID.
> > > > >>
> > > > >> Since you seem to be comfortable with a debugger in the code, can you set a breakpoint for when the miss happens and:
> > > > >> 1) Determine the path of the file that was being added to the DB and the parent address that was trying to be found.
> > > > >> 2) Use the 'ffind' TSK tool to then map that parent address to a path. Is it a subset of the path from #1?
> > > > >> 3) Open the DB in a SQLite tool and do something like this:
> > > > >>
> > > > >> SELECT * from tsk_files where meta_addr == META_ADDR_FROM_ABOVE
> > > > >>
> > > > >> Is it in the DB?
> > > > >>
> > > > >> Thanks!
> > > > >>
> > > > >> brian
> > > > >>
> > > > >> On May 1, 2014, at 11:58 AM, Luís Filipe Nassif <lfc...@gm...> wrote:
> > > > >>
> > > > >>> Hi,
> > > > >>>
> > > > >>> We have investigated a bit why the add image process is so slow in some cases. The add image process time seems to be quadratic in the number of files in the image.
> > > > >>>
> > > > >>> We detected that the function TskDbSqlite::findParObjId(), in db_sqlite.cpp, is not finding the parent_meta_addr -> parent_file_id mapping in the local cache for a lot of files, causing it to search for the mapping in the database (not sure if it is a non-indexed search?).
> > > > >>>
> > > > >>> For testing purposes, we added a "return 1;" line right after the cache lookup, disabling the database lookup, and this resulted in great speedups:
> > > > >>>
> > > > >>> number of files / default load_db time / patched load_db time
> > > > >>> ~80.000 / 20min / 2min
> > > > >>> ~300.000 / 3h / 7min
> > > > >>> ~700.000 / 48h / 27min
> > > > >>>
> > > > >>> We wonder if it is possible to store all par_meta_addr -> par_id mappings in the local cache (better) or to do an improved (indexed?) search for the mapping in the database. We think that someone with more knowledge of the load_db code could help a lot here.
|
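The index experiment described in the quoted messages can be sketched with the SQLite C API. This is illustrative only (the index names are invented here), and as the thread concluded, eliminating the cache misses mattered far more than indexing the fallback lookup:

#include <sqlite3.h>
#include <cstdio>

// Sketch: create the two indexes Luís experimented with so the
// fallback parent lookup is no longer a full-table scan. Note that
// indexes also slow down the bulk inserts, which is the trade-off
// Brian describes above.
static int addParentLookupIndexes(sqlite3 *db)
{
    const char *sql =
        "CREATE INDEX IF NOT EXISTS idx_tsk_files_meta_addr ON tsk_files(meta_addr);"
        "CREATE INDEX IF NOT EXISTS idx_tsk_files_fs_obj_id ON tsk_files(fs_obj_id);";
    char *errmsg = NULL;
    if (sqlite3_exec(db, sql, NULL, NULL, &errmsg) != SQLITE_OK) {
        fprintf(stderr, "CREATE INDEX failed: %s\n", errmsg);
        sqlite3_free(errmsg);
        return 1;
    }
    return 0;
}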
From: Luís F. N. <lfc...@gm...> - 2014-05-13 21:37:13
|
Forgot to mention, now loaddb is adding a different number of entries compared to the previous version. Is that ok?

Thank you
Nassif

Em 13/05/2014 18:19, "Luís Filipe Nassif" <lfc...@gm...> escreveu:

> Excellent, Brian! All misses were resolved. Tested on the 3 ntfs images above and 1 hfs with 3 million entries.
>
> Also, images taking hours or days to finish now take minutes with both patches!
>
> Thank you very much for addressing this issue.
> Nassif
>
> Nope, this one from earlier today:
>
> https://github.com/sleuthkit/sleuthkit/commit/f0672805c18a634ffffeff1f39f793501ddb7702
|
From: Luís F. N. <lfc...@gm...> - 2014-05-13 21:19:09
|
Excelent, Brian! All misses were resolved. Tested in the 3 ntfs images above and 1 hfs with 3 million entries. Also, images taking hours or days to finish now takes minutes with both patches! Thank you very much for addressing this issue. Nassif Nope, this one from earlier today: https://github.com/sleuthkit/sleuthkit/commit/f0672805c18a634ffffeff1f39f793501ddb7702 On May 12, 2014, at 3:23 PM, Luís Filipe Nassif <lfc...@gm...> wrote: > Hi Brian, > > Do you mean this fix? https://github.com/sleuthkit/sleuthkit/commit/7b257b6c8252f9e9a7202990710e3a0ef31bf6b7 > > It resolved a lot of the misses as i described before (05/02). But i am still getting thousands of misses. So I took a look at 2 generated sqlites and discovered that the remaining misses were from deleted folders. I think that deleted folders are not being stored into local cache... > > > > > 2014-05-12 13:55 GMT-03:00 Brian Carrier <ca...@sl...>: > Hi Luis, > > The develop branch on github has a fix that removed the remaining misses on my test image, but I had far fewer than you did. Can you pull it and try that one? > > thanks, > brian > > > > > On May 8, 2014, at 9:57 AM, Luís Filipe Nassif <lfc...@gm...> wrote: > > > Hi Brian and Simson, > > > > I have done a number of tests since yesterday. I have not restarted the computer between the tests, because i think it would be better to use the OS IO cache to focus on CPU processing time, without io interference, except for the first run. I used another computer, faster than before. Results below: > > > > ntfs image w/ 127.408 files 1 2 3 4 > > no patch (only Brian's fix) 3min 27s 3min 3s 3min 3s 3min 2s > > disabling database parent_id look up 35s 11s 11s 12s > > index on meta_addr,fs_obj_id and commit for each 5000 files 4min 11s 3min 48s 3min 48s 3min 47s > > > > > > ntfs image w/ 216.693 files 1 2 3 4 > > no patch (only Brian's fix) 5min 53s 4min 59s 4min 58s 5min 2s > > disabling database parent_id look up 2min 8s 1min 21s 1min 21s 1min 21s > > index on meta_addr,fs_obj_id and commit for each 5000 files 6min 38s 5min 46s 5min 43s 5min 43s > > > > > > ntfs image w/ 433.321 files 1 2 3 4 > > no patch (only Brian's fix) 21min 38s 21min 40s 21min 10s 21min 10s > > disabling database parent_id look up 3min 59s 2min 47s 2min 47s 2min 47s > > index on meta_addr,fs_obj_id and commit for each 5000 files (not run based on previous results) > > > > So, Brian was right, the commits increased processing time. And as you can see, it would be great if it is possible to eliminate the misses and remove the database parent_id look up. > > > > With that in mind, I took a look at one of the sqlites, and I think I discovered the cause of a lot of the misses (maybe all of them). The misses happened with deleted files inside deleted folders. I think deleted folders can not being stored into local cache. > > > > Regards, > > Nassif > > > > > > 2014-05-07 10:53 GMT-03:00 Brian Carrier <ca...@sl...>: > > Hi Nassif, > > > > As Simson mentioned, the current setup was intended to be the fastest. Doing frequent commits takes longer and more indexes makes commits take longer. This is the only process that we know about that does this type of lookup and would use those indexes. > > > > The bigger question for me is why we are getting these cache misses and I need to spend some more time with some more images to find out. The lookup is to find the ID of the parent and we process from the root directory down. 
So, in theory, we have already processed the parent folder before the children and it should be in the cache. We need to figure out why the parent isn't in the cache... > > > > brian > > > > > > > > On May 7, 2014, at 8:24 AM, Luís Filipe Nassif <lfc...@gm...> wrote: > > > > > I have done one last test, because it was very strange to me that indexing meta_addr and fs_obj_id had not improved the parent_id lookup. We suspected that the indexes were not being used by sqlite, maybe because the whole data is not commited before add image process finishes (i am not a sqlite expert, is it possible?). So we inserted a commit for each 5.000 files added to database. The add image process time decreased from 1hour to 30min, so we think that the indexes were not being used. > > > > > > Why add image process do not commit the data while it is being added to database? > > > > > > Nassif > > > > > > > > > 2014-05-02 13:37 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>: > > > Fixing my last email, the test was run with the indexes AND Brian's fix. Then I removed the index patch and loadDb took the same 1 hour to finish with only Brian's fix. So the index patch did not help improving database look up for parent_id. > > > > > > Sorry for mistake, > > > Nassif > > > > > > > > > 2014-05-02 10:54 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>: > > > > > > I tested loadDb with a create index on meta_addr and fs_obj_id patch. The image with 433.321 files, previously taking 2h45min to load, now takes 1h to finish loadDb with the indexes. That is a good speed up, but completely disabling the database parent_id look up, it only takes 7min to finish. Is there another thing we can do to improve the parent_id database look up? > > > > > > Regards, > > > Nassif > > > > > > > > > 2014-05-02 9:35 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>: > > > > > > Ok, tested in 2 images. Fix resolved a lot of misses: > > > > > > ntfs image w/ 127.408 files: from 19.558 to 6.511 misses > > > ntfs image w/ 433.321 files: from 182.256 to 19.908 misses > > > > > > I also think creating an index on tsk_files(meta_addr) and tsk_files(fs_obj_id) could help improving the database look up for those deleted files not found in local cache, what do you think? The database look up seems too slow, as described in my first email. > > > > > > Thank you for taking a look so quickly. > > > Nassif > > > > > > > > > 2014-05-01 23:47 GMT-03:00 Brian Carrier <ca...@sl...>: > > > > > > Well that was an easy and embarrassing fix: > > > > > > if (TSK_FS_TYPE_ISNTFS(fs_file->fs_info->ftype)) { > > > - seq = fs_file->name->meta_seq; > > > + seq = fs_file->name->par_seq; > > > } > > > > > > Turns out we've been having a lot of cache misses because of this stupid bug. Can you replace that line and see if it helps. It certainly did on my test image. > > > > > > thanks, > > > brian > > > > > > > > > On May 1, 2014, at 10:24 PM, Brian Carrier <ca...@sl...> wrote: > > > > > > > Thanks for the tests. I wonder if it has to do with an incorrect sequence number. NTFS increments the sequence number each time a file is re-allocated. Deleted orphan files could be getting misses. I'll add some logging on my system and see what kind of misses I get. > > > > > > > > brian > > > > > > > > On May 1, 2014, at 8:39 PM, Luís Filipe Nassif <lfc...@gm...> wrote: > > > > > > > >> Ok, tests 1 and 3 done. I do not have sleuthkit code inside an ide, so did not use breakpoints. 
Instead, I changed TskDbSqlite::findParObjId() to return the parent_meta_addr when it is not found and return 1 when it is found in the cache map. > > > >> > > > >> Performing queries on the generated sqlite, there were 19.558 cache misses from an image with 3 ntfs partitions and 127.408 files. I confirmed that many parent_meta_addr missed from cache (now stored in tsk_objects.par_obj_id) are into tsk_files.meta_addr. The complete paths corresponding to these meta_addr are parents of those files whose processing have not found them in cache. > > > >> > > > >> Other tests resulted in: > > > >> 182.256 cache misses from 433.321 files (ntfs) > > > >> 892.359 misses from 1.811.393 files (ntfs) > > > >> 169.819 misses from 3.177.917 files (hfs) > > > >> > > > >> Luis Nassif > > > >> > > > >> > > > >> > > > >> 2014-05-01 16:14 GMT-03:00 Luís Filipe Nassif <lfc...@gm... >: > > > >> Forgot to mention: we are using sleuthkit 4.1.3 > > > >> > > > >> Em 01/05/2014 16:09, "Luís Filipe Nassif" <lfc...@gm...> escreveu: > > > >> > > > >> Hi Brian, > > > >> > > > >> The 3 cases above were ntfs. I also tested with hfs and canceled loaddb after 1 day. The modified version finished after 8hours and added about 3 million entries. We will try to do the tests you have suggested. > > > >> > > > >> Em 01/05/2014 15:48, "Brian Carrier" <ca...@sl...> escreveu: > > > >> Hi Luis, > > > >> > > > >> What kind of file system was it? I fixed a bug a little while ago in that code for HFS file systems that resulted in a lot of cache misses. > > > >> > > > >> In theory, everything should be cached. It sounds like a bug if you are getting so many misses. The basic idea of this code is that everything in the DB gets assigned a unique object ID and we make associations between files and their parent folder's unique ID. > > > >> > > > >> Since you seem to be comfortable with a debugger in the code, can you set a breakpoint for when the miss happens and: > > > >> 1) Determine the path of the file that was being added to the DB and the parent address that was trying to be found. > > > >> 2) Use the 'ffind' TSK tool to then map that parent address to a path. Is it a subset of the path from #1? > > > >> 3) Open the DB in a SQLite tool and do something like this: > > > >> > > > >> SELECT * from tsk_files where meta_addr == META_ADDR_FROM_ABOVE > > > >> > > > >> Is it in the DB? > > > >> > > > >> Thanks! > > > >> > > > >> brian > > > >> > > > >> > > > >> On May 1, 2014, at 11:58 AM, Luís Filipe Nassif < lfc...@gm...> wrote: > > > >> > > > >>> Hi, > > > >>> > > > >>> We have investigated a bit why the add image process is too slow in some cases. The add image process time seems to be quadratic with the number of files in the image. > > > >>> > > > >>> We detected that the function TskDbSqlite::findParObjId(), in db_sqlite.cpp, is not finding the parent_meta_addr -> parent_file_id mapping in the local cache for a lot of files, causing it to search for the mapping in the database (not sure if it is an non-indexed search?) > > > >>> > > > >>> For testing purposes, we added a "return 1;" line right after the cache look up, disabling the database look up, and this resulted in great speed ups: > > > >>> > > > >>> number of files / default load_db time / patched load_db time > > > >>> ~80.000 / 20min / 2min > > > >>> ~300.000 / 3h / 7min > > > >>> ~700.000 / 48h / 27min > > > >>> > > > >>> We wonder if it is possible to store all par_meta_addr -> par_id mappings into local cache (better) or doing an improved (indexed?) 
search for the mapping in the database. We think that someone with more knowledge of load_db code could help a lot here. |
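The cache being discussed here can be pictured as a nested map from (file system, parent meta address, parent sequence) to the parent's object ID in the database. The sketch below is only an illustration, not the actual TskDbSqlite code; every name in it is made up. The one detail taken directly from the thread is that the NTFS lookup has to key on the parent's sequence number (fs_file->name->par_seq) rather than the file's own meta_seq:

// Illustrative sketch of a findParObjId()-style cache (hypothetical names).
#include <cstdint>
#include <map>

typedef std::map<uint64_t, int64_t> Seq2ObjId;   // parent seq -> object ID
typedef std::map<uint64_t, Seq2ObjId> Meta2Seq;  // parent meta_addr -> seq map
typedef std::map<int64_t, Meta2Seq> Fs2Meta;     // fs_obj_id -> meta_addr map

static Fs2Meta parentIdCache;

// Must be called for every object added to the DB -- including deleted
// folders, which the thread identifies as the entries that were being
// skipped -- or their children will fall back to a slow SQL query.
void storeObjId(int64_t fsObjId, uint64_t metaAddr, uint64_t seq, int64_t objId)
{
    parentIdCache[fsObjId][metaAddr][seq] = objId;
}

// Returns the parent's object ID, or -1 on a cache miss (the real code
// then queries tsk_files, which is the slow path measured in this thread).
int64_t lookupParObjId(int64_t fsObjId, uint64_t parMetaAddr, uint64_t parSeq)
{
    Fs2Meta::const_iterator f = parentIdCache.find(fsObjId);
    if (f == parentIdCache.end()) return -1;
    Meta2Seq::const_iterator m = f->second.find(parMetaAddr);
    if (m == f->second.end()) return -1;
    Seq2ObjId::const_iterator s = m->second.find(parSeq);  // par_seq, not meta_seq
    if (s == m->second.end()) return -1;
    return s->second;
}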
From: Mike G. <do...@li...> - 2014-05-13 21:15:50
|
Hi there, I am very new to TSK and I have been trying to teach myself how to program in C++ using the TSK library. I have found https://digital-forensics.sans.org/community/papers/gcfa/forensic-investigation-usb-flashdrive-image-cc-terminals_188 to be helpful. However, I am looking for more examples of this code. Can anyone direct me to something similar? Additionally, I have been trying my own hand at it. I created the following small program:

#include <iostream>
#include <string>
#include <stdio.h>
#include <stdlib.h>
#include <tsk/libtsk.h>

using namespace std;

int main(int argc, char **argv)
{
    TskImgInfo *img_info = new TskImgInfo();
    TSK_TCHAR **temp = (TSK_TCHAR **) argv;

    printf("Opening directory %s \n", temp[1]);
    if (img_info->open(argv[1], TSK_IMG_TYPE_DETECT, 0) == 0) {
        printf("Directory opened successfully\n");
    } else {
        printf("Error opening directory %s \n", temp[1]);
        exit(1);
    }
    return 0;
}

But I'm not sure what I'm even doing: What is TskImgInfo? Is that a disk image? Also, why does it only work on specific files and not on directories? And if I want to work with /dev/sdc for example, what must I replace TskImgInfo with? Thanks in advance, Mike Goldstein |
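To put the questions above in code terms: TskImgInfo is the C++ wrapper around TSK_IMG_INFO, which represents a disk image or raw device (a dd or E01 file, or a device node such as /dev/sdc), not a directory, which is why open() fails on folders. Below is a minimal, untested sketch of the same steps using the underlying TSK 4.x C API; the offset-0 file system open is an assumption that holds for an unpartitioned volume (a partitioned disk needs the offset reported by mmls):

#include <tsk/libtsk.h>
#include <cstdio>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <image-or-device>\n", argv[0]);
        return 1;
    }
    // Auto-detects the image format (raw, E01, ...). A raw device such as
    // /dev/sdc is handled the same way as a raw image file.
    TSK_IMG_INFO *img = tsk_img_open_utf8_sing(argv[1], TSK_IMG_TYPE_DETECT, 0);
    if (img == NULL) {
        tsk_error_print(stderr);
        return 1;
    }
    // Open the file system inside the image (offset 0: unpartitioned volume).
    TSK_FS_INFO *fs = tsk_fs_open_img(img, 0, TSK_FS_TYPE_DETECT);
    if (fs == NULL) {
        tsk_error_print(stderr);
        tsk_img_close(img);
        return 1;
    }
    printf("fs type: %s, block size: %u\n",
           tsk_fs_type_toname(fs->ftype), fs->block_size);
    tsk_fs_close(fs);
    tsk_img_close(img);
    return 0;
}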
From: Brian C. <ca...@sl...> - 2014-05-13 14:06:22
|
Basis Technology is hosting another Autopsy module writing competition around OSDFCon. Submissions can be entirely new analysis techniques or wrappers around existing tools. A primary goal of Autopsy 3 was to be a digital forensics platform so that users could use a single tool to perform their investigations and not waste time moving data around between various stand-alone tools. We've built the infrastructure and now we need developers to write modules so that we can all achieve the original goal from OSDFCon 2010. Last year, the winners developed registry parsing and fuzzy hashing modules. This year, we're looking for even better submissions. OSDFCon attendees will vote on who gets the cash prizes. Submissions are due Oct 20, 2014. Rules are available on the website: http://www.basistech.com/osdfcon-contest/ If you are looking for ideas, the above site has a link to a set of feature requests that have been submitted. Note that we also sponsored a student-based competition this year too. This competition is different. It has bigger prizes and is timed based on OSDFCon and not semesters (http://www.basistech.com/digital-forensics/autopsy/autopsy-for-educators/student-development-contest/). thanks, brian |
From: Brian C. <ca...@sl...> - 2014-05-12 16:55:18
|
Hi Luis, The develop branch on github has a fix that removed the remaining misses on my test image, but I had far fewer than you did. Can you pull it and try that one? thanks, brian On May 8, 2014, at 9:57 AM, Luís Filipe Nassif <lfc...@gm...> wrote: > Hi Brian and Simson, > > I have done a number of tests since yesterday. I have not restarted the computer between the tests, because i think it would be better to use the OS IO cache to focus on CPU processing time, without io interference, except for the first run. I used another computer, faster than before. Results below: > > ntfs image w/ 127.408 files 1 2 3 4 > no patch (only Brian's fix) 3min 27s 3min 3s 3min 3s 3min 2s > disabling database parent_id look up 35s 11s 11s 12s > index on meta_addr,fs_obj_id and commit for each 5000 files 4min 11s 3min 48s 3min 48s 3min 47s > > > ntfs image w/ 216.693 files 1 2 3 4 > no patch (only Brian's fix) 5min 53s 4min 59s 4min 58s 5min 2s > disabling database parent_id look up 2min 8s 1min 21s 1min 21s 1min 21s > index on meta_addr,fs_obj_id and commit for each 5000 files 6min 38s 5min 46s 5min 43s 5min 43s > > > ntfs image w/ 433.321 files 1 2 3 4 > no patch (only Brian's fix) 21min 38s 21min 40s 21min 10s 21min 10s > disabling database parent_id look up 3min 59s 2min 47s 2min 47s 2min 47s > index on meta_addr,fs_obj_id and commit for each 5000 files (not run based on previous results) > > So, Brian was right, the commits increased processing time. And as you can see, it would be great if it is possible to eliminate the misses and remove the database parent_id look up. > > With that in mind, I took a look at one of the sqlites, and I think I discovered the cause of a lot of the misses (maybe all of them). The misses happened with deleted files inside deleted folders. I think deleted folders can not being stored into local cache. > > Regards, > Nassif > > > 2014-05-07 10:53 GMT-03:00 Brian Carrier <ca...@sl...>: > Hi Nassif, > > As Simson mentioned, the current setup was intended to be the fastest. Doing frequent commits takes longer and more indexes makes commits take longer. This is the only process that we know about that does this type of lookup and would use those indexes. > > The bigger question for me is why we are getting these cache misses and I need to spend some more time with some more images to find out. The lookup is to find the ID of the parent and we process from the root directory down. So, in theory, we have already processed the parent folder before the children and it should be in the cache. We need to figure out why the parent isn't in the cache... > > brian > > > > On May 7, 2014, at 8:24 AM, Luís Filipe Nassif <lfc...@gm...> wrote: > > > I have done one last test, because it was very strange to me that indexing meta_addr and fs_obj_id had not improved the parent_id lookup. We suspected that the indexes were not being used by sqlite, maybe because the whole data is not commited before add image process finishes (i am not a sqlite expert, is it possible?). So we inserted a commit for each 5.000 files added to database. The add image process time decreased from 1hour to 30min, so we think that the indexes were not being used. > > > > Why add image process do not commit the data while it is being added to database? > > > > Nassif > > > > > > 2014-05-02 13:37 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>: > > Fixing my last email, the test was run with the indexes AND Brian's fix. Then I removed the index patch and loadDb took the same 1 hour to finish with only Brian's fix. 
So the index patch did not help improving database look up for parent_id. > > > > Sorry for mistake, > > Nassif > > > > > > 2014-05-02 10:54 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>: > > > > I tested loadDb with a create index on meta_addr and fs_obj_id patch. The image with 433.321 files, previously taking 2h45min to load, now takes 1h to finish loadDb with the indexes. That is a good speed up, but completely disabling the database parent_id look up, it only takes 7min to finish. Is there another thing we can do to improve the parent_id database look up? > > > > Regards, > > Nassif > > > > > > 2014-05-02 9:35 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>: > > > > Ok, tested in 2 images. Fix resolved a lot of misses: > > > > ntfs image w/ 127.408 files: from 19.558 to 6.511 misses > > ntfs image w/ 433.321 files: from 182.256 to 19.908 misses > > > > I also think creating an index on tsk_files(meta_addr) and tsk_files(fs_obj_id) could help improving the database look up for those deleted files not found in local cache, what do you think? The database look up seems too slow, as described in my first email. > > > > Thank you for taking a look so quickly. > > Nassif > > > > > > 2014-05-01 23:47 GMT-03:00 Brian Carrier <ca...@sl...>: > > > > Well that was an easy and embarrassing fix: > > > > if (TSK_FS_TYPE_ISNTFS(fs_file->fs_info->ftype)) { > > - seq = fs_file->name->meta_seq; > > + seq = fs_file->name->par_seq; > > } > > > > Turns out we've been having a lot of cache misses because of this stupid bug. Can you replace that line and see if it helps. It certainly did on my test image. > > > > thanks, > > brian > > > > > > On May 1, 2014, at 10:24 PM, Brian Carrier <ca...@sl...> wrote: > > > > > Thanks for the tests. I wonder if it has to do with an incorrect sequence number. NTFS increments the sequence number each time a file is re-allocated. Deleted orphan files could be getting misses. I'll add some logging on my system and see what kind of misses I get. > > > > > > brian > > > > > > On May 1, 2014, at 8:39 PM, Luís Filipe Nassif <lfc...@gm...> wrote: > > > > > >> Ok, tests 1 and 3 done. I do not have sleuthkit code inside an ide, so did not use breakpoints. Instead, I changed TskDbSqlite::findParObjId() to return the parent_meta_addr when it is not found and return 1 when it is found in the cache map. > > >> > > >> Performing queries on the generated sqlite, there were 19.558 cache misses from an image with 3 ntfs partitions and 127.408 files. I confirmed that many parent_meta_addr missed from cache (now stored in tsk_objects.par_obj_id) are into tsk_files.meta_addr. The complete paths corresponding to these meta_addr are parents of those files whose processing have not found them in cache. > > >> > > >> Other tests resulted in: > > >> 182.256 cache misses from 433.321 files (ntfs) > > >> 892.359 misses from 1.811.393 files (ntfs) > > >> 169.819 misses from 3.177.917 files (hfs) > > >> > > >> Luis Nassif > > >> > > >> > > >> > > >> 2014-05-01 16:14 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>: > > >> Forgot to mention: we are using sleuthkit 4.1.3 > > >> > > >> Em 01/05/2014 16:09, "Luís Filipe Nassif" <lfc...@gm...> escreveu: > > >> > > >> Hi Brian, > > >> > > >> The 3 cases above were ntfs. I also tested with hfs and canceled loaddb after 1 day. The modified version finished after 8hours and added about 3 million entries. We will try to do the tests you have suggested. 
> > >>
> > >> On 01/05/2014 15:48, "Brian Carrier" <ca...@sl...> wrote:
> > >> Hi Luis,
> > >>
> > >> What kind of file system was it? I fixed a bug a little while ago in that code for HFS file systems that resulted in a lot of cache misses.
> > >>
> > >> In theory, everything should be cached. It sounds like a bug if you are getting so many misses. The basic idea of this code is that everything in the DB gets assigned a unique object ID and we make associations between files and their parent folder's unique ID.
> > >>
> > >> Since you seem to be comfortable with a debugger in the code, can you set a breakpoint for when the miss happens and:
> > >> 1) Determine the path of the file that was being added to the DB and the parent address that was trying to be found.
> > >> 2) Use the 'ffind' TSK tool to then map that parent address to a path. Is it a subset of the path from #1?
> > >> 3) Open the DB in a SQLite tool and do something like this:
> > >>
> > >> SELECT * from tsk_files where meta_addr == META_ADDR_FROM_ABOVE
> > >>
> > >> Is it in the DB?
> > >>
> > >> Thanks!
> > >>
> > >> brian
> > >>
> > >> On May 1, 2014, at 11:58 AM, Luís Filipe Nassif <lfc...@gm...> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> We have investigated a bit why the add image process is too slow in some cases. The add image process time seems to be quadratic with the number of files in the image.
> > >>>
> > >>> We detected that the function TskDbSqlite::findParObjId(), in db_sqlite.cpp, is not finding the parent_meta_addr -> parent_file_id mapping in the local cache for a lot of files, causing it to search for the mapping in the database (not sure if it is a non-indexed search?)
> > >>>
> > >>> For testing purposes, we added a "return 1;" line right after the cache look up, disabling the database look up, and this resulted in great speed ups:
> > >>>
> > >>> number of files / default load_db time / patched load_db time
> > >>> ~80.000 / 20min / 2min
> > >>> ~300.000 / 3h / 7min
> > >>> ~700.000 / 48h / 27min
> > >>>
> > >>> We wonder if it is possible to store all par_meta_addr -> par_id mappings into local cache (better) or doing an improved (indexed?) search for the mapping in the database. We think that someone with more knowledge of load_db code could help a lot here.
|
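Luís's indexing suggestion from the quoted thread, sketched as plain sqlite3 calls. The index name is invented and the exact fallback query inside TskDbSqlite::findParObjId() may filter on different columns, so this illustrates the approach rather than the patch that was actually benchmarked:

#include <sqlite3.h>
#include <cstdio>

// Hypothetical helper: cover the columns the parent-ID fallback lookup
// filters on (fs_obj_id and meta_addr, per the thread) with one index.
int createParentLookupIndex(sqlite3 *db)
{
    char *errMsg = NULL;
    int rc = sqlite3_exec(db,
        "CREATE INDEX IF NOT EXISTS idx_files_par_lookup"
        " ON tsk_files(fs_obj_id, meta_addr);",
        NULL, NULL, &errMsg);
    if (rc != SQLITE_OK) {
        fprintf(stderr, "CREATE INDEX failed: %s\n", errMsg);
        sqlite3_free(errMsg);
    }
    return rc;
}

As the measurements later in the thread show, the index alone did not close the gap; the decisive fix was eliminating the cache misses so the database fallback is rarely taken at all.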
From: Jason L. <jle...@ba...> - 2014-05-12 13:17:20
|
Hi Enkidu - You can develop a module for Autopsy, which will save you a lot of effort from starting an entirely new tool. This page should get you started: http://wiki.sleuthkit.org/index.php?title=Autopsy_Developer%27s_Guide Also, we're running a student development contest for Autopsy modules to try and get some more community involvement and encourage more module additions to the open source community. You can find more information about that here: http://www.basistech.com/digital-forensics/autopsy/autopsy-for-educators/student-development-contest/ Jason ------------------------------------------------ Jason Letourneau Product Manager, Digital Forensics Basis Technology jle...@ba... 617-386-2000 ext. 152 On May 12, 2014, at 3:34 AM, Enkidu Mo Shiri <vol...@gm...> wrote: > Hi, > for my project, i have to create a tool to investigate and find evidences of bitcoin,litecoin,dogecoin wallets on windows,android and ios. is it better and easier if i add this function to autopsy source code, or i create a new tool? > thank you > Ehsan Moshiri (Enkidu) > Digital Forensic Student > H/P:+96164953954 , +961124249769 > Linkedin: http://my.linkedin.com/pub/enkidu-moshiri/59/baa/90b/ > Facebook: Enkidu Mo Shi Ri > wechat: Enkidu-Moshiri > Line: Enkidu.Moshiri |
From: Enkidu Mo S. <vol...@gm...> - 2014-05-12 07:34:07
|
Hi, for my project, I have to create a tool to investigate and find evidence of Bitcoin, Litecoin, and Dogecoin wallets on Windows, Android, and iOS. Is it better and easier if I add this function to the Autopsy source code, or should I create a new tool? Thank you *Ehsan Moshiri (Enkidu)* *Digital Forensic Student* *H/P:+96164953954 , +961124249769* *Linkedin: http://my.linkedin.com/pub/enkidu-moshiri/59/baa/90b/* *Facebook: Enkidu Mo Shi Ri* *wechat: Enkidu-Moshiri* *Line: Enkidu.Moshiri* |
From: Luís F. N. <lfc...@gm...> - 2014-05-08 13:57:54
|
Hi Brian and Simson,

I have done a number of tests since yesterday. I have not restarted the computer between the tests, because I think it would be better to use the OS IO cache to focus on CPU processing time, without IO interference, except for the first run. I used another computer, faster than before. Results below:

ntfs image w/ 127.408 files                                    run 1      run 2      run 3      run 4
no patch (only Brian's fix)                                    3min 27s   3min 3s    3min 3s    3min 2s
disabling database parent_id look up                           35s        11s        11s        12s
index on meta_addr,fs_obj_id and commit for each 5000 files    4min 11s   3min 48s   3min 48s   3min 47s

ntfs image w/ 216.693 files                                    run 1      run 2      run 3      run 4
no patch (only Brian's fix)                                    5min 53s   4min 59s   4min 58s   5min 2s
disabling database parent_id look up                           2min 8s    1min 21s   1min 21s   1min 21s
index on meta_addr,fs_obj_id and commit for each 5000 files    6min 38s   5min 46s   5min 43s   5min 43s

ntfs image w/ 433.321 files                                    run 1      run 2      run 3      run 4
no patch (only Brian's fix)                                    21min 38s  21min 40s  21min 10s  21min 10s
disabling database parent_id look up                           3min 59s   2min 47s   2min 47s   2min 47s
index on meta_addr,fs_obj_id and commit for each 5000 files    (not run based on previous results)

So, Brian was right, the commits increased processing time. And as you can see, it would be great if it is possible to eliminate the misses and remove the database parent_id look up.

With that in mind, I took a look at one of the sqlites, and I think I discovered the cause of a lot of the misses (maybe all of them). The misses happened with deleted files inside deleted folders. I think deleted folders are not being stored in the local cache.

Regards,
Nassif

2014-05-07 10:53 GMT-03:00 Brian Carrier <ca...@sl...>:
> Hi Nassif,
>
> As Simson mentioned, the current setup was intended to be the fastest. Doing frequent commits takes longer and more indexes makes commits take longer. This is the only process that we know about that does this type of lookup and would use those indexes.
>
> The bigger question for me is why we are getting these cache misses and I need to spend some more time with some more images to find out. The lookup is to find the ID of the parent and we process from the root directory down. So, in theory, we have already processed the parent folder before the children and it should be in the cache. We need to figure out why the parent isn't in the cache...
>
> brian
>
> On May 7, 2014, at 8:24 AM, Luís Filipe Nassif <lfc...@gm...> wrote:
>
> > I have done one last test, because it was very strange to me that indexing meta_addr and fs_obj_id had not improved the parent_id lookup. We suspected that the indexes were not being used by sqlite, maybe because the whole data is not committed before the add image process finishes (I am not an SQLite expert; is that possible?). So we inserted a commit for each 5.000 files added to the database. The add image process time decreased from 1 hour to 30 min, so we think that the indexes were not being used.
> >
> > Why does the add image process not commit the data while it is being added to the database?
> >
> > Nassif
> >
> > 2014-05-02 13:37 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
> > Fixing my last email, the test was run with the indexes AND Brian's fix. Then I removed the index patch and loadDb took the same 1 hour to finish with only Brian's fix. So the index patch did not help improving database look up for parent_id.
> >
> > Sorry for mistake,
> > Nassif
> >
> > 2014-05-02 10:54 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
> >
> > I tested loadDb with a create index on meta_addr and fs_obj_id patch. The image with 433.321 files, previously taking 2h45min to load, now takes 1h to finish loadDb with the indexes. That is a good speed up, but completely disabling the database parent_id look up, it only takes 7min to finish. Is there another thing we can do to improve the parent_id database look up?
> >
> > Regards,
> > Nassif
> >
> > 2014-05-02 9:35 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
> >
> > Ok, tested in 2 images. Fix resolved a lot of misses:
> >
> > ntfs image w/ 127.408 files: from 19.558 to 6.511 misses
> > ntfs image w/ 433.321 files: from 182.256 to 19.908 misses
> >
> > I also think creating an index on tsk_files(meta_addr) and tsk_files(fs_obj_id) could help improving the database look up for those deleted files not found in local cache, what do you think? The database look up seems too slow, as described in my first email.
> >
> > Thank you for taking a look so quickly.
> > Nassif
> >
> > 2014-05-01 23:47 GMT-03:00 Brian Carrier <ca...@sl...>:
> >
> > Well that was an easy and embarrassing fix:
> >
> > if (TSK_FS_TYPE_ISNTFS(fs_file->fs_info->ftype)) {
> > -    seq = fs_file->name->meta_seq;
> > +    seq = fs_file->name->par_seq;
> > }
> >
> > Turns out we've been having a lot of cache misses because of this stupid bug. Can you replace that line and see if it helps. It certainly did on my test image.
> >
> > thanks,
> > brian
> >
> > On May 1, 2014, at 10:24 PM, Brian Carrier <ca...@sl...> wrote:
> >
> > > Thanks for the tests. I wonder if it has to do with an incorrect sequence number. NTFS increments the sequence number each time a file is re-allocated. Deleted orphan files could be getting misses. I'll add some logging on my system and see what kind of misses I get.
> > >
> > > brian
> > >
> > > On May 1, 2014, at 8:39 PM, Luís Filipe Nassif <lfc...@gm...> wrote:
> > >
> > >> Ok, tests 1 and 3 done. I do not have sleuthkit code inside an ide, so did not use breakpoints. Instead, I changed TskDbSqlite::findParObjId() to return the parent_meta_addr when it is not found and return 1 when it is found in the cache map.
> > >>
> > >> Performing queries on the generated sqlite, there were 19.558 cache misses from an image with 3 ntfs partitions and 127.408 files. I confirmed that many parent_meta_addr missed from cache (now stored in tsk_objects.par_obj_id) are in tsk_files.meta_addr. The complete paths corresponding to these meta_addr are parents of those files whose processing have not found them in cache.
> > >>
> > >> Other tests resulted in:
> > >> 182.256 cache misses from 433.321 files (ntfs)
> > >> 892.359 misses from 1.811.393 files (ntfs)
> > >> 169.819 misses from 3.177.917 files (hfs)
> > >>
> > >> Luis Nassif
> > >>
> > >> 2014-05-01 16:14 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
> > >> Forgot to mention: we are using sleuthkit 4.1.3
> > >>
> > >> On 01/05/2014 16:09, "Luís Filipe Nassif" <lfc...@gm...> wrote:
> > >>
> > >> Hi Brian,
> > >>
> > >> The 3 cases above were ntfs. I also tested with hfs and canceled loaddb after 1 day. The modified version finished after 8 hours and added about 3 million entries. We will try to do the tests you have suggested.
> > >>
> > >> On 01/05/2014 15:48, "Brian Carrier" <ca...@sl...> wrote:
> > >> Hi Luis,
> > >>
> > >> What kind of file system was it? I fixed a bug a little while ago in that code for HFS file systems that resulted in a lot of cache misses.
> > >>
> > >> In theory, everything should be cached. It sounds like a bug if you are getting so many misses. The basic idea of this code is that everything in the DB gets assigned a unique object ID and we make associations between files and their parent folder's unique ID.
> > >>
> > >> Since you seem to be comfortable with a debugger in the code, can you set a breakpoint for when the miss happens and:
> > >> 1) Determine the path of the file that was being added to the DB and the parent address that was trying to be found.
> > >> 2) Use the 'ffind' TSK tool to then map that parent address to a path. Is it a subset of the path from #1?
> > >> 3) Open the DB in a SQLite tool and do something like this:
> > >>
> > >> SELECT * from tsk_files where meta_addr == META_ADDR_FROM_ABOVE
> > >>
> > >> Is it in the DB?
> > >>
> > >> Thanks!
> > >>
> > >> brian
> > >>
> > >> On May 1, 2014, at 11:58 AM, Luís Filipe Nassif <lfc...@gm...> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> We have investigated a bit why the add image process is too slow in some cases. The add image process time seems to be quadratic with the number of files in the image.
> > >>>
> > >>> We detected that the function TskDbSqlite::findParObjId(), in db_sqlite.cpp, is not finding the parent_meta_addr -> parent_file_id mapping in the local cache for a lot of files, causing it to search for the mapping in the database (not sure if it is a non-indexed search?)
> > >>>
> > >>> For testing purposes, we added a "return 1;" line right after the cache look up, disabling the database look up, and this resulted in great speed ups:
> > >>>
> > >>> number of files / default load_db time / patched load_db time
> > >>> ~80.000 / 20min / 2min
> > >>> ~300.000 / 3h / 7min
> > >>> ~700.000 / 48h / 27min
> > >>>
> > >>> We wonder if it is possible to store all par_meta_addr -> par_id mappings into local cache (better) or doing an improved (indexed?) search for the mapping in the database. We think that someone with more knowledge of load_db code could help a lot here.
|
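The "commit for each 5.000 files" experiment described above can be sketched with plain sqlite3 transaction calls. The function and constant names are illustrative, not the actual tsk_loaddb code:

#include <sqlite3.h>

static const int kBatchSize = 5000;  // the thread batched 5.000 files per commit

// Hypothetical bulk-insert loop: commit every kBatchSize files instead of
// holding one transaction open for the entire add-image process.
void addFilesBatched(sqlite3 *db, int nFiles)
{
    sqlite3_exec(db, "BEGIN TRANSACTION;", NULL, NULL, NULL);
    for (int i = 0; i < nFiles; i++) {
        // ... INSERT INTO tsk_objects / tsk_files for file i ...
        if ((i + 1) % kBatchSize == 0) {
            sqlite3_exec(db, "COMMIT;", NULL, NULL, NULL);
            sqlite3_exec(db, "BEGIN TRANSACTION;", NULL, NULL, NULL);
        }
    }
    sqlite3_exec(db, "COMMIT;", NULL, NULL, NULL);
}

Consistent with Brian's and Simson's expectation, the timings above show the batched commits were slower than the default single transaction once the cache fix was in place; one commit at the end remains the fastest bulk-load strategy here.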
From: Kalin K. <me....@gm...> - 2014-05-07 14:39:23
|
On Wed, May 7, 2014 at 10:53 PM, Brian Carrier <ca...@sl...> wrote: > The bigger question for me is why we are getting these cache misses and I need to spend some more time with some more images to find out. The lookup is to find the ID of the parent and we process from the root directory down. So, in theory, we have already processed the parent folder before the children and it should be in the cache. We need to figure out why the parent isn't in the cache... > Just a wild guess... NTFS "oddities" like reparse points, etc.? Kalin. |
From: Brian C. <ca...@sl...> - 2014-05-07 13:53:40
|
Hi Nassif, As Simson mentioned, the current setup was intended to be the fastest. Doing frequent commits takes longer and more indexes makes commits take longer. This is the only process that we know about that does this type of lookup and would use those indexes. The bigger question for me is why we are getting these cache misses and I need to spend some more time with some more images to find out. The lookup is to find the ID of the parent and we process from the root directory down. So, in theory, we have already processed the parent folder before the children and it should be in the cache. We need to figure out why the parent isn't in the cache... brian On May 7, 2014, at 8:24 AM, Luís Filipe Nassif <lfc...@gm...> wrote: > I have done one last test, because it was very strange to me that indexing meta_addr and fs_obj_id had not improved the parent_id lookup. We suspected that the indexes were not being used by sqlite, maybe because the whole data is not commited before add image process finishes (i am not a sqlite expert, is it possible?). So we inserted a commit for each 5.000 files added to database. The add image process time decreased from 1hour to 30min, so we think that the indexes were not being used. > > Why add image process do not commit the data while it is being added to database? > > Nassif > > > 2014-05-02 13:37 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>: > Fixing my last email, the test was run with the indexes AND Brian's fix. Then I removed the index patch and loadDb took the same 1 hour to finish with only Brian's fix. So the index patch did not help improving database look up for parent_id. > > Sorry for mistake, > Nassif > > > 2014-05-02 10:54 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>: > > I tested loadDb with a create index on meta_addr and fs_obj_id patch. The image with 433.321 files, previously taking 2h45min to load, now takes 1h to finish loadDb with the indexes. That is a good speed up, but completely disabling the database parent_id look up, it only takes 7min to finish. Is there another thing we can do to improve the parent_id database look up? > > Regards, > Nassif > > > 2014-05-02 9:35 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>: > > Ok, tested in 2 images. Fix resolved a lot of misses: > > ntfs image w/ 127.408 files: from 19.558 to 6.511 misses > ntfs image w/ 433.321 files: from 182.256 to 19.908 misses > > I also think creating an index on tsk_files(meta_addr) and tsk_files(fs_obj_id) could help improving the database look up for those deleted files not found in local cache, what do you think? The database look up seems too slow, as described in my first email. > > Thank you for taking a look so quickly. > Nassif > > > 2014-05-01 23:47 GMT-03:00 Brian Carrier <ca...@sl...>: > > Well that was an easy and embarrassing fix: > > if (TSK_FS_TYPE_ISNTFS(fs_file->fs_info->ftype)) { > - seq = fs_file->name->meta_seq; > + seq = fs_file->name->par_seq; > } > > Turns out we've been having a lot of cache misses because of this stupid bug. Can you replace that line and see if it helps. It certainly did on my test image. > > thanks, > brian > > > On May 1, 2014, at 10:24 PM, Brian Carrier <ca...@sl...> wrote: > > > Thanks for the tests. I wonder if it has to do with an incorrect sequence number. NTFS increments the sequence number each time a file is re-allocated. Deleted orphan files could be getting misses. I'll add some logging on my system and see what kind of misses I get. 
> > > > brian > > > > On May 1, 2014, at 8:39 PM, Luís Filipe Nassif <lfc...@gm...> wrote: > > > >> Ok, tests 1 and 3 done. I do not have sleuthkit code inside an ide, so did not use breakpoints. Instead, I changed TskDbSqlite::findParObjId() to return the parent_meta_addr when it is not found and return 1 when it is found in the cache map. > >> > >> Performing queries on the generated sqlite, there were 19.558 cache misses from an image with 3 ntfs partitions and 127.408 files. I confirmed that many parent_meta_addr missed from cache (now stored in tsk_objects.par_obj_id) are into tsk_files.meta_addr. The complete paths corresponding to these meta_addr are parents of those files whose processing have not found them in cache. > >> > >> Other tests resulted in: > >> 182.256 cache misses from 433.321 files (ntfs) > >> 892.359 misses from 1.811.393 files (ntfs) > >> 169.819 misses from 3.177.917 files (hfs) > >> > >> Luis Nassif > >> > >> > >> > >> 2014-05-01 16:14 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>: > >> Forgot to mention: we are using sleuthkit 4.1.3 > >> > >> Em 01/05/2014 16:09, "Luís Filipe Nassif" <lfc...@gm...> escreveu: > >> > >> Hi Brian, > >> > >> The 3 cases above were ntfs. I also tested with hfs and canceled loaddb after 1 day. The modified version finished after 8hours and added about 3 million entries. We will try to do the tests you have suggested. > >> > >> Em 01/05/2014 15:48, "Brian Carrier" <ca...@sl...> escreveu: > >> Hi Luis, > >> > >> What kind of file system was it? I fixed a bug a little while ago in that code for HFS file systems that resulted in a lot of cache misses. > >> > >> In theory, everything should be cached. It sounds like a bug if you are getting so many misses. The basic idea of this code is that everything in the DB gets assigned a unique object ID and we make associations between files and their parent folder's unique ID. > >> > >> Since you seem to be comfortable with a debugger in the code, can you set a breakpoint for when the miss happens and: > >> 1) Determine the path of the file that was being added to the DB and the parent address that was trying to be found. > >> 2) Use the 'ffind' TSK tool to then map that parent address to a path. Is it a subset of the path from #1? > >> 3) Open the DB in a SQLite tool and do something like this: > >> > >> SELECT * from tsk_files where meta_addr == META_ADDR_FROM_ABOVE > >> > >> Is it in the DB? > >> > >> Thanks! > >> > >> brian > >> > >> > >> On May 1, 2014, at 11:58 AM, Luís Filipe Nassif <lfc...@gm...> wrote: > >> > >>> Hi, > >>> > >>> We have investigated a bit why the add image process is too slow in some cases. The add image process time seems to be quadratic with the number of files in the image. > >>> > >>> We detected that the function TskDbSqlite::findParObjId(), in db_sqlite.cpp, is not finding the parent_meta_addr -> parent_file_id mapping in the local cache for a lot of files, causing it to search for the mapping in the database (not sure if it is an non-indexed search?) > >>> > >>> For testing purposes, we added a "return 1;" line right after the cache look up, disabling the database look up, and this resulted in great speed ups: > >>> > >>> number of files / default load_db time / patched load_db time > >>> ~80.000 / 20min / 2min > >>> ~300.000 / 3h / 7min > >>> ~700.000 / 48h / 27min > >>> > >>> We wonder if it is possible to store all par_meta_addr -> par_id mappings into local cache (better) or doing an improved (indexed?) search for the mapping in the database. 
We think that someone with more knowledge of load_db code could help a lot here. |
From: Simson G. <si...@ac...> - 2014-05-07 13:11:52
|
Hi, Nassif. Could you repeat your experiment a few times and on different media? Please be sure to reboot your computer between each test. All of the data are inserted in a single commit because the developers believed that this was the fastest way to add the data. If this is not the case, we would like to know. Thanks! Simson On May 7, 2014, at 8:24 AM, Luís Filipe Nassif <lfc...@gm...> wrote: > I have done one last test, because it was very strange to me that indexing meta_addr and fs_obj_id had not improved the parent_id lookup. We suspected that the indexes were not being used by sqlite, maybe because the whole data is not commited before add image process finishes (i am not a sqlite expert, is it possible?). So we inserted a commit for each 5.000 files added to database. The add image process time decreased from 1hour to 30min, so we think that the indexes were not being used. > > Why add image process do not commit the data while it is being added to database? > > Nassif |
From: Luís F. N. <lfc...@gm...> - 2014-05-07 12:24:10
|
I have done one last test, because it was very strange to me that indexing meta_addr and fs_obj_id had not improved the parent_id lookup. We suspected that the indexes were not being used by sqlite, maybe because the whole data is not committed before the add image process finishes (I am not an SQLite expert; is that possible?). So we inserted a commit for each 5.000 files added to the database. The add image process time decreased from 1 hour to 30 min, so we think that the indexes were not being used.

Why does the add image process not commit the data while it is being added to the database?

Nassif

2014-05-02 13:37 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
> Fixing my last email, the test was run with the indexes AND Brian's fix. Then I removed the index patch and loadDb took the same 1 hour to finish with only Brian's fix. So the index patch did not help improving database look up for parent_id.
>
> Sorry for mistake,
> Nassif
>
> 2014-05-02 10:54 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
>
>> I tested loadDb with a create index on meta_addr and fs_obj_id patch. The image with 433.321 files, previously taking 2h45min to load, now takes 1h to finish loadDb with the indexes. That is a good speed up, but completely disabling the database parent_id look up, it only takes 7min to finish. Is there another thing we can do to improve the parent_id database look up?
>>
>> Regards,
>> Nassif
>>
>> 2014-05-02 9:35 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
>>
>>> Ok, tested in 2 images. Fix resolved a lot of misses:
>>>
>>> ntfs image w/ 127.408 files: from 19.558 to 6.511 misses
>>> ntfs image w/ 433.321 files: from 182.256 to 19.908 misses
>>>
>>> I also think creating an index on tsk_files(meta_addr) and tsk_files(fs_obj_id) could help improving the database look up for those deleted files not found in local cache, what do you think? The database look up seems too slow, as described in my first email.
>>>
>>> Thank you for taking a look so quickly.
>>> Nassif
>>>
>>> 2014-05-01 23:47 GMT-03:00 Brian Carrier <ca...@sl...>:
>>>
>>>> Well that was an easy and embarrassing fix:
>>>>
>>>> if (TSK_FS_TYPE_ISNTFS(fs_file->fs_info->ftype)) {
>>>> -    seq = fs_file->name->meta_seq;
>>>> +    seq = fs_file->name->par_seq;
>>>> }
>>>>
>>>> Turns out we've been having a lot of cache misses because of this stupid bug. Can you replace that line and see if it helps. It certainly did on my test image.
>>>>
>>>> thanks,
>>>> brian
>>>>
>>>> On May 1, 2014, at 10:24 PM, Brian Carrier <ca...@sl...> wrote:
>>>>
>>>> > Thanks for the tests. I wonder if it has to do with an incorrect sequence number. NTFS increments the sequence number each time a file is re-allocated. Deleted orphan files could be getting misses. I'll add some logging on my system and see what kind of misses I get.
>>>> >
>>>> > brian
>>>> >
>>>> > On May 1, 2014, at 8:39 PM, Luís Filipe Nassif <lfc...@gm...> wrote:
>>>> >
>>>> >> Ok, tests 1 and 3 done. I do not have sleuthkit code inside an ide, so did not use breakpoints. Instead, I changed TskDbSqlite::findParObjId() to return the parent_meta_addr when it is not found and return 1 when it is found in the cache map.
>>>> >>
>>>> >> Performing queries on the generated sqlite, there were 19.558 cache misses from an image with 3 ntfs partitions and 127.408 files. I confirmed that many parent_meta_addr missed from cache (now stored in tsk_objects.par_obj_id) are in tsk_files.meta_addr. The complete paths corresponding to these meta_addr are parents of those files whose processing have not found them in cache.
>>>> >>
>>>> >> Other tests resulted in:
>>>> >> 182.256 cache misses from 433.321 files (ntfs)
>>>> >> 892.359 misses from 1.811.393 files (ntfs)
>>>> >> 169.819 misses from 3.177.917 files (hfs)
>>>> >>
>>>> >> Luis Nassif
>>>> >>
>>>> >> 2014-05-01 16:14 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>:
>>>> >> Forgot to mention: we are using sleuthkit 4.1.3
>>>> >>
>>>> >> On 01/05/2014 16:09, "Luís Filipe Nassif" <lfc...@gm...> wrote:
>>>> >>
>>>> >> Hi Brian,
>>>> >>
>>>> >> The 3 cases above were ntfs. I also tested with hfs and canceled loaddb after 1 day. The modified version finished after 8 hours and added about 3 million entries. We will try to do the tests you have suggested.
>>>> >>
>>>> >> On 01/05/2014 15:48, "Brian Carrier" <ca...@sl...> wrote:
>>>> >> Hi Luis,
>>>> >>
>>>> >> What kind of file system was it? I fixed a bug a little while ago in that code for HFS file systems that resulted in a lot of cache misses.
>>>> >>
>>>> >> In theory, everything should be cached. It sounds like a bug if you are getting so many misses. The basic idea of this code is that everything in the DB gets assigned a unique object ID and we make associations between files and their parent folder's unique ID.
>>>> >>
>>>> >> Since you seem to be comfortable with a debugger in the code, can you set a breakpoint for when the miss happens and:
>>>> >> 1) Determine the path of the file that was being added to the DB and the parent address that was trying to be found.
>>>> >> 2) Use the 'ffind' TSK tool to then map that parent address to a path. Is it a subset of the path from #1?
>>>> >> 3) Open the DB in a SQLite tool and do something like this:
>>>> >>
>>>> >> SELECT * from tsk_files where meta_addr == META_ADDR_FROM_ABOVE
>>>> >>
>>>> >> Is it in the DB?
>>>> >>
>>>> >> Thanks!
>>>> >>
>>>> >> brian
>>>> >>
>>>> >> On May 1, 2014, at 11:58 AM, Luís Filipe Nassif <lfc...@gm...> wrote:
>>>> >>
>>>> >>> Hi,
>>>> >>>
>>>> >>> We have investigated a bit why the add image process is too slow in some cases. The add image process time seems to be quadratic with the number of files in the image.
>>>> >>>
>>>> >>> We detected that the function TskDbSqlite::findParObjId(), in db_sqlite.cpp, is not finding the parent_meta_addr -> parent_file_id mapping in the local cache for a lot of files, causing it to search for the mapping in the database (not sure if it is a non-indexed search?)
>>>> >>>
>>>> >>> For testing purposes, we added a "return 1;" line right after the cache look up, disabling the database look up, and this resulted in great speed ups:
>>>> >>>
>>>> >>> number of files / default load_db time / patched load_db time
>>>> >>> ~80.000 / 20min / 2min
>>>> >>> ~300.000 / 3h / 7min
>>>> >>> ~700.000 / 48h / 27min
>>>> >>>
>>>> >>> We wonder if it is possible to store all par_meta_addr -> par_id mappings into local cache (better) or doing an improved (indexed?) search for the mapping in the database. We think that someone with more knowledge of load_db code could help a lot here.
|
From: Brian C. <ca...@sl...> - 2014-05-06 00:44:10
|
Hi Anton, github issues and pull requests are the easiest at this point. I should close the sourceforge trackers to new posts and redirect to github. We've merged several of your pull requests over the past months. Thanks for those. I saw that there was a bug reported a little while ago about long LFN. Like many open source projects, the developers who contribute to TSK/Autopsy are either volunteering their time or working as part of a project that uses them. The reality of that is that random bug reports will eventually get looked at, but the time that it takes to look at it depends on many circumstances (and depends on the frequency and severity of the issue). I usually review the list of bugs and such before we do a release to see if there is something that should be addressed. We're due for a 4.2 release in the next couple of weeks and I was going to merge in a bunch of the patches out there. In terms of how long it takes for pull requests to get merged, the easier the patch, the faster it gets reviewed and merged. The ones that hang out the longest are because they have a bunch of changes that require more thought. thanks, brian On May 5, 2014, at 4:40 AM, Anton Kukoba <ak...@ad...> wrote: > We're actively using the Sleuth Kit source code and tools, so we're finding issues quite often. Sometimes we know how to fix the problem and we have a patch for the source code. Sometimes we don't, but we want some feedback about the bug we found, to know if it will be fixed by the Sleuth Kit team, or if we should try to investigate on our own. > I've tried creating the tickets at sourceforge, and it seems like it's no good. Also I've tried to create pull requests when I had a patch; this is also unreliable, because sometimes pull requests are hanging for weeks. > So, I've decided to ask here... > > -- > Anton Kukoba - Technical Leader > ADF Solutions, Inc. |
From: Brad S. <bra...@gm...> - 2014-05-05 19:02:19
|
Version: sleuthkit-4.1.3-win32 OS: Windows 7 x64 SP1 I am collecting slack via blkls -s; however, I would like to split the slack up by either the inode or the block where it was collected. Going through the source code, I can tell that this would be a relatively trivial task since the blocks are collected anyway, but I am not well versed in C++. Any help would be appreciated. |
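One way to attribute slack to the file it came from, without modifying blkls, is to read each file's slack through the library instead. The sketch below uses the TSK C API and is untested and simplified: offset 0 assumes an unpartitioned image, and rounding the file size up to a whole block ignores sparse and compressed files, so treat it as a starting point rather than a finished tool. (From the command line, icat's -s flag similarly includes slack when extracting a single inode.)

#include <tsk/libtsk.h>
#include <cstdio>
#include <cstdlib>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <image> <inum>\n", argv[0]);
        return 1;
    }
    TSK_IMG_INFO *img = tsk_img_open_utf8_sing(argv[1], TSK_IMG_TYPE_DETECT, 0);
    TSK_FS_INFO *fs = img ? tsk_fs_open_img(img, 0, TSK_FS_TYPE_DETECT) : NULL;
    if (fs == NULL) {
        tsk_error_print(stderr);
        return 1;
    }
    TSK_FS_FILE *file = tsk_fs_file_open_meta(fs, NULL, strtoull(argv[2], NULL, 10));
    if (file && file->meta) {
        TSK_OFF_T size = file->meta->size;
        // Slack assumed to be the tail of the last allocated block, past EOF.
        TSK_OFF_T alloc = ((size + fs->block_size - 1) / fs->block_size) * fs->block_size;
        if (alloc > size) {
            char *buf = (char *) malloc(alloc - size);
            // The SLACK flag lets the read continue past the logical file size.
            ssize_t n = tsk_fs_file_read(file, size, buf, (size_t) (alloc - size),
                                         TSK_FS_FILE_READ_FLAG_SLACK);
            if (n > 0)
                fwrite(buf, 1, n, stdout);  // slack for exactly this inode
            free(buf);
        }
        tsk_fs_file_close(file);
    }
    tsk_fs_close(fs);
    tsk_img_close(img);
    return 0;
}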
From: L. G. M. 'P. <po...@lg...> - 2014-05-05 10:47:29
|
Hi, It's not easy to run iOS on a PC. You can try running a Mac with OS X plus Xcode with its iOS simulator. But still you cannot run regular apps there - only the stock ones, and the ones you develop yourself. But still, you may find it useful. Best regards Pope > On 05/05/2014, at 12:05, Enkidu Mo Shiri <vol...@gm...> wrote: > > Hi everyone, > for my project, which is about Bitcoin forensics, I should work on 3 platforms: Android, Windows, and iOS. I am creating a tool for handphones, so I am working with the handphone versions of these operating systems. I did some research on how to install the iPhone OS on a PC virtual machine, and this link is the best I found: http://maconwindows.blogspot.com/ > which involves installing Lion OS first and then using Xcode. > Does anybody have a better idea of how to run iPhone iOS 7 in a virtual machine on a PC? It would be very appreciated. > Thank you > Ehsan Moshiri (Enkidu) > Digital Forensic Student > H/P:+96164953954 , +961124249769 > Linkedin: http://my.linkedin.com/pub/enkidu-moshiri/59/baa/90b/ > Facebook: Enkidu Mo Shi Ri > wechat: Enkidu-Moshiri > Line: Enkidu.Moshiri |
From: Enkidu Mo S. <vol...@gm...> - 2014-05-05 10:05:16
|
Hi everyone, for my project, which is about Bitcoin forensics, I should work on 3 platforms: Android, Windows, and iOS. I am creating a tool for handphones, so I am working with the handphone versions of these operating systems. I did some research on how to install the iPhone OS on a PC virtual machine, and this link is the best I found: http://maconwindows.blogspot.com/ which involves installing Lion OS first and then using Xcode. Does anybody have a better idea of how to run iPhone iOS 7 in a virtual machine on a PC? It would be very appreciated. Thank you *Ehsan Moshiri (Enkidu)* *Digital Forensic Student* *H/P:+96164953954 , +961124249769* *Linkedin: http://my.linkedin.com/pub/enkidu-moshiri/59/baa/90b/* *Facebook: Enkidu Mo Shi Ri* *wechat: Enkidu-Moshiri* *Line: Enkidu.Moshiri* |
From: Anton K. <ak...@ad...> - 2014-05-05 09:41:08
|
We're actively using the Sleuth Kit source code and tools, so we're finding issues quite often. Sometimes we know how to fix the problem and we have a patch for the source code. Sometimes we don't, but we want some feedback about the bug we found, to know if it will be fixed by the Sleuth Kit team, or if we should try to investigate on our own. I've tried creating the tickets at sourceforge, and it seems like it's no good. Also I've tried to create pull requests when I had a patch; this is also unreliable, because sometimes pull requests are hanging for weeks. So, I've decided to ask here... -- Anton Kukoba - Technical Leader ADF Solutions, Inc. |
From: <al...@ma...> - 2014-05-02 17:06:19
|
Output is: 12 drwxrwxrwt root root ... > On May 2, 2014 6:56 PM, <al...@ma...> wrote: >> Both were downloaded from the Ubuntu software repository and installed >> correctly, in the order SK then autopsy. >> >> I ran Autopsy from the Terminal, and got the message about opening an >> HTML >> session but then I got the error message (immediately): >> >> "Cannot open log: autopsy.log at /usr/share/autopsy/lib//Print.pm at >> line >> 383." >> > Most probably permission problem, most probably in /tmp ... > > What is the output of `ls -lsd /tmp` ? > > Note to devs: The message need to be changed to include the full path > IMHO. > > Kalin. > |