sleuthkit-developers Mailing List for The Sleuth Kit (Page 8)
Brought to you by: carrier
Message counts by month:

| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| 2003 |     |     |     |     |     |     |     | 10  | 2   |     | 1   |     |
| 2004 | 22  | 39  | 8   | 17  | 10  | 2   | 6   | 4   | 1   | 3   |     |     |
| 2005 | 2   | 6   | 2   | 2   | 13  | 2   |     |     | 5   |     | 2   |     |
| 2006 |     |     | 1   |     | 2   | 9   | 4   | 2   |     | 1   | 9   | 4   |
| 2007 | 1   | 2   |     | 3   |     |     | 6   |     | 4   |     |     | 2   |
| 2008 | 4   |     |     | 1   |     | 9   | 14  |     | 5   | 10  | 4   | 7   |
| 2009 | 7   | 10  | 10  | 19  | 16  | 3   | 9   | 5   | 5   | 16  | 35  | 30  |
| 2010 | 4   | 24  | 25  | 31  | 11  | 9   | 11  | 31  | 11  | 10  | 15  | 3   |
| 2011 | 8   | 17  | 14  | 2   | 4   | 4   | 3   | 7   | 18  | 8   | 16  | 1   |
| 2012 | 9   | 2   | 3   | 13  | 10  | 7   | 1   | 5   |     | 3   | 19  | 3   |
| 2013 | 16  | 3   | 2   | 4   |     | 3   | 2   | 17  | 6   | 1   |     | 4   |
| 2014 | 2   |     | 3   | 7   | 6   | 1   | 18  |     | 3   | 1   | 26  | 7   |
| 2015 | 5   | 1   | 2   |     | 1   | 1   | 5   | 7   | 4   | 1   | 1   |     |
| 2016 | 3   |     | 1   |     | 1   | 13  | 23  | 2   | 11  |     | 1   |     |
| 2017 | 4   |     |     | 2   |     |     |     |     |     |     | 2   |     |
| 2018 |     |     | 2   |     | 1   | 3   |     |     | 2   |     | 2   |     |
| 2019 |     |     |     |     |     |     |     | 2   |     |     |     |     |
| 2020 | 4   |     |     |     |     | 3   | 5   | 1   |     |     |     |     |
| 2021 |     |     |     |     |     |     |     |     |     |     | 1   |     |
| 2024 |     | 1   |     |     |     |     |     |     |     |     |     | 1   |
From: Brian C. <ca...@sl...> - 2014-05-13 18:50:21
|
I wanted to give developers a heads-up that your Autopsy modules will stop working with the 3.1 release. 3.1 runs multiple file ingest pipelines in parallel, and we had to change the module API to handle that. Essentially, we make a new instance of the module for each pipeline so that you don't have to worry about synchronization and the like, unless you take additional steps such as using static variables that will be shared across threads. Ingest modules now have two interfaces to implement. One is a factory that has methods to produce the (optional) GUI configuration panels and instances of the actual analysis modules. The analysis module interface has the usual startup, process, and shutdown methods. A little renaming was done while we were ripping everything out. Let us know if you have any questions about the migration. Updated docs on the new design, with notes on migrating between versions, can be found here: http://sleuthkit.org/autopsy/docs/api-docs/3.1/mod_ingest_page.html 3.1 should be released in a couple of weeks. It's on the develop branch on GitHub if you want to play with it now, though. thanks, brian |
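As an illustration of the factory-plus-instances split described above, here is a minimal Java sketch of a file ingest module. The class and method names (IngestModuleFactoryAdapter, FileIngestModule, IngestJobContext, ProcessResult) are assumptions to be checked against the migration docs linked above, not a definitive rendering of the 3.1 API; the point it shows is that the factory hands Autopsy a fresh module instance per pipeline, so plain instance fields need no synchronization.

```java
import org.sleuthkit.autopsy.ingest.FileIngestModule;
import org.sleuthkit.autopsy.ingest.IngestJobContext;
import org.sleuthkit.autopsy.ingest.IngestModule.ProcessResult;
import org.sleuthkit.autopsy.ingest.IngestModuleFactoryAdapter;
import org.sleuthkit.autopsy.ingest.IngestModuleIngestJobSettings;
import org.sleuthkit.datamodel.AbstractFile;

// Factory: Autopsy asks it for a new analysis-module instance per file ingest pipeline.
public class SampleIngestModuleFactory extends IngestModuleFactoryAdapter {

    @Override
    public String getModuleDisplayName() { return "Sample Module"; }

    @Override
    public String getModuleDescription() { return "Counts files; one instance per pipeline."; }

    @Override
    public String getModuleVersionNumber() { return "1.0"; }

    @Override
    public boolean isFileIngestModuleFactory() { return true; }

    @Override
    public FileIngestModule createFileIngestModule(IngestModuleIngestJobSettings settings) {
        // A fresh instance per pipeline, so instance fields are not shared across threads.
        return new SampleFileIngestModule();
    }
}

// Analysis module: the usual startup / process / shutdown lifecycle.
class SampleFileIngestModule implements FileIngestModule {

    private long fileCount; // private to this pipeline's instance

    @Override
    public void startUp(IngestJobContext context) {
        fileCount = 0;
    }

    @Override
    public ProcessResult process(AbstractFile file) {
        fileCount++; // no synchronization needed for per-instance state
        return ProcessResult.OK;
    }

    @Override
    public void shutDown() {
        // post results / release resources here
    }
}
```

A static field, by contrast, would be shared by every pipeline's instance and would need the additional synchronization steps mentioned in the announcement.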
From: Brian C. <ca...@sl...> - 2014-05-13 14:06:22
|
Basis Technology is hosting another Autopsy module writing competition around OSDFCon. Submissions can be entirely new analysis techniques or wrappers around existing tools. A primary goal of Autopsy 3 was to be a digital forensics platform so that users could use a single tool to perform their investigations and not waste time moving data around between various stand-alone tools. We've built the infrastructure and now we need developers to write modules so that we can all achieve the original goal from OSDFCon 2010. Last year, the winners developed registry parsing and fuzzy hashing modules. This year, we're looking for even better submissions. OSDFCon attendees will vote on who gets the cash prizes. Submissions are due Oct 20, 2014. Rules are available on the website: http://www.basistech.com/osdfcon-contest/ If you are looking for ideas, the above site has a link to a set of feature requests that have been submitted. Note that we also sponsored a student-based competition this year too. This competition is different. It has bigger prizes and is timed based on OSDFCon and not semesters (http://www.basistech.com/digital-forensics/autopsy/autopsy-for-educators/student-development-contest/). thanks, brian |
From: Stefan P. <ste...@gm...> - 2014-05-06 01:09:29
|
Re-sending a smaller data.tar.gz containing the files mentioned in this thread. On Tue, May 06, 2014 at 04:04:08AM +0300, Stefan Petrea wrote: > Patch attached > > On Tue, May 06, 2014 at 04:01:42AM +0300, Stefan Petrea wrote: > > Sending a new patch. Forgot to add a line in the previous one. > > I've opened a github pull request for this here https://github.com/sleuthkit/sleuthkit/pull/329 > > > > On Tue, May 06, 2014 at 03:41:09AM +0300, Stefan Petrea wrote: > > > Hi, > > > > > > I'm using Sleuthkit and encountered a memory leak in TskFsDir::getFile. On a 20GB disk image, > > > with ~10k files this leads to unallocated memory of ~2.5GB according to Valgrind. > > > > > > I'm writing to the mailing list to provide details on the leak and a patch for it. > > > > > > I wrote a small testcase that reproduces the leak in isolated.cpp > > > > > > The output of Valgrind before the patch was applied( commit a9e2aa0e39cd8eeffd4cccf951b7b91a40f5e8c0 ), > > > can be found in valgrind-output-before-patch.txt and the output of Valgrind after the patch > > > was applied can be found in valgrind-output-after-patch.txt > > > > > > The leaks reported by Valgrind all originate in TskFsDir::getFile. The unfreed objects > > > were all of type TskFsFile. > > > > > > The script find-leak.py automates GDB 7.6 and is designed to automatically debug isolated.cpp and > > > at the same time collect information about the callgraph and pointers that cause the leak > > > (Note: GDB 7.7 has a different Python API). > > > > > > find-leak.py keeps track of all the pointers allocated through tsk_fs_file_alloc and all > > > the pointers deallocated through tsk_fs_file_close which can be seen in the callgraph > > > gdb-all-paths.png > > > > > > The aim of this script (find-leak.py ) is to track the unfreed objects and find the call paths that lead > > > to their allocation. This can be seen in the callgraph gdb-paths-that-caused-leak.png > > > which confirms Valgrind's report. The difference between this callgraph and the previous one is > > > that this one only tracks the objects that have not been deallocated. > > > > > > The patch that is attached to this e-mail contains a fix for this. It works by making TskFsDir a friend > > > class of TskFsFile in order for TskFsDir::getFile to be able to set the TskFsFile::m_opened attribute > > > to true, since if that attribute is not set to true, TskFsFile::~TskFsFile will not call > > > tsk_fs_file_close which will cause the leak to occur. > > > > > > I will also make a pull-request on github for this. > > > > > > I look forward to your opinion on this patch. > > > > > > Best regards, > > > Stefan > > > > > > P.S. > > > > > > As a sidenote, Valgrind also reported a leak found in tsk/base/tsk_error.c , originating from a call to > > > tsk_error_get_info. I've noticed that there is a destructor handler free_error_info which deallocates > > > the memory allocated by tsk_error_get_info, but that destructor is only called > > > upon pthread_exit. The valgrind leak report for this can be found in valgrind-leak-tsk-error.txt > > > To fix this, isolated.cpp also calls pthread_exit which solves that problem. > diff --git a/tsk/fs/tsk_fs.h b/tsk/fs/tsk_fs.h > index 39e0608..db4ef85 100644 > --- a/tsk/fs/tsk_fs.h > +++ b/tsk/fs/tsk_fs.h > @@ -2655,6 +2655,7 @@ class TskFsMeta { > * undefined. See TSK_FS_FILE for more details. 
> */ > class TskFsFile { > + friend class TskFsDir; > private: > TSK_FS_FILE * m_fsFile; > bool m_opened; > @@ -2972,9 +2973,11 @@ class TskFsDir { > */ > TskFsFile *getFile(size_t a_idx) const { > TSK_FS_FILE *fs_file = tsk_fs_dir_get(m_fsDir, a_idx); > - if (fs_file != NULL) > - return new TskFsFile(fs_file); > - else > + if (fs_file != NULL) { > + TskFsFile *f = new TskFsFile(fs_file); > + f->m_opened = true; > + return f; > + } else > return NULL; > }; > |
From: Stefan P. <ste...@gm...> - 2014-05-06 01:04:13
|
Patch attached On Tue, May 06, 2014 at 04:01:42AM +0300, Stefan Petrea wrote: > Sending a new patch. Forgot to add a line in the previous one. > I've opened a github pull request for this here https://github.com/sleuthkit/sleuthkit/pull/329 > > On Tue, May 06, 2014 at 03:41:09AM +0300, Stefan Petrea wrote: > > Hi, > > > > I'm using Sleuthkit and encountered a memory leak in TskFsDir::getFile. On a 20GB disk image, > > with ~10k files this leads to unallocated memory of ~2.5GB according to Valgrind. > > > > I'm writing to the mailing list to provide details on the leak and a patch for it. > > > > I wrote a small testcase that reproduces the leak in isolated.cpp > > > > The output of Valgrind before the patch was applied( commit a9e2aa0e39cd8eeffd4cccf951b7b91a40f5e8c0 ), > > can be found in valgrind-output-before-patch.txt and the output of Valgrind after the patch > > was applied can be found in valgrind-output-after-patch.txt > > > > The leaks reported by Valgrind all originate in TskFsDir::getFile. The unfreed objects > > were all of type TskFsFile. > > > > The script find-leak.py automates GDB 7.6 and is designed to automatically debug isolated.cpp and > > at the same time collect information about the callgraph and pointers that cause the leak > > (Note: GDB 7.7 has a different Python API). > > > > find-leak.py keeps track of all the pointers allocated through tsk_fs_file_alloc and all > > the pointers deallocated through tsk_fs_file_close which can be seen in the callgraph > > gdb-all-paths.png > > > > The aim of this script (find-leak.py ) is to track the unfreed objects and find the call paths that lead > > to their allocation. This can be seen in the callgraph gdb-paths-that-caused-leak.png > > which confirms Valgrind's report. The difference between this callgraph and the previous one is > > that this one only tracks the objects that have not been deallocated. > > > > The patch that is attached to this e-mail contains a fix for this. It works by making TskFsDir a friend > > class of TskFsFile in order for TskFsDir::getFile to be able to set the TskFsFile::m_opened attribute > > to true, since if that attribute is not set to true, TskFsFile::~TskFsFile will not call > > tsk_fs_file_close which will cause the leak to occur. > > > > I will also make a pull-request on github for this. > > > > I look forward to your opinion on this patch. > > > > Best regards, > > Stefan > > > > P.S. > > > > As a sidenote, Valgrind also reported a leak found in tsk/base/tsk_error.c , originating from a call to > > tsk_error_get_info. I've noticed that there is a destructor handler free_error_info which deallocates > > the memory allocated by tsk_error_get_info, but that destructor is only called > > upon pthread_exit. The valgrind leak report for this can be found in valgrind-leak-tsk-error.txt > > To fix this, isolated.cpp also calls pthread_exit which solves that problem. |
From: Stefan P. <ste...@gm...> - 2014-05-06 01:01:47
|
Sending a new patch. Forgot to add a line in the previous one. I've opened a github pull request for this here https://github.com/sleuthkit/sleuthkit/pull/329 On Tue, May 06, 2014 at 03:41:09AM +0300, Stefan Petrea wrote: > Hi, > > I'm using Sleuthkit and encountered a memory leak in TskFsDir::getFile. On a 20GB disk image, > with ~10k files this leads to unallocated memory of ~2.5GB according to Valgrind. > > I'm writing to the mailing list to provide details on the leak and a patch for it. > > I wrote a small testcase that reproduces the leak in isolated.cpp > > The output of Valgrind before the patch was applied( commit a9e2aa0e39cd8eeffd4cccf951b7b91a40f5e8c0 ), > can be found in valgrind-output-before-patch.txt and the output of Valgrind after the patch > was applied can be found in valgrind-output-after-patch.txt > > The leaks reported by Valgrind all originate in TskFsDir::getFile. The unfreed objects > were all of type TskFsFile. > > The script find-leak.py automates GDB 7.6 and is designed to automatically debug isolated.cpp and > at the same time collect information about the callgraph and pointers that cause the leak > (Note: GDB 7.7 has a different Python API). > > find-leak.py keeps track of all the pointers allocated through tsk_fs_file_alloc and all > the pointers deallocated through tsk_fs_file_close which can be seen in the callgraph > gdb-all-paths.png > > The aim of this script (find-leak.py ) is to track the unfreed objects and find the call paths that lead > to their allocation. This can be seen in the callgraph gdb-paths-that-caused-leak.png > which confirms Valgrind's report. The difference between this callgraph and the previous one is > that this one only tracks the objects that have not been deallocated. > > The patch that is attached to this e-mail contains a fix for this. It works by making TskFsDir a friend > class of TskFsFile in order for TskFsDir::getFile to be able to set the TskFsFile::m_opened attribute > to true, since if that attribute is not set to true, TskFsFile::~TskFsFile will not call > tsk_fs_file_close which will cause the leak to occur. > > I will also make a pull-request on github for this. > > I look forward to your opinion on this patch. > > Best regards, > Stefan > > P.S. > > As a sidenote, Valgrind also reported a leak found in tsk/base/tsk_error.c , originating from a call to > tsk_error_get_info. I've noticed that there is a destructor handler free_error_info which deallocates > the memory allocated by tsk_error_get_info, but that destructor is only called > upon pthread_exit. The valgrind leak report for this can be found in valgrind-leak-tsk-error.txt > To fix this, isolated.cpp also calls pthread_exit which solves that problem. |
From: Luís F. N. <lfc...@gm...> - 2014-04-28 23:38:24
|
Updating, I did not build nor test the develop branch, but the configuration file mismatch_config.xml from the FileExtMismatch module seems like Autopsy is not being able to differentiate between the MS Office formats. If this is correct, I think using Tika detection from an inputStream would solve the issue. 2014-04-28 20:18 GMT-03:00 Luís Filipe Nassif <lfc...@gm...>: > Great news, Brian, thank you. > > I took a look at TikaFileTypeDetector and it is using only the file first > 100 bytes for detection. From Tika.detect(byte[]) doc: > > "For best results at least a few kilobytes of the document data are > needed. See also the other detect() methods for better alternatives when > you have more than just the document prefix available for type detection." > > And Tika's default, when reading from a stream, currently is 64KB, so it > can correctly detect things like "XML root elements after initial comment > and DTDs" (MimeTypes doc) and, IMHO, zip based types (ooxml, odf...), ole2 > and the text detection heuristcs would work better. > > From my Tika experience, I think it would do better detection using > Tika.detec(inputStream, fileName), so Tika will read file bytes as needed > and will use the file name for detection refinement. In some cases Tika > will spool the entire stream to a temporary file for correct detection, but > in the general case will read 64KB. I think reading only 100B, instead of > 64KB, do not have significant time difference when reading from a spinning > magnetic drive, with high latency times, commonlly used for disk images > storage. > > > 2014-04-28 11:01 GMT-03:00 Brian Carrier <ca...@sl...>: > >> Yea, the 3.1 release (which is the develop branch on github) is using >> Tika's file type detection. >> >> >> >> On Apr 26, 2014, at 7:57 AM, Luís Filipe Nassif <lfc...@gm...> >> wrote: >> >> > Hi all, >> > >> > As I previously mentioned, I did not see a module like this in Autopsy >> 3, but read somewhere it will be in Autopsy 3.1, right? Solr, under the >> hoods, uses Tika for this purpose (and the results are great) before >> extracting text from files to index. I think explicitly using Tika for >> detection would be good, so Autopsy could inform Solr about the detected >> file mime type instead of Solr re-detecting all file signatures again. What >> do you think about it? >> > >> > Nassif >> > >> ------------------------------------------------------------------------------ >> > Start Your Social Network Today - Download eXo Platform >> > Build your Enterprise Intranet with eXo Platform Software >> > Java Based Open Source Intranet - Social, Extensible, Cloud Ready >> > Get Started Now And Turn Your Intranet Into A Collaboration Platform >> > >> http://p.sf.net/sfu/ExoPlatform_______________________________________________ >> > sleuthkit-developers mailing list >> > sle...@li... >> > https://lists.sourceforge.net/lists/listinfo/sleuthkit-developers >> >> > |
From: Luís F. N. <lfc...@gm...> - 2014-04-28 23:18:36
|
Great news, Brian, thank you. I took a look at TikaFileTypeDetector and it is using only the file's first 100 bytes for detection. From Tika.detect(byte[]) doc: "For best results at least a few kilobytes of the document data are needed. See also the other detect() methods for better alternatives when you have more than just the document prefix available for type detection." And Tika's default, when reading from a stream, currently is 64KB, so it can correctly detect things like "XML root elements after initial comment and DTDs" (MimeTypes doc) and, IMHO, zip-based types (ooxml, odf...), ole2, and the text detection heuristics would work better. From my Tika experience, I think it would do better detection using Tika.detect(inputStream, fileName), so Tika will read file bytes as needed and will use the file name for detection refinement. In some cases Tika will spool the entire stream to a temporary file for correct detection, but in the general case it will read 64KB. I think reading only 100B, instead of 64KB, does not make a significant time difference when reading from a spinning magnetic drive, with high latency times, commonly used for disk image storage.
2014-04-28 11:01 GMT-03:00 Brian Carrier <ca...@sl...>: > Yea, the 3.1 release (which is the develop branch on github) is using > Tika's file type detection. > > > On Apr 26, 2014, at 7:57 AM, Luís Filipe Nassif <lfc...@gm...> > wrote: > > > Hi all, > > > > As I previously mentioned, I did not see a module like this in Autopsy > 3, but read somewhere it will be in Autopsy 3.1, right? Solr, under the > hoods, uses Tika for this purpose (and the results are great) before > extracting text from files to index. I think explicitly using Tika for > detection would be good, so Autopsy could inform Solr about the detected > file mime type instead of Solr re-detecting all file signatures again. What > do you think about it? > > > > Nassif > > > ------------------------------------------------------------------------------ > > Start Your Social Network Today - Download eXo Platform > > Build your Enterprise Intranet with eXo Platform Software > > Java Based Open Source Intranet - Social, Extensible, Cloud Ready > > Get Started Now And Turn Your Intranet Into A Collaboration Platform > > > http://p.sf.net/sfu/ExoPlatform_______________________________________________ > > sleuthkit-developers mailing list > > sle...@li... > > https://lists.sourceforge.net/lists/listinfo/sleuthkit-developers > > |
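To make the suggestion concrete, here is a small sketch of stream-plus-filename detection with the Tika facade. The sample path is hypothetical, and this shows only the Tika side, not how Autopsy's TikaFileTypeDetector wires it in: Tika.detect(InputStream, String) reads as much of the stream as it needs (and TikaInputStream lets it spool to a temporary file when a format requires it), while detect(byte[]) only ever sees the prefix you hand it.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.apache.tika.Tika;
import org.apache.tika.io.TikaInputStream;

public class DetectExample {
    public static void main(String[] args) throws Exception {
        Tika tika = new Tika();
        Path p = Paths.get("samples/report.docx"); // hypothetical sample file

        // Prefix-only detection: Tika sees just the bytes handed to it
        // (short reads ignored for brevity).
        byte[] prefix = new byte[100];
        try (InputStream in = Files.newInputStream(p)) {
            in.read(prefix);
        }
        System.out.println("prefix bytes : " + tika.detect(prefix));

        // Stream + name detection: Tika reads as much as it needs, can spool to a
        // temp file via TikaInputStream, and uses the name as a tie-breaker
        // (helpful for OOXML/ODF versus a generic zip).
        try (TikaInputStream in = TikaInputStream.get(Files.newInputStream(p))) {
            System.out.println("stream + name: " + tika.detect(in, p.getFileName().toString()));
        }
    }
}
```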
From: Brian C. <ca...@sl...> - 2014-04-28 14:01:41
|
Yea, the 3.1 release (which is the develop branch on github) is using Tika's file type detection. On Apr 26, 2014, at 7:57 AM, Luís Filipe Nassif <lfc...@gm...> wrote: > Hi all, > > As I previously mentioned, I did not see a module like this in Autopsy 3, but read somewhere it will be in Autopsy 3.1, right? Solr, under the hoods, uses Tika for this purpose (and the results are great) before extracting text from files to index. I think explicitly using Tika for detection would be good, so Autopsy could inform Solr about the detected file mime type instead of Solr re-detecting all file signatures again. What do you think about it? > > Nassif > ------------------------------------------------------------------------------ > Start Your Social Network Today - Download eXo Platform > Build your Enterprise Intranet with eXo Platform Software > Java Based Open Source Intranet - Social, Extensible, Cloud Ready > Get Started Now And Turn Your Intranet Into A Collaboration Platform > http://p.sf.net/sfu/ExoPlatform_______________________________________________ > sleuthkit-developers mailing list > sle...@li... > https://lists.sourceforge.net/lists/listinfo/sleuthkit-developers |
From: Luís F. N. <lfc...@gm...> - 2014-04-26 11:57:25
|
Hi all, As I previously mentioned, I did not see a module like this in Autopsy 3, but read somewhere it will be in Autopsy 3.1, right? Solr, under the hoods, uses Tika for this purpose (and the results are great) before extracting text from files to index. I think explicitly using Tika for detection would be good, so Autopsy could inform Solr about the detected file mime type instead of Solr re-detecting all file signatures again. What do you think about it? Nassif |
From: Luís F. N. <lfc...@gm...> - 2014-04-26 01:34:51
|
Hi guys, I think Autopsy 3 is a very promising forensic framework, and it will become a lot better. Looking at the developer guidelines, APIs, and source, I think I could contribute one or more modules. I have been working for the last three years on a Java analysis tool and I think that I could adapt some of its modules to Autopsy modules.
1. A PST file parser for extracting emails and attachments, powered by java-libpst, Apache licensed
2. A DBX file parser for extracting emails and attachments, powered by a patched version of java OEReader, GPL
3. HTML viewer, using JavaFX (I read somewhere it is being implemented?)
4. PDF viewer, using IcePDF, Apache License (I have already coded a proof-of-concept PDFContentViewer Autopsy module)
5. EML viewer, using Apache Mime4J and JavaFX
6. TIF viewer, using Java ImageIO
7. Office and many other formats viewer, integrating LibreOffice 4
Which of these is not already being developed and would best improve Autopsy's functionality? PS: I did not see a file signature ingest module. Does it already exist? Nassif Brazilian Federal Police Examiner |
From: Hu, H. - 0. - M. <Hon...@ll...> - 2014-04-04 18:24:35
|
Sorry, forgot to add the attachments. -- Hongyi Hu MIT Lincoln Laboratory Group 59 (Cyber System Assessments) Ph: (781) 981-8224 From: <Hu>, Hongyi Hu <Hon...@ll...> Date: Friday, April 4, 2014 2:22 PM To: Alex Nelson <ajn...@cs...> Cc: "sle...@li..." <sle...@li...> Subject: Re: [sleuthkit-developers] NTFS data run collisions Hi Alex, Thanks for response. I wasn't able to come back to this issue until this week I found a bunch of bugs in analyzeMFT that was throwing off the calculations. It looks like the overlaps were due to my misunderstanding of how sparse and compressed data runs work in NTFS, so at least for TSK it looks like there aren't collisions between different MFT entry numbers. A follow-up question about data runs that is highly perplexing. I've attached an odd example of a raw MFT entry (of a zip file) from my clean disk image. I also included the hex dump which includes my math and notes. I'm perplexed as to how TSK is parsing the data runs. The data run snippet is : 31 01 4c 6c 05 21 03 71 01 31 16 be 31 fd 03 00 94 15 01 31 6f 9a 7c ff 31 27 04 bc 0d 31 4f 71 44 01 00 f5 80 00 00 00 00 80 00 00(End) But TSK is interpreting the data runs as 31 01 4c 6c 05 21 03 71 01 31 16 be 31 fd 03 00 94 15 01 31 6f 9a 7c ff 31 27 04 bc 0d 31 4f 71 44 01 00 (End) TSK seems to be right, but I don't understand what it's doing. My analysis by hand (which is the same as what analyzeMFT gives me and consistent with all the NTFS documentation I could find) gives me the following runs. The first three are normal I get the same result as TSK. The last few are divergent. 31 01 4c 6c 05 (normal) len 0x01 offset 0x056c4c ==355404Cluster Address == 355404 21 03 71 01 (normal) len 0x03 offset 0x0171 == 369Cluster Address == 355404 + 369 == 355773 31 16 be 31 fd (normal) len 0x16 (22) offset 0xfd31be == -183874Cluster Address == 171899 Here's where I'm confused: 03 00 94 15 (sparse) The header gives me a 0 byte offset field and a 3 byte length field. 0 byte offset field means a sparse data run (so these runs don't take up disk space and return 0s when read) 3 byte length field gives me a length of 0x159400 == 1414144 01 31 (sparse) 0 byte offset field 1 byte length field == length 0x31 6f 9a 7c ff 31 27 04 bc 0d 31 4f 71 44 01 00 f5 80 00 00 00 00 80 00 Something is clearly wrong here. TSK gives me something more reasonable: [Len: 1, Addr: 355404], [Len: 3, Addr: 355773], [Len: 22, Addr: 171899], [Len: 39, Addr: 242959], [Len: 111, Addr: 209321], [Len: 39, Addr: 1109421], [Len: 79, Addr: 1192478], The first three runs are the same, but the rest are different. TSK seems to interpret the runs like this: 31 01 4c 6c 05 21 03 71 01 31 16 be 31 fd 03 00 94 15 01 31 6f 9a 7c ff 31 27 04 bc 0d 31 4f 71 44 01 00 (End) This only makes sense to me if the fourth line were 31 27 94 15 01 instead of 03 00 94 15 01. Then TSK's numbers and parsing check out with the raw run list. I believe that TSK is correct, but I don't understand how it is parsing the data runs here. Any ideas? Thanks! -- Hongyi Hu MIT Lincoln Laboratory Group 59 (Cyber System Assessments) Ph: (781) 981-8224 From: Alex Nelson <ajn...@cs...> Date: Wednesday, March 26, 2014 10:52 AM To: Hongyi Hu <Hon...@ll...> Cc: "sle...@li..." <sle...@li...> Subject: Re: [sleuthkit-developers] NTFS data run collisions Hi Hongyi, For clarification, these are allocated files you're asking about, right? If some of the files are deleted, the answer is pretty straightforward. Also, are you asking about partial or total overlaps? 
You should be building your hash table based on MFT entry numbers, not on file names. NTFS allows multiple hard links. Do you have example files you could reference in one of the publicly available disk images? (One of the M57's will likely give you an example.) http://www.forensicswiki.org/wiki/Forensic_corpora#Disk_Images --Alex On Mar 25, 2014, at 14:00 , Hu, Hongyi - 0559 - MITLL <Hon...@ll...> wrote: > Hi, > > I'm an NTFS rookie with a question about data runs. Are there any normal > reasons why two different files might have overlapping data runs, i.e. mapped > to some of the same clusters/blocks on the disk? > > For a research project, I would like to do the following: given a sector on > the disk, determine what file (if any) owns the data in that sector. The > first thing I tried was to build a simple block to filename hash table. For > each file, I look at its data runs and put them into the table. With both TSK > and the analyzeMFT library and using a clean Windows XP disk image, I get a > non-trivial number of block collisions. > > Is this normal behavior? I would have thought that the block assignments > would be unique. I have not been successful finding any info about this in > various documentation. > > > Thanks! > > -- > Hongyi Hu > > MIT Lincoln Laboratory > Group 59 (Cyber System Assessments) > Ph: (781) 981-8224 > ------------------------------------------------------------------------------ > Learn Graph Databases - Download FREE O'Reilly Book > "Graph Databases" is the definitive new guide to graph databases and their > applications. Written by three acclaimed leaders in the field, > this first edition is now available. Download your free book today! > http://p.sf.net/sfu/13534_NeoTech_____________________________________________ > __ > sleuthkit-developers mailing list > sle...@li... > https://lists.sourceforge.net/lists/listinfo/sleuthkit-developers |
From: Hu, H. - 0. - M. <Hon...@ll...> - 2014-04-04 18:22:48
|
Hi Alex, Thanks for response. I wasn't able to come back to this issue until this week I found a bunch of bugs in analyzeMFT that was throwing off the calculations. It looks like the overlaps were due to my misunderstanding of how sparse and compressed data runs work in NTFS, so at least for TSK it looks like there aren't collisions between different MFT entry numbers. A follow-up question about data runs that is highly perplexing. I've attached an odd example of a raw MFT entry (of a zip file) from my clean disk image. I also included the hex dump which includes my math and notes. I'm perplexed as to how TSK is parsing the data runs. The data run snippet is : 31 01 4c 6c 05 21 03 71 01 31 16 be 31 fd 03 00 94 15 01 31 6f 9a 7c ff 31 27 04 bc 0d 31 4f 71 44 01 00 f5 80 00 00 00 00 80 00 00(End) But TSK is interpreting the data runs as 31 01 4c 6c 05 21 03 71 01 31 16 be 31 fd 03 00 94 15 01 31 6f 9a 7c ff 31 27 04 bc 0d 31 4f 71 44 01 00 (End) TSK seems to be right, but I don't understand what it's doing. My analysis by hand (which is the same as what analyzeMFT gives me and consistent with all the NTFS documentation I could find) gives me the following runs. The first three are normal I get the same result as TSK. The last few are divergent. 31 01 4c 6c 05 (normal) len 0x01 offset 0x056c4c ==355404 Cluster Address == 355404 21 03 71 01 (normal) len 0x03 offset 0x0171 == 369 Cluster Address == 355404 + 369 == 355773 31 16 be 31 fd (normal) len 0x16 (22) offset 0xfd31be == -183874 Cluster Address == 171899 Here's where I'm confused: 03 00 94 15 (sparse) The header gives me a 0 byte offset field and a 3 byte length field. 0 byte offset field means a sparse data run (so these runs don't take up disk space and return 0s when read) 3 byte length field gives me a length of 0x159400 == 1414144 01 31 (sparse) 0 byte offset field 1 byte length field == length 0x31 6f 9a 7c ff 31 27 04 bc 0d 31 4f 71 44 01 00 f5 80 00 00 00 00 80 00 Something is clearly wrong here. TSK gives me something more reasonable: [Len: 1, Addr: 355404], [Len: 3, Addr: 355773], [Len: 22, Addr: 171899], [Len: 39, Addr: 242959], [Len: 111, Addr: 209321], [Len: 39, Addr: 1109421], [Len: 79, Addr: 1192478], The first three runs are the same, but the rest are different. TSK seems to interpret the runs like this: 31 01 4c 6c 05 21 03 71 01 31 16 be 31 fd 03 00 94 15 01 31 6f 9a 7c ff 31 27 04 bc 0d 31 4f 71 44 01 00 (End) This only makes sense to me if the fourth line were 31 27 94 15 01 instead of 03 00 94 15 01. Then TSK's numbers and parsing check out with the raw run list. I believe that TSK is correct, but I don't understand how it is parsing the data runs here. Any ideas? Thanks! -- Hongyi Hu MIT Lincoln Laboratory Group 59 (Cyber System Assessments) Ph: (781) 981-8224 From: Alex Nelson <ajn...@cs...> Date: Wednesday, March 26, 2014 10:52 AM To: Hongyi Hu <Hon...@ll...> Cc: "sle...@li..." <sle...@li...> Subject: Re: [sleuthkit-developers] NTFS data run collisions Hi Hongyi, For clarification, these are allocated files you're asking about, right? If some of the files are deleted, the answer is pretty straightforward. Also, are you asking about partial or total overlaps? You should be building your hash table based on MFT entry numbers, not on file names. NTFS allows multiple hard links. Do you have example files you could reference in one of the publicly available disk images? (One of the M57's will likely give you an example.) 
http://www.forensicswiki.org/wiki/Forensic_corpora#Disk_Images --Alex On Mar 25, 2014, at 14:00 , Hu, Hongyi - 0559 - MITLL <Hon...@ll...> wrote: > Hi, > > I'm an NTFS rookie with a question about data runs. Are there any normal > reasons why two different files might have overlapping data runs, i.e. mapped > to some of the same clusters/blocks on the disk? > > For a research project, I would like to do the following: given a sector on > the disk, determine what file (if any) owns the data in that sector. The > first thing I tried was to build a simple block to filename hash table. For > each file, I look at its data runs and put them into the table. With both TSK > and the analyzeMFT library and using a clean Windows XP disk image, I get a > non-trivial number of block collisions. > > Is this normal behavior? I would have thought that the block assignments > would be unique. I have not been successful finding any info about this in > various documentation. > > > Thanks! > > -- > Hongyi Hu > > MIT Lincoln Laboratory > Group 59 (Cyber System Assessments) > Ph: (781) 981-8224 > ------------------------------------------------------------------------------ > Learn Graph Databases - Download FREE O'Reilly Book > "Graph Databases" is the definitive new guide to graph databases and their > applications. Written by three acclaimed leaders in the field, > this first edition is now available. Download your free book today! > http://p.sf.net/sfu/13534_NeoTech_____________________________________________ > __ > sleuthkit-developers mailing list > sle...@li... > https://lists.sourceforge.net/lists/listinfo/sleuthkit-developers |
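For readers following the hex above, here is a minimal sketch of the standard run-list decoding being applied by hand in this thread: the header byte's low nibble gives the size of the (unsigned, little-endian) cluster-count field, the high nibble gives the size of the cluster-offset field, the offset is signed and relative to the previous run's starting cluster, and an offset size of zero marks a sparse run. This is a generic illustration of the on-disk format, not TSK's implementation, and it does not attempt to explain why TSK reads this particular entry differently.

```java
import java.util.ArrayList;
import java.util.List;

public class NtfsRunList {
    /**
     * Decodes an NTFS run list. Each returned entry is
     * {lengthInClusters, startingCluster}, with startingCluster == -1 for
     * sparse runs (which occupy no clusters on disk).
     */
    public static List<long[]> decode(byte[] b) {
        List<long[]> runs = new ArrayList<>();
        long prevStart = 0;                      // offsets are relative to the previous run
        int i = 0;
        while (i < b.length && b[i] != 0) {      // a 0x00 header terminates the list
            int header = b[i++] & 0xff;
            int lenSize = header & 0x0f;         // low nibble: size of the length field
            int offSize = (header >> 4) & 0x0f;  // high nibble: size of the offset field

            long len = 0;                        // little-endian, unsigned
            for (int k = 0; k < lenSize; k++)
                len |= (long) (b[i + k] & 0xff) << (8 * k);
            i += lenSize;

            if (offSize == 0) {                  // sparse run: no offset field at all
                runs.add(new long[] { len, -1 });
                continue;                        // sparse runs do not move prevStart
            }

            long off = 0;                        // little-endian, signed
            for (int k = 0; k < offSize; k++)
                off |= (long) (b[i + k] & 0xff) << (8 * k);
            if ((b[i + offSize - 1] & 0x80) != 0)
                off -= 1L << (8 * offSize);      // sign-extend negative offsets
            i += offSize;

            prevStart += off;
            runs.add(new long[] { len, prevStart });
        }
        return runs;
    }
}
```

Feeding it the bytes 31 01 4c 6c 05 21 03 71 01 31 16 be 31 fd yields (1, 355404), (3, 355773), (22, 171899), matching the first three runs in both parses above.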
From: Alex N. <ajn...@cs...> - 2014-03-26 14:52:16
|
Hi Hongyi, For clarification, these are allocated files you're asking about, right? If some of the files are deleted, the answer is pretty straightforward. Also, are you asking about partial or total overlaps? You should be building your hash table based on MFT entry numbers, not on file names. NTFS allows multiple hard links. Do you have example files you could reference in one of the publicly available disk images? (One of the M57's will likely give you an example.) http://www.forensicswiki.org/wiki/Forensic_corpora#Disk_Images --Alex On Mar 25, 2014, at 14:00 , Hu, Hongyi - 0559 - MITLL <Hon...@ll...> wrote: > Hi, > > I'm an NTFS rookie with a question about data runs. Are there any normal reasons why two different files might have overlapping data runs, i.e. mapped to some of the same clusters/blocks on the disk? > > For a research project, I would like to do the following: given a sector on the disk, determine what file (if any) owns the data in that sector. The first thing I tried was to build a simple block to filename hash table. For each file, I look at its data runs and put them into the table. With both TSK and the analyzeMFT library and using a clean Windows XP disk image, I get a non-trivial number of block collisions. > > Is this normal behavior? I would have thought that the block assignments would be unique. I have not been successful finding any info about this in various documentation. > > > Thanks! > > -- > Hongyi Hu > > MIT Lincoln Laboratory > Group 59 (Cyber System Assessments) > Ph: (781) 981-8224 > ------------------------------------------------------------------------------ > Learn Graph Databases - Download FREE O'Reilly Book > "Graph Databases" is the definitive new guide to graph databases and their > applications. Written by three acclaimed leaders in the field, > this first edition is now available. Download your free book today! > http://p.sf.net/sfu/13534_NeoTech_______________________________________________ > sleuthkit-developers mailing list > sle...@li... > https://lists.sourceforge.net/lists/listinfo/sleuthkit-developers |
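A sketch of the table Alex describes, keyed on MFT entry (metadata) numbers rather than file names. The Run record here is a hypothetical stand-in for however you enumerate allocated files' data runs (the TSK library, istat, or analyzeMFT output); the points it illustrates are the keying, the collision check, and that sparse runs map to no disk blocks and should be skipped.

```java
import java.util.HashMap;
import java.util.Map;

public class BlockOwnerMap {
    // Hypothetical description of one data run of one file.
    record Run(long metaAddr, long startBlock, long lengthInBlocks, boolean sparse) {}

    /** Maps file-system block -> owning MFT entry number, reporting collisions. */
    public static Map<Long, Long> build(Iterable<Run> runs) {
        Map<Long, Long> owner = new HashMap<>();
        for (Run r : runs) {
            if (r.sparse())                     // sparse runs own no on-disk blocks
                continue;
            for (long blk = r.startBlock(); blk < r.startBlock() + r.lengthInBlocks(); blk++) {
                Long prev = owner.putIfAbsent(blk, r.metaAddr());
                if (prev != null && prev != r.metaAddr())
                    System.err.printf("collision: block %d claimed by MFT %d and %d%n",
                                      blk, prev, r.metaAddr());
            }
        }
        return owner;
    }
}
```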
From: Hu, H. - 0. - M. <Hon...@ll...> - 2014-03-25 21:01:00
|
Hi, I'm an NTFS rookie with a question about data runs. Are there any normal reasons why two different files might have overlapping data runs, i.e. mapped to some of the same clusters/blocks on the disk? For a research project, I would like to do the following: given a sector on the disk, determine what file (if any) owns the data in that sector. The first thing I tried was to build a simple block to filename hash table. For each file, I look at its data runs and put them into the table. With both TSK and the analyzeMFT library and using a clean Windows XP disk image, I get a non-trivial number of block collisions. Is this normal behavior? I would have thought that the block assignments would be unique. I have not been successful finding any info about this in various documentation. Thanks! -- Hongyi Hu MIT Lincoln Laboratory Group 59 (Cyber System Assessments) Ph: (781) 981-8224 |
From: Brian C. <ca...@sl...> - 2014-03-09 04:06:51
|
You probably haven't noticed, but we've been using a branching model on git that is based on gitflow. There are some pull requests that I've manually made to the 'develop' branch instead of 'master'. I've added some basic info to the wiki, but I'd like to have a lot more details and a "cook book" for the git commands to use. http://wiki.sleuthkit.org/index.php?title=Developer_Guidelines#Source_Code_Branches |
From: Mari D. <mar...@gm...> - 2014-01-12 20:05:41
|
I think I found what I was looking for: http://www.sleuthkit.org/autopsy/docs/api-docs/index.html On Sun, Jan 12, 2014 at 11:26 AM, Mari DeGrazia <mar...@gm...>wrote: > Hey guys. I am looking at writing some modules for Autopsy. I am looking > for some information to get started. IE - How the the modules work, an > example etc. I thought I saw a blog post about it somewhere, but I can't > seem to locate it. > > Any links would be appreciated! > > Thanks, > > Mari > |
From: Mari D. <mar...@gm...> - 2014-01-12 18:26:19
|
Hey guys. I am looking at writing some modules for Autopsy and am looking for some information to get started, i.e., how the modules work, an example, etc. I thought I saw a blog post about it somewhere, but I can't seem to locate it. Any links would be appreciated! Thanks, Mari |
From: Georger A. <geo...@ya...> - 2013-12-10 04:41:44
|
Hi, I've built TSK 4.1.2 with Visual Studio 2013. I ran ant to build the Java bindings, and it copied the DLLs - libewf.dll, libtsk_jni.dll, and zlib.dll - into dist\Tsk_DataModel.jar. Just for the record, I had to manually create the crt and crt\win32 folders under bindings\java, otherwise ant would complain and fail to build dist\Tsk_DataModel.jar. Then I built the Autopsy 3.0.8 sources. I ran `ant build` and all was fine, but when I ran `ant run` I got the following messages at the console:
[exec] SleuthkitJNI: failed to load msvcp100
[exec] SleuthkitJNI: failed to load msvcr100
[exec] SleuthkitJNI: loaded zlib
[exec] SleuthkitJNI: loaded libewf
[exec] SleuthkitJNI: loaded libtsk_jni
I looked inside dist\Tsk_DataModel.jar and noticed that NATIVELIBS\i386\win\ didn't contain msvcr100.dll nor msvcp100.dll, just the 3 DLLs that were built when I compiled TSK 4.1.2. I then looked at bindings\java\src\org\sleuthkit\datamodel\LibraryUtils.java, and in lines 148 and 149 the function tries to explicitly load msvcr100.dll and msvcp100.dll. My question is, is this a problem? Shouldn't the build script and the source code refer to the runtime DLLs that were used to compile the Java binding - those would be msvcr120.dll and msvcp120.dll from Visual Studio 2013? In case it helps, these are the software versions I'm running:
Windows 8.1 Pro x64
Visual Studio 2013 Express 32-bit
JDK 7u25 32-bit
ant 1.9.2
TSK 4.1.2
Libewf 20130416
zlib 1.2.8
Autopsy 3.0.8
Regards, Georger |
From: Mari T. <mar...@gm...> - 2013-12-04 04:12:12
|
For recovering deleted data, a good place to start might be sqlite databases within the file system. Androids stores information such as SMS and Email within these databases. You can recover deleted entries by parsing the unallocated space within the sqlite database itself. I've written a Python script that does this that you might be able to incorporate or use the algorithms from. On Mon, Dec 2, 2013 at 1:32 PM, Wiktor Sypniewski < wik...@gm...> wrote: > Hi Guys, > > My name is Victor and I'm final year student in Dublin Institute of > Technology. I would like to develop module for Autopsy as my final year > project. > > I'm interested in mobile forensics. I would like to develop a module that > would be able to browse/recover data from a mobile device running on > Android. > > I'm not sure what kind of data I should be interested in? I think that > Autopsy 3 can scan mobile device as if it was just a disc but is there > anything else I should be looking for? > > Any help in getting me started with this will be greatly appreciated! > > Kindest Regards > Vic > > > > > Wiktor Sypniewski > +353862177331 > > www.bluegreenblack.com > > > ------------------------------------------------------------------------------ > Rapidly troubleshoot problems before they affect your business. Most IT > organizations don't have a clear picture of how application performance > affects their revenue. With AppDynamics, you get 100% visibility into your > Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics > Pro! > http://pubads.g.doubleclick.net/gampad/clk?id=84349351&iu=/4140/ostg.clktrk > _______________________________________________ > sleuthkit-developers mailing list > sle...@li... > https://lists.sourceforge.net/lists/listinfo/sleuthkit-developers > > |
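This is not Mari's script, but a minimal sketch of one part of what such recovery involves: walking an SQLite database's freelist (wholly unused pages) via the documented header fields - the 2-byte big-endian page size at offset 16 and the first freelist trunk page number at offset 32, with each trunk page holding a next-trunk pointer, a leaf count, and that many 4-byte leaf page numbers. Real tools also scan freeblocks and unallocated space inside B-tree pages, which this skips; the file path handling and output are illustrative only.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

public class SqliteFreelist {
    /** Returns the 1-based page numbers of all freelist pages (trunk and leaf). */
    public static List<Long> freelistPages(String path) throws IOException {
        List<Long> pages = new ArrayList<>();
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            byte[] hdr = new byte[100];
            f.readFully(hdr);
            int pageSize = ((hdr[16] & 0xff) << 8) | (hdr[17] & 0xff);  // big-endian u16
            if (pageSize == 1) pageSize = 65536;                        // value 1 means 64 KiB
            long trunk = u32(hdr, 32);                                  // first freelist trunk page

            while (trunk != 0) {
                pages.add(trunk);
                byte[] page = new byte[pageSize];
                f.seek((trunk - 1) * (long) pageSize);                  // page numbers are 1-based
                f.readFully(page);
                long next = u32(page, 0);                               // next trunk page (0 = last)
                long leaves = u32(page, 4);                             // number of leaf pointers
                for (int k = 0; k < leaves; k++)
                    pages.add(u32(page, 8 + 4 * k));                    // leaf page numbers
                trunk = next;
            }
        }
        return pages;   // each of these pages can then be scanned for record fragments
    }

    private static long u32(byte[] b, int off) {
        return ((long) (b[off] & 0xff) << 24) | ((b[off + 1] & 0xff) << 16)
             | ((b[off + 2] & 0xff) << 8) | (b[off + 3] & 0xff);
    }
}
```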
From: Brian C. <ca...@sl...> - 2013-12-04 03:49:18
|
Autopsy can ingest most Android images, but not all. Our experience has been that there are three types of data that you can get from a physical acquisition of an Android device depending on the acquisition method used: 1) A single image that has a partition table in the beginning and partitions with Ext4 or YAFFS2 file systems in them. Autopsy supports this. 2) A bunch of images of each partition. Autopsy supports this too, you just need to add each image. 3) A single image that doesn't have a partition table in the beginning and the offsets to each partition are hard coded somewhere else in a proprietary way. Autopsy doesn't support this unless you determine the partition boundaries and make logical images and then it is the same as #2 above. Once you have the file systems imported, you can make modules to collect the standard contact info (there are already blackboard artifacts and attributes to store this type of data) or focus on third-party apps. brian On Dec 2, 2013, at 3:32 PM, Wiktor Sypniewski <wik...@gm...> wrote: > Hi Guys, > > My name is Victor and I'm final year student in Dublin Institute of Technology. I would like to develop module for Autopsy as my final year project. > > I'm interested in mobile forensics. I would like to develop a module that would be able to browse/recover data from a mobile device running on Android. > > I'm not sure what kind of data I should be interested in? I think that Autopsy 3 can scan mobile device as if it was just a disc but is there anything else I should be looking for? > > Any help in getting me started with this will be greatly appreciated! > > Kindest Regards > Vic > > > > > Wiktor Sypniewski > +353862177331 > > www.bluegreenblack.com > ------------------------------------------------------------------------------ > Rapidly troubleshoot problems before they affect your business. Most IT > organizations don't have a clear picture of how application performance > affects their revenue. With AppDynamics, you get 100% visibility into your > Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro! > http://pubads.g.doubleclick.net/gampad/clk?id=84349351&iu=/4140/ostg.clktrk_______________________________________________ > sleuthkit-developers mailing list > sle...@li... > https://lists.sourceforge.net/lists/listinfo/sleuthkit-developers |
From: Wiktor S. <wik...@gm...> - 2013-12-02 20:32:46
|
Hi Guys, My name is Victor and I'm a final-year student at Dublin Institute of Technology. I would like to develop a module for Autopsy as my final year project. I'm interested in mobile forensics. I would like to develop a module that would be able to browse/recover data from a mobile device running on Android. I'm not sure what kind of data I should be interested in. I think that Autopsy 3 can scan a mobile device as if it were just a disk, but is there anything else I should be looking for? Any help in getting me started with this will be greatly appreciated! Kindest Regards Vic Wiktor Sypniewski +353862177331 www.bluegreenblack.com |
From: Stuart M. <st...@ap...> - 2013-10-11 23:00:59
|
I am building against TSK 4.1 on Linux. If I call tsk_vs_part_read with its 4th parameter, 'a_len', larger than the partition's size in bytes, the call hangs/spins. Of course it is easy to compute the maximum a_len that SHOULD be passed, thus: a_len = a_vs_part->len * a_vs_part->vs->block_size - a_off, but I just wanted to point out what happens if that check is not done by the caller. Regards Stuart |
From: Brian C. <ca...@sl...> - 2013-09-30 17:18:06
|
We're trying to plan accordingly for OSDFCon and need to know how much time to allocate to reviewing module submissions. If you are working on a module for the competition (http://www.basistechweek.com/osdf.html#contest), can you shoot me an e-mail off list? I want to get a rough count. thanks, brian |
From: Brian C. <ca...@sl...> - 2013-09-17 02:00:18
|
Sure. Before you get to a TSK_FS_META, you must have had a TSK_FS_INFO. It contains all the info you need. byte_offset_relative_to_disk = fs_info->offset + (OFFSET_FROM_META * fs_info->block_size) On Aug 28, 2013, at 4:11 PM, Robert James <sro...@gm...> wrote: >> From a TSK_FS_META, I know how to find the offset of every run making > up the file. But those offsets are relative to the start of the > _filesystem_, and in units of filesystem blocks. > > I'd like to turn those into offsets relative to the start of the > _disk_, factoring in any filesystem or volume system. And I'd like > them in units of bytes (or device sectors). > > That is, given a TSK_FS_META, I'd like to find the byte offset in the > disk of the runs. Something I could feed right into dd. Can I do this? > How? > > ------------------------------------------------------------------------------ > Learn the latest--Visual Studio 2012, SharePoint 2013, SQL 2012, more! > Discover the easy way to master current and previous Microsoft technologies > and advance your career. Get an incredible 1,500+ hours of step-by-step > tutorial videos with LearnDevNow. Subscribe today and save! > http://pubads.g.doubleclick.net/gampad/clk?id=58040911&iu=/4140/ostg.clktrk > _______________________________________________ > sleuthkit-developers mailing list > sle...@li... > https://lists.sourceforge.net/lists/listinfo/sleuthkit-developers |
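As a worked example with made-up numbers: for a file system starting at fs_info->offset = 1048576 (a partition at 1 MiB) with fs_info->block_size = 4096, block 1000 begins at byte 1048576 + 1000 * 4096 = 5144576 into the disk image, i.e. sector 10048 for 512-byte sectors, which can be fed straight to dd as skip=10048 with bs=512.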
From: Alex N. <ajn...@cs...> - 2013-09-17 00:49:00
|
Oh, it looks like you never got an answer to this. The straightforward way is to have Fiwalk generate its XML output. The byte_run elements' img_offset attribute is exactly what you were looking for. This will serve you just fine if you were asking about file content addresses. If you were asking about non-content addresses (like MFT entries), though, that still requires manual calculation. I've proposed a way to record those addresses in Fiwalk's output ( https://github.com/dfxml-working-group/dfxml_schema/issues/5 ), but after that is discussed, there will be a not-terribly-straightforward-looking batch of code to write to actually implement it. I hope that helps. --Alex On Aug 28, 2013, at 16:11 , Robert James <sro...@gm...> wrote: >> From a TSK_FS_META, I know how to find the offset of every run making > up the file. But those offsets are relative to the start of the > _filesystem_, and in units of filesystem blocks. > > I'd like to turn those into offsets relative to the start of the > _disk_, factoring in any filesystem or volume system. And I'd like > them in units of bytes (or device sectors). > > That is, given a TSK_FS_META, I'd like to find the byte offset in the > disk of the runs. Something I could feed right into dd. Can I do this? > How? > > ------------------------------------------------------------------------------ > Learn the latest--Visual Studio 2012, SharePoint 2013, SQL 2012, more! > Discover the easy way to master current and previous Microsoft technologies > and advance your career. Get an incredible 1,500+ hours of step-by-step > tutorial videos with LearnDevNow. Subscribe today and save! > http://pubads.g.doubleclick.net/gampad/clk?id=58040911&iu=/4140/ostg.clktrk > _______________________________________________ > sleuthkit-developers mailing list > sle...@li... > https://lists.sourceforge.net/lists/listinfo/sleuthkit-developers |