Re: [sleuthkit-developers] NTFS data run collisions
From: Hu, H. - 0. - M. <Hon...@ll...> - 2014-04-04 18:22:48
Hi Alex,

Thanks for the response. I wasn't able to come back to this issue until this week. I found a bunch of bugs in analyzeMFT that were throwing off the calculations. It looks like the overlaps were due to my misunderstanding of how sparse and compressed data runs work in NTFS, so at least for TSK it looks like there aren't collisions between different MFT entry numbers.

I have a follow-up question about data runs that is highly perplexing. I've attached an odd example of a raw MFT entry (of a zip file) from my clean disk image. I also included the hex dump, which includes my math and notes. I'm perplexed as to how TSK is parsing the data runs.

The data run snippet is:

31 01 4c 6c 05 21 03 71 01 31 16 be 31 fd 03 00 94 15 01 31 6f 9a 7c ff 31 27 04 bc 0d 31 4f 71 44 01 00 f5 80 00 00 00 00 80 00 00 (End)

But TSK is interpreting the data runs as:

31 01 4c 6c 05 21 03 71 01 31 16 be 31 fd 03 00 94 15 01 31 6f 9a 7c ff 31 27 04 bc 0d 31 4f 71 44 01 00 (End)

TSK seems to be right, but I don't understand what it's doing. My analysis by hand (which is the same as what analyzeMFT gives me and consistent with all the NTFS documentation I could find) gives me the following runs. The first three are normal, and I get the same result as TSK. The last few are divergent.

31 01 4c 6c 05 (normal)
len 0x01
offset 0x056c4c == 355404
Cluster Address == 355404

21 03 71 01 (normal)
len 0x03
offset 0x0171 == 369
Cluster Address == 355404 + 369 == 355773

31 16 be 31 fd (normal)
len 0x16 (22)
offset 0xfd31be == -183874
Cluster Address == 355773 - 183874 == 171899

Here's where I'm confused:

03 00 94 15 (sparse)
The header gives me a 0 byte offset field and a 3 byte length field.
A 0 byte offset field means a sparse data run (so these runs don't take up disk space and return 0s when read).
The 3 byte length field gives me a length of 0x159400 == 1414144.

01 31 (sparse)
0 byte offset field
1 byte length field == length 0x31

6f 9a 7c ff 31 27 04 bc 0d 31 4f 71 44 01 00 f5 80 00 00 00 00 80 00
Something is clearly wrong here.
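For reference, the hand decoding above can be written out as a short Python sketch. It assumes the run-list layout used in that analysis: the low nibble of the header byte is the size of the length field, the high nibble is the size of the offset field, both fields are little-endian, the offset is signed and relative to the previous run's start, and a 0x00 header ends the list. `decode_runs` is a hypothetical helper written for this illustration, not a function from TSK or analyzeMFT.

```python
def decode_runs(raw: bytes):
    """Decode an NTFS run list into (length, start_cluster) tuples.

    start_cluster is None for a sparse run (offset field of size 0).
    Hypothetical helper for illustration only.
    """
    runs = []
    pos = 0
    prev_start = 0  # start cluster of the previous non-sparse run
    while pos < len(raw) and raw[pos] != 0x00:
        header = raw[pos]
        len_size = header & 0x0F  # low nibble: bytes in the length field
        off_size = header >> 4    # high nibble: bytes in the offset field
        pos += 1
        if pos + len_size + off_size > len(raw):
            raise ValueError("run list is truncated")
        length = int.from_bytes(raw[pos:pos + len_size], "little")
        pos += len_size
        if off_size == 0:
            runs.append((length, None))  # sparse: no clusters on disk
            continue
        delta = int.from_bytes(raw[pos:pos + off_size], "little", signed=True)
        pos += off_size
        prev_start += delta
        runs.append((length, prev_start))
    return runs


# The first three runs from the snippet, followed by a terminator byte.
first_three = bytes.fromhex("31 01 4c 6c 05  21 03 71 01  31 16 be 31 fd  00")
print(decode_runs(first_three))
# [(1, 355404), (3, 355773), (22, 171899)]
```

Fed the bytes `03 00 94 15` on their own, the same function reports a single sparse run of length 1414144, which matches the hand calculation.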
TSK gives me something more reasonable:

[Len: 1, Addr: 355404],
[Len: 3, Addr: 355773],
[Len: 22, Addr: 171899],
[Len: 39, Addr: 242959],
[Len: 111, Addr: 209321],
[Len: 39, Addr: 1109421],
[Len: 79, Addr: 1192478],

The first three runs are the same, but the rest are different. TSK seems to interpret the runs like this:

31 01 4c 6c 05
21 03 71 01
31 16 be 31 fd
03 00 94 15 01
31 6f 9a 7c ff
31 27 04 bc 0d
31 4f 71 44 01
00 (End)

This only makes sense to me if the fourth line were 31 27 94 15 01 instead of 03 00 94 15 01. Then TSK's numbers and parsing check out with the raw run list.

I believe that TSK is correct, but I don't understand how it is parsing the data runs here. Any ideas?

Thanks!

--
Hongyi Hu
MIT Lincoln Laboratory
Group 59 (Cyber System Assessments)
Ph: (781) 981-8224

From: Alex Nelson <ajn...@cs...>
Date: Wednesday, March 26, 2014 10:52 AM
To: Hongyi Hu <Hon...@ll...>
Cc: "sle...@li..." <sle...@li...>
Subject: Re: [sleuthkit-developers] NTFS data run collisions

Hi Hongyi,

For clarification, these are allocated files you're asking about, right? If some of the files are deleted, the answer is pretty straightforward.

Also, are you asking about partial or total overlaps?

You should be building your hash table based on MFT entry numbers, not on file names. NTFS allows multiple hard links.

Do you have example files you could reference in one of the publicly available disk images? (One of the M57's will likely give you an example.)
http://www.forensicswiki.org/wiki/Forensic_corpora#Disk_Images

--Alex

On Mar 25, 2014, at 14:00 , Hu, Hongyi - 0559 - MITLL <Hon...@ll...> wrote:

> Hi,
>
> I'm an NTFS rookie with a question about data runs. Are there any normal
> reasons why two different files might have overlapping data runs, i.e. mapped
> to some of the same clusters/blocks on the disk?
>
> For a research project, I would like to do the following: given a sector on
> the disk, determine what file (if any) owns the data in that sector.
> The first thing I tried was to build a simple block to filename hash table. For
> each file, I look at its data runs and put them into the table. With both TSK
> and the analyzeMFT library and using a clean Windows XP disk image, I get a
> non-trivial number of block collisions.
>
> Is this normal behavior? I would have thought that the block assignments
> would be unique. I have not been successful finding any info about this in
> various documentation.
>
> Thanks!
>
> --
> Hongyi Hu
>
> MIT Lincoln Laboratory
> Group 59 (Cyber System Assessments)
> Ph: (781) 981-8224
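The block-ownership table discussed in this thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions: runs are already decoded into (length, start_cluster) tuples, sparse runs carry a start of None and are skipped because they occupy no clusters, the table is keyed on MFT entry numbers as Alex suggests, and compressed attributes get no special handling. The function names and the entry numbers 100 and 101 are hypothetical; the runs for entry 100 are the ones TSK reported above.

```python
from collections import defaultdict


def build_cluster_owners(files):
    """Map each allocated cluster to the set of MFT entries that claim it.

    files: dict of mft_entry_number -> list of (length, start_cluster),
    where start_cluster is None for a sparse run.
    """
    owners = defaultdict(set)
    for entry, runs in files.items():
        for length, start in runs:
            if start is None:
                continue  # sparse runs take up no disk space
            for cluster in range(start, start + length):
                owners[cluster].add(entry)
    return owners


def find_collisions(owners):
    """Return only the clusters claimed by more than one MFT entry."""
    return {c: e for c, e in owners.items() if len(e) > 1}


files = {
    # Run list as reported by TSK in this thread (entry number is made up).
    100: [(1, 355404), (3, 355773), (22, 171899), (39, 242959),
          (111, 209321), (39, 1109421), (79, 1192478)],
    # Invented entry that deliberately overlaps cluster 355773,
    # plus a sparse run that must not be counted.
    101: [(4, 355770), (1000, None)],
}

owners = build_cluster_owners(files)
print(find_collisions(owners))
# {355773: {100, 101}}
```

Treating a sparse run as if it had a real starting cluster would add spurious entries to the table, which is consistent with the false overlaps described earlier in the thread.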