Re: [sleuthkit-developers] NTFS data run collisions


Hi Alex,

Thanks for the response. I wasn't able to come back to this issue until this
week. I found a bunch of bugs in analyzeMFT that were throwing off the
calculations.

It looks like the overlaps were due to my misunderstanding of how sparse and
compressed data runs work in NTFS, so, at least with TSK, there don't appear
to be collisions between different MFT entries.

I have a follow-up question about data runs that has me perplexed. I've
attached an odd example of a raw MFT entry (for a zip file) from my clean
disk image, along with a hex dump that includes my math and notes. I can't
work out how TSK is parsing the data runs.

The data run snippet, grouped the way I first parsed it, is:

31 01 4c 6c 05
21 03 71 01
31 16 be 31 fd
03 00 94 15
01 31
6f 9a 7c ff 31 27 04 bc 0d 31 4f 71 44 01 00 f5 80 00 00 00 00 80 00
00 (End)

But TSK groups the data runs as:

31 01 4c 6c 05
21 03 71 01
31 16 be 31 fd 
03 00 94 15 01 
31 6f 9a 7c ff 
31 27 04 bc 0d 
31 4f 71 44 01 
00 (End)

TSK seems to be right, but I don't understand what it's doing.

My analysis by hand (which matches what analyzeMFT gives me and is
consistent with all the NTFS documentation I could find) gives me the
following runs. The first three are normal, and I get the same result as TSK
for them. The rest diverge.

31 01 4c 6c 05 (normal)
len 0x01, offset 0x056c4c == 355404, cluster address == 355404

21 03 71 01 (normal)
len 0x03, offset 0x0171 == 369, cluster address == 355404 + 369 == 355773

31 16 be 31 fd (normal)
len 0x16 (22), offset 0xfd31be == -183874 (signed), cluster address == 355773 - 183874 == 171899
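A quick way to double-check the arithmetic for the three normal runs: the offset field is read as a signed little-endian delta from the previous run's cluster address, which is why the third run's 0xfd31be comes out negative. This is a minimal Python sketch that assumes the length and offset bytes have already been split out by hand, as above; it does not parse headers.

```python
# Length and offset bytes for the three "normal" runs, split out by hand.
runs = [
    ("01", "4c 6c 05"),
    ("03", "71 01"),
    ("16", "be 31 fd"),
]

lcn = 0  # running cluster address (LCN)
decoded = []
for length_hex, offset_hex in runs:
    length = int.from_bytes(bytes.fromhex(length_hex), "little")
    # The offset is a signed little-endian delta from the previous run's LCN.
    delta = int.from_bytes(bytes.fromhex(offset_hex), "little", signed=True)
    lcn += delta
    decoded.append((length, delta, lcn))

for length, delta, addr in decoded:
    print(f"len {length:>3}  delta {delta:>8}  cluster address {addr}")
```

The signed read matters only when the top bit of the last offset byte is set (0xfd here); the first two offsets are positive either way.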

Here's where I'm confused:

03 00 94 15 (sparse)
The header byte 0x03 gives me a 0-byte offset field and a 3-byte length field.
A 0-byte offset field means a sparse data run (these runs don't take up
disk space and return zeros when read).
The 3-byte length field (00 94 15, little-endian) gives me a length of
0x159400 == 1414144.

01 31 (sparse)
0-byte offset field
1-byte length field == length 0x31 (49)

6f 9a 7c ff 31 27 04 bc 0d 31 4f 71 44 01 00 f5 80 00 00 00 00 80 00
Something is clearly wrong here: read as a header byte, 0x6f would mean a
15-byte length field and a 6-byte offset field.
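To make the hand walk above concrete, here is a minimal Python sketch that steps through the bytes from the fourth run onward using only the header-nibble rule (low nibble = size of the length field, high nibble = size of the offset field). The MAX_FIELD cutoff of 8 bytes is an assumed sanity check, chosen because the fields are meant to hold 64-bit values; it is not taken from TSK or analyzeMFT.

```python
# Bytes from the fourth run onward, exactly as they appear in the run list.
tail = bytes.fromhex(
    "03 00 94 15 01 31 6f 9a 7c ff 31 27 04 bc 0d 31 4f 71 44 01 00"
)

MAX_FIELD = 8  # assumed sanity limit: fields hold at most 64-bit values

pos = 0
walk = []
while pos < len(tail) and tail[pos] != 0x00:
    header = tail[pos]
    len_size = header & 0x0F  # low nibble: size of the length field
    off_size = header >> 4    # high nibble: size of the offset field
    end = pos + 1 + len_size + off_size
    if len_size > MAX_FIELD or off_size > MAX_FIELD or end > len(tail):
        walk.append((hex(header), len_size, off_size, "implausible"))
        break
    length = int.from_bytes(tail[pos + 1:pos + 1 + len_size], "little")
    kind = "sparse" if off_size == 0 else "allocated"
    walk.append((hex(header), len_size, off_size, length, kind))
    pos = end

for step in walk:
    print(step)
```

The walk reproduces the two sparse runs (lengths 1414144 and 49) and then stops at the 0x6f header, which is exactly where the hand analysis breaks down.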

TSK gives me something more reasonable:

[Len: 1, Addr: 355404],
[Len: 3, Addr: 355773],
[Len: 22, Addr: 171899],
[Len: 39, Addr: 242959],
[Len: 111, Addr: 209321],
[Len: 39, Addr: 1109421],
[Len: 79, Addr: 1192478],

The first three runs are the same, but the rest are different. TSK groups the
bytes as in the TSK grouping I listed earlier, treating 03 00 94 15 01 as the
fourth run.

This only makes sense to me if the fourth run were 31 27 94 15 01 instead
of 03 00 94 15 01. With that substitution, TSK's numbers and grouping check
out against the raw run list: header 0x31, length 0x27 == 39, offset
0x011594 == 71060, and 171899 + 71060 == 242959. I believe TSK is correct,
but I don't understand how it arrives at this parsing.
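The substitution can be checked mechanically. The sketch below is a hypothetical minimal run-list decoder (not TSK's or analyzeMFT's code): it treats a zero-size offset field as sparse, keeps a running cluster address that sparse runs leave unchanged, and stops at a 0x00 header. It patches 03 00 to 31 27, as suggested above, and decodes the result.

```python
def decode_runlist(data: bytes):
    """Decode an NTFS run list into (length, lcn) pairs; lcn is None if sparse."""
    runs, pos, lcn = [], 0, 0
    while pos < len(data) and data[pos] != 0x00:  # 0x00 header ends the list
        len_size = data[pos] & 0x0F  # low nibble: length-field size
        off_size = data[pos] >> 4    # high nibble: offset-field size
        pos += 1
        length = int.from_bytes(data[pos:pos + len_size], "little")
        pos += len_size
        if off_size == 0:
            runs.append((length, None))  # sparse: no clusters on disk
        else:
            # Signed little-endian delta from the previous allocated run.
            lcn += int.from_bytes(data[pos:pos + off_size], "little", signed=True)
            runs.append((length, lcn))
        pos += off_size
    return runs

raw = bytes.fromhex(
    "31 01 4c 6c 05 21 03 71 01 31 16 be 31 fd "
    "03 00 94 15 01 31 6f 9a 7c ff 31 27 04 bc 0d 31 4f 71 44 01 00"
)

# Hypothetical patch from the text: replace the two bytes 03 00 with 31 27.
idx = raw.index(bytes.fromhex("03 00 94 15"))
patched = raw[:idx] + bytes.fromhex("31 27") + raw[idx + 2:]

for length, addr in decode_runlist(patched):
    print(f"[Len: {length}, Addr: {addr}]")
```

With the patch applied, the seven decoded runs match TSK's output listed above, which supports the idea that only those two bytes differ between what TSK parses and the raw bytes in the dump.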

Any ideas?

Thanks!

-- 
Hongyi Hu

MIT Lincoln Laboratory
Group 59 (Cyber System Assessments)
Ph: (781) 981-8224

From:  Alex Nelson <ajn...@cs...>
Date:  Wednesday, March 26, 2014 10:52 AM
To:  Hongyi Hu <Hon...@ll...>
Cc:  "sle...@li..."
<sle...@li...>
Subject:  Re: [sleuthkit-developers] NTFS data run collisions

Hi Hongyi,

For clarification, these are allocated files you're asking about, right?  If
some of the files are deleted, the answer is pretty straightforward.

Also, are you asking about partial or total overlaps?  You should be
building your hash table based on MFT entry numbers, not on file names.
NTFS allows multiple hard links.
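The hard-link point is easy to see in a small sketch. This hypothetical Python example (made-up entry numbers, names, and cluster addresses) builds a cluster-to-owner table twice, once keyed by file name and once by MFT entry number, and skips sparse runs since they occupy no clusters. The sector lookup assumes sector numbers are relative to the start of the NTFS volume and that sectors_per_cluster is already known.

```python
from collections import defaultdict

def build_cluster_owners(runs_by_owner):
    """Map each allocated cluster to the set of owners whose runs claim it.

    runs_by_owner: dict of owner key -> list of (length, lcn) runs, where
    lcn is None for a sparse run (hypothetical input format).
    """
    owners = defaultdict(set)
    for owner, runs in runs_by_owner.items():
        for length, lcn in runs:
            if lcn is None:  # sparse run: nothing allocated on disk
                continue
            for cluster in range(lcn, lcn + length):
                owners[cluster].add(owner)
    return owners

def find_collisions(owners):
    """Return clusters claimed by more than one owner."""
    return {c: sorted(o) for c, o in owners.items() if len(o) > 1}

def owner_of_sector(owners, sector, sectors_per_cluster):
    """Look up the owners of a volume-relative sector (assumed numbering)."""
    return owners.get(sector // sectors_per_cluster, set())

# Hypothetical data: entry 42 has two names (hard links); entry 43 has a
# sparse run followed by one allocated cluster.
runs_by_entry = {42: [(2, 100)], 43: [(4, None), (1, 200)]}
names = {"a.txt": 42, "b.txt": 42, "c.dat": 43}

by_name = build_cluster_owners({n: runs_by_entry[e] for n, e in names.items()})
by_entry = build_cluster_owners(runs_by_entry)

print(find_collisions(by_name))   # the hard link shows up as a false collision
print(find_collisions(by_entry))  # keyed by MFT entry: no collisions
print(owner_of_sector(by_entry, 1601, 8))
```

Keying on the MFT entry number means a file with several names is counted once, so any remaining collisions point at something other than hard links.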

Do you have example files you could reference in one of the publicly
available disk images?  (One of the M57's will likely give you an example.)
http://www.forensicswiki.org/wiki/Forensic_corpora#Disk_Images

--Alex

On Mar 25, 2014, at 14:00 , Hu, Hongyi - 0559 - MITLL <Hon...@ll...>
wrote:

> Hi,
> 
> I'm an NTFS rookie with a question about data runs.  Are there any normal
> reasons why two different files might have overlapping data runs, i.e. mapped
> to some of the same clusters/blocks on the disk?
> 
> For a research project, I would like to do the following: given a sector on
> the disk, determine which file (if any) owns the data in that sector. The
> first thing I tried was to build a simple block-to-filename hash table. For
> each file, I look at its data runs and put them into the table. With both TSK
> and the analyzeMFT library, using a clean Windows XP disk image, I get a
> non-trivial number of block collisions.
> 
> Is this normal behavior? I would have thought that the block assignments
> would be unique. I haven't been able to find any information about this in
> the documentation I've read.
> 
> 
> Thanks!
> 
> -- 
> Hongyi Hu
> 
> MIT Lincoln Laboratory
> Group 59 (Cyber System Assessments)
> Ph: (781) 981-8224
> _______________________________________________
> sleuthkit-developers mailing list
> sle...@li...
> https://lists.sourceforge.net/lists/listinfo/sleuthkit-developers