
Create a domain for specific files

2017-12-12
2018-03-08
  • Guillaume Huet

    Guillaume Huet - 2017-12-12

    Hello,

    First of all thanks a lot for this tool, I'm only sad that I didn't find it earlier.

    I've got a failing NTFS HDD from which I'm trying to retrieve important files, but it has a few problems:
    It's erroring on a few sectors, but most of the ddrescue reads are OK (that's the good part).
    On the other hand, it is extremely slow to read the good sectors, with an average rate under 6 kB/s.
    It is 1 TB large and I've calculated that the rescue would take about 5 years to complete; it's already been running for 3 weeks, so I think the estimate is realistic. Beyond the fact that I would prefer to get my data back without waiting 5 years, I'm worried that the failing disk will deteriorate quickly in the coming weeks and that what remains of the data could be lost forever.
    Most of the drive is filled with movies that I can afford to lose, but I don't know where the data of the important files physically sits, and those files can't be saved by a classical file copy of the mounted drive.

    I discovered ddrutility today and was able to use ddru_ntfsbitmap to create a domain that reduces the rescued area to only the used space. In the meantime I've learned that I had been spending a long time rescuing empty space so far (only 40% of the data I had recovered was actual used space); that's why I'm sad I didn't find it earlier.
    I've also discovered that I can use --mftdomain to rescue only the full MFT. That proved useful: I used it and rescued the whole MFT without error before resuming the recovery with the used-space domain.

    I've tried to use ddru_ntfsfindbad to check for errors in files with the partially rescued image. With the default behavior I only got a few files, because most of the mapfile was logging "non tried" data. I then used the technique explained in https://sourceforge.net/p/ddrutility/discussion/general/thread/25bc970b/ to forge a mapfile where every "non tried" block is marked as a "failed" block. This gave me a long list of files, including a large list of small files that I need to recover and a small list of large files that I don't need to recover.
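
    (For reference, that forging is easy to script; a minimal Python sketch, assuming the standard mapfile layout where "?" marks non-tried blocks and "-" marks failed ones:)

      # rewrite every non-tried status as failed, leaving comment lines untouched
      with open("mapfile") as src, open("mapfile_allbad", "w") as dst:
          for line in src:
              if not line.startswith("#"):
                  line = line.replace("?", "-")
              dst.write(line)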

    I was thinking about the algorithm by which ddru_ntfsfindbad lists the files corresponding to the sectors marked as bad, and thought that maybe it could be crossed with the behavior of ddru_ntfsbitmap to generate a domain corresponding to the space used by a selected list of files in the MFT, rather than all of them.

    Since the project is open source, I tried to look at the way both programs work to check whether I could create the program I need myself, but got totally lost in the code that parses the MFT...

    The output of ddru_ntfsfindbad lists the paths of files, from what I understand of NTFS partitions by following links from inode to inode in the MFT up to the root of the partition, so I think it would be possible to select only files belonging to certain folders (in my case, all the movies that I want to exclude from the domain are in the same root folder).

    Could someone who understands better how ddru_ntfsfindbad traverses the MFT tell me whether what I have in mind would indeed be possible? Even if it only means listing the sectors used by each file and then forging the domain file by hand from that list after removing the expendable files, that would still be a huge step.

    Thanks for your help,

    Guillaume Huet

  • maximus57

    maximus57 - 2017-12-13

    Hi Guillaume,

    First, I would like to make sure it is clear that I am no longer actively supporting the ddrutility project.

    I can tell you that what you ask is possible, in a way, as I have done it. The public software is based on private software that I wrote to recover files from a friend's drive. But it is very difficult, and my private software required multiple manual steps to do it. And while it did work for what I needed, it was still flawed. It is a big task to figure out how to properly process a file system, especially one that is potentially damaged. Unfortunately I have no intention of releasing my private software that can do this. It is not as simple as you hope; I spent months and months working on what I did accomplish, plus I had to learn C programming. But if you have time to spare, and are a very creative and intuitive person, it can be done.

    Hint: The entire folder structure can be rebuilt from just the MFT, without reading the folder data from the folder inodes (not the proper terminology, but I don't remember what to call it offhand). Every inode has its parent inode listed in its MFT record, and the entire folder structure can be built from this data.
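
    To illustrate the idea, a minimal Python sketch; the records table is made-up example data standing in for parsed MFT entries (on NTFS the root directory is inode 5 and is listed as its own parent):

      # hypothetical parsed MFT data: inode -> (parent inode, name)
      records = {
          5: (5, "."),                 # the root directory is its own parent
          64: (5, "movies"),
          65: (64, "film.mkv"),
          66: (5, "documents"),
          67: (66, "thesis.doc"),
      }

      def full_path(inode):
          parts = []
          while True:
              parent, name = records[inode]
              if parent == inode:      # reached the root
                  return "./" + "/".join(reversed(parts))
              parts.append(name)
              inode = parent

      print(full_path(67))             # -> ./documents/thesis.doc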

    Regards,
    Scott

  • maximus57

    maximus57 - 2017-12-13

    Another hint: The bitmap itself is a file. Look at how it is found and read in the software.

    Possible helpful link for you to learn more about the filesystem: http://www.kes.talktalk.net/ntfs/index.html
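
    To make the second hint a bit more concrete, a rough Python sketch of the core idea (assuming, as I recall it, that bit i of the bitmap marks cluster i as allocated, least-significant bit first within each byte):

      # turn a cluster-allocation bitmap into (offset, size) runs of used bytes
      def used_runs(bitmap_bytes, cluster_size):
          runs, start = [], None
          nbits = len(bitmap_bytes) * 8
          for i in range(nbits):
              used = bitmap_bytes[i // 8] >> (i % 8) & 1
              if used and start is None:
                  start = i
              elif not used and start is not None:
                  runs.append((start * cluster_size, (i - start) * cluster_size))
                  start = None
          if start is not None:
              runs.append((start * cluster_size, (nbits - start) * cluster_size))
          return runs

      print(used_runs(bytes([0b00001111, 0b11000000]), 4096))
      # -> [(0, 16384), (57344, 8192)]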

  • Guillaume Huet

    Guillaume Huet - 2017-12-13

    Hello Scott,

    Don't worry, it was clear when I posted that you no longer maintain the project; that's partly why I was addressing not only you but anyone who might know how the software works with the MFT.

    Thanks a lot for the very quick reply, I wasn't expecting any activity for days or weeks.

    I've had a look at your new project, HDDSuperClone, as you already mentioned it in another thread, but first, from what I read in the manual and from trying the software, it isn't able to create a specific domain from the MFT, and second, it seems it can only work on full disks and not on partitions, so I couldn't resume my partition rescue from ddrescue with it. I may try another full-disk rescue later to check if it's faster, but so far I'd rather not touch the already-rescued parts of the disk again.

    I've understood that $Bitmap AND $MFT are files, and that ddru_ntfsbitmap basically does part of what I need; that's why I had a look at the source. I think I've also understood the part of ddru_ntfsfindbad that relates to folder structure reconstruction from the MFT, and since I know, thanks to your tool, that my MFT is fully recovered and error free, I'm pretty confident that it is indeed possible.
    What I'm lost with is how the sectors used by each file can be obtained from the MFT; I was not 100% sure that was what ddru_ntfsfindbad was doing, but from your answer I understand that it can indeed be obtained.
    I'm no C guru, but I understand enough of it to at least follow a reading of the source and modify small parts of it. That's what I did, until I ran into all the NTFS structure references, even in the comments.

    Thanks a lot for your link; I indeed think I will have to learn a bit more about the structure before diving back into the code.

    If I ever get something working I'll try to release a patch here, because I'm pretty sure it would interest a lot of people, not just me.

    Regards,

    Guillaume Huet

  • Guillaume Huet

    Guillaume Huet - 2017-12-15

    OK, I've read the document you sent me, and I pretty much understand why I didn't understand your code at first: the NTFS structure is full of tricks, with information on where to find the information on where to find the information [...] nested in tricky places, with little-endian traps ;).

    I learned a lot but felt completely disarmed until I noticed that your code already exports the data I need, namely the offset and length of every chunk of data for every file, in the debug log. I just had to use it to get the data out and crunch it afterwards.

    The thing is, crunching the data also proved hard. I spent a long time trying to figure out how to do it with standard tools, because it felt like that would be faster than programming it, given the precious time I had to save my precious data. I ended up doing it mainly with Microsoft Excel, and although it's a dirty way to do it, I'm pretty sure the result is what I expect: it reduced the amount of data to rescue from 1 TB to about 20 GB. Still a few days to go, but not 5 years anymore ;).

    I've documented how I did it at the same time. Remember it's dirty, but I guess if even one person finds this answer and successfully applies its steps to recover their precious data, it will have been worth it.

    The method describes the algorithm; if anyone wants to implement it programmatically, don't hesitate to share the result, it will make things less complicated for newcomers. For my part I will stop there, as I got the expected result (actually I'll confirm after the rescue whether it actually worked).

    I'll write the process in a new message because it's quite long.

    Thanks a lot for this tool and for the help and documentation on NTFS!

    Regards

    Guillaume HUET

  • Guillaume Huet

    Guillaume Huet - 2017-12-15

    Hello again,

    I think I've successfully selected only the files of interest in my domain, as announced in the previous post, and I would like to explain how I've done it:
    NB: I've mostly used Windows software because I'm more used to it; I'm guessing that if you're trying to rescue a partition from an NTFS disk, so are you.
    ddrescue and the ddru_* tools were used on an Ubuntu machine; you might be able to do everything on the Linux machine if you know equivalents of the software I'll be using.
    NB2: There will be a lot of NBs in the steps; I advise you to read everything before actually performing a step, to be sure not to forget something.
    NB3: I'm not responsible for any data loss during the execution of this workflow. It shouldn't be dangerous for your disk except in the first two steps and the last ones, because we're not touching the disk in the other steps; but you might lose the work of a previous step if the actions are not followed properly, requiring you to start again from the beginning, and every time you read from the failing drive it gets closer to dying, so plan accordingly ;)

    In the following explanation, these are the important parts; change them to match your configuration:
    /dev/sdb1 is the NTFS partition to be rescued, on which I want to select some files to rescue first
    image.img is the image file the drive is being rescued to

    • Use ddru_ntfsbitmap to create a domain containing only the MFT file (domainMFT) and a domain containing the areas where data is (domain):
      $ ddru_ntfsbitmap -m domainMFT /dev/sdb1 domain
    • Use ddrescue to recover only the MFT from the failing disk:
      $ sudo ddrescue -m domainMFT /dev/sdb1 image.img mapfileMFT
      If the previous command didn't get all of the MFT data, use ddrescue's options to force retries or direct access (e.g. -r for retry passes and -d for direct disc access); the following instructions assume you successfully recovered the MFT without errors.
    • Forge a fake mapfile to pretend the partition consists only of bad sectors:
      Open the domain file from the first step; it should look something like this:
      # Rescue log file created by ntfsbitmap
      # Command line:
      # current_pos current_status
      0x00000000 +
      # pos size status
      0x00000000 0xBFCF0000 +
      0xBFCF0000 0x0030E000 ?
      0xBFFFE000 0xB7BDC000 +
      ...
      0xA518C14000 0x00100000 +
      0xA518D14000 0x2B5DDEB000 ?
      0xD076AFF000 0x00001000 +

      Adding the last position and the last size gives the full size of the rescued partition (be careful: domainMFT stops at the end of the MFT, so you can't use it to compute the full size).
      Here, for example, the addition 0xD076AFF000 + 0x00001000 gives 0xD076B00000. Remember the values are hexadecimal; use a hex calculator if the values are not trivial, to avoid mistakes (for example, on Windows use the included Calculator: select "View/Programmer", tick "Hex" on the left, and you can then add hex values).

      Use this output value to forge a domain like the following and save it as domainErr:
      # Rescue log manually forged to fake a volume with only bad clusters
      # Command line:
      # current_pos current_status
      0x00000000 *
      # pos size status
      0x00000000 0xD076B00000 *

      Of course, replace 0xD076B00000 with the actual size you computed in the previous step.
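
      If you prefer to script these two sub-steps, here is a minimal Python sketch (assuming the domain file layout shown above):

        # sum the last "pos size" pair of the domain file, then write domainErr
        def data_lines(path):
            with open(path) as f:
                return [l.split() for l in f if l.strip() and not l.startswith("#")]

        last = data_lines("domain")[-1]
        total = int(last[0], 16) + int(last[1], 16)    # 0xD076B00000 in the example

        with open("domainErr", "w") as f:
            f.write("# Rescue log manually forged to fake a volume with only bad clusters\n")
            f.write("# Command line:\n")
            f.write("# current_pos current_status\n")
            f.write("0x00000000 *\n")
            f.write("# pos size status\n")
            f.write("0x00000000 0x%X *\n" % total)
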
    • Use ddru_ntfsfindbad to list all files in the MFT and their cluster positions in the debug file:
      $ ddru_ntfsfindbad -D image.img domainErr
      You will get 2 important files:
      ntfsfindbad.log: the whole list of files and folders on your disk, with their full path on the disk (name=) and their corresponding inodes (inode=)
      ntfsfindbad_debug.log: a list of all the sectors (offset= and size=) used by the files in the corresponding inodes (inodes=)

    • Use ntfsfindbad.log to reduce the list to only the inodes of interest (a scripted alternative is sketched at the end of this step)
      Import the data from ntfsfindbad.log into a spreadsheet program (I'll use Microsoft Excel), separating the data by the equal sign "=". In column E and beyond you will have your file names; be careful, if there are equal signs in file names they can get split. Reassemble them by sorting the lines by column F, so that all the names to work on are in the first lines (the process should be faster than if you had separated the data by the space sign " " ;) ).
      Select column B, then replace the expression "errors" with nothing ""; column B is now a numeric field holding the inode of the file named in column E. You can now safely remove columns A, C and D, which won't be used.
      The inode reference is now in column A and the file name in column B; you can sort the data by column B to help you visualize the files of interest and start removing the lines of the files you don't need to recover.
      NB: Be careful: rather than keeping only the files you know you need, remove only the files you know you DON'T need, as many files that you don't know about are actually important for your NTFS image to mount properly!
      NB2: You could also keep the "errorsize=" field, which actually represents the size of the file, to sort on file size and quickly find large video files, for example.
      NB3: Actual files usually have the folder "." as their last parent folder; the other files are deleted ones that are still present on the disk because they have not been overwritten yet. Unless you want to recover them with a tool like Recuva afterwards, you can safely remove them from the list (after checking again that they are not important NTFS files). The same goes for everything in "./$RECYCLE.BIN/" except the folder itself, and everything in "./$Extend/$Deleted/" except the folder itself.
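
      As announced above, here is how this filtering step could look in a few lines of Python instead of Excel. The line layout is inferred from the columns described above (inode=... name=...), and "./movies/" is just my hypothetical folder to exclude; adapt both to your own log and needs:

        import re

        keep = {}                                    # inode -> file name
        with open("ntfsfindbad.log") as f:
            for line in f:
                m = re.search(r"inode=(\d+).*name=(.*)", line)
                if not m:
                    continue
                inode, name = int(m.group(1)), m.group(2).strip()
                if name.startswith("./movies/"):     # drop only what you KNOW you don't need
                    continue
                keep[inode] = name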

    • Use ntfsfindbad_debug.log to associate the inodes with the clusters (again, a scripted equivalent is sketched at the end of this step)
      Import the data from ntfsfindbad_debug.log into Excel, separating the data by the equal sign "=" AND the space sign " ". In column B you will have the inode, in column F the offset and in column J the length; you can remove the other columns, leaving them as A, B and C respectively.
      Sort everything by column A (inode).
      In cell D1, input the formula ="0x"&DEC2HEX(CEILING(HEX2DEC(RIGHT(C1;LEN(C1)-2))/4096;1)*4096) (see NB4); this rounds the size up to the next cluster. Fill column D with this formula, copy its contents, paste them as values into column C, then delete column D.
      In cell D1 again, input the formula =HEX2DEC(LEFT(RIGHT(B1;LEN(B1)-2);LEN(B1)-3)); this will help us sort on the offset, because Excel doesn't handle hex values properly. Fill column D.
      In cell E1, input the formula =B1&" "&C1&" +" and fill column E; this will help us format the output as a domain file later on.
      NB: If your MFT is fragmented you may have empty lines and lines that read something like "mft part# 1 count=31280"; use sorting to remove these.
      NB2: In the debug file there is a column "offset" and a column "fulloffset". I don't know the difference between the two; on my disk they were all identical, so I only kept "offset". Maybe Maximus can explain the difference and point to which one to use when they differ; I'm guessing it matters when you use the input offset option of the ddru_ntfsfindbad command.
      NB3: You might get a few "WARNING! (hard error) Inode ## does not have the FILE signature" at the end of the file. I'm guessing it just means that the last entries in the MFT are not yet initialized and that it's OK; I don't know what it would mean if they appeared in the middle of the file. Just check that the inodes listed are not ones you selected in the previous step.
      NB4: If your version of Excel is not in English, you can use https://fr.excel-translator.de/fonctions/ to get the equivalent function names. Mine is in French and I've used this tool to translate them; I'm sorry if I made any mistake, I can't check the formulas, so feel free to tell me and I'll correct them.
      NB5: In the second formula I'm dropping the "0x" prefix and the last character before converting to decimal, because otherwise Excel thinks that 10-character hex values starting with 8 or more represent a negative value, and sorting then becomes a mess.
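
      The same step scripted in Python, continuing the sketch above; the field names are assumed from the columns just described, and the 4096-byte cluster size matches the CEILING formula:

        import re

        CLUSTER = 4096                       # bytes per cluster on my partition
        runs = []                            # (inode, offset, size) triples
        with open("ntfsfindbad_debug.log") as f:
            for line in f:
                # the leading space in " offset=" avoids matching "fulloffset="
                m = re.search(r"inode=(\d+).* offset=(0x[0-9a-fA-F]+)"
                              r".*(?:length|size)=(0x[0-9a-fA-F]+)", line)
                if not m:                    # also skips "mft part# ..." lines and warnings
                    continue
                inode, offset = int(m.group(1)), int(m.group(2), 16)
                size = int(m.group(3), 16)
                size = -(-size // CLUSTER) * CLUSTER    # round up to a whole cluster
                runs.append((inode, offset, size))
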
    • Keep in the second spreadsheet only the inodes selected in the first one (the sketch after this step shows the same join in one line of code).
      In this part I will consider that the first spreadsheet is the one from ntfsfindbad.log and the second the one from ntfsfindbad_debug.log.
      Select every cell in column A of the second spreadsheet (not the whole column, just every non-empty cell; Excel doesn't seem happy with the operation on the whole column). Then go to "Data/Sort & Filter/Advanced" and click "OK" on the warning. The "List range" should already be set from your selection; now set the "Criteria range" by selecting the cells in column A of the first spreadsheet (again, selecting the whole column doesn't seem to work) and verify that "Unique records only" is unticked. Excel will be unresponsive for some time; this is normal, it's crunching a potentially huge amount of data and was not really optimized for it. After a few minutes you should get a filtered list containing only the interesting values.
      NB: This only seems to work with both lists sorted; make sure they are.
      NB2: I don't know if it's a bug in Excel, but it seems that even though "Unique records only" is unticked, potential duplicate records of inode 0 are removed. Since inode 0 is the MFT and we already saved it whole, it doesn't matter; you can check, but it should be OK for the other records.
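
      Scripted, this whole selection step is a single line, reusing keep and runs from the sketches above:

        # keep only the runs whose inode survived the first filter, sorted by offset
        wanted = sorted((off, size) for inode, off, size in runs if inode in keep)
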
    • Forge an incomplete mapfile from the crunched data (a scripted version follows at the end of this step)
      Sort everything again by column D (decimal value of the offset) to get an ordered list of the filtered data.
      Select column E of the sorted data and copy it to a blank text file; it should only copy the lines that passed the filter (be sure to check that it did).
      Add the following lines at the beginning of the file to create a valid mapfile:
      # Rescue log manually forged to select only data of interest on the disk
      # Command line:
      # current_pos current_status
      0x00000000 ?
      # pos size status
      Save this file as "domainTBC" ("To Be Completed"), then send it to the Linux machine to continue.
      On the Linux machine, use ddrescuelog to complete the mapfile so that it can be used properly:
      $ ddrescuelog -C domainTBC > actualdomain
      You might get an error like "ddrescuelog: error in logfile domainTBC, line ###.". Go to the corresponding line in the file; it is most likely a duplicate of another line or a zero-length entry confusing ddrescuelog. You should be able to correct the error manually and retry until you get no errors.
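
      For reference, the same forging in Python, continuing from the wanted list above; dropping duplicates and zero-length runs up front avoids most of those ddrescuelog complaints:

        with open("domainTBC", "w") as f:
            f.write("# Rescue log manually forged to select only data of interest on the disk\n")
            f.write("# Command line:\n")
            f.write("# current_pos current_status\n")
            f.write("0x00000000 ?\n")
            f.write("# pos size status\n")
            for off, size in sorted(set(wanted)):
                if size > 0:                 # zero-length entries confuse ddrescuelog
                    f.write("0x%08X 0x%08X +\n" % (off, size))
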
    • Finally, resume the rescue with the final domain:
      $ sudo ddrescue -m actualdomain /dev/sdb1 image.img mapfile
    • After insisting on the important areas with the various ddrescue options, you can go back to rescuing the rest of the data with the bitmap domain:
      $ sudo ddrescue -m domain /dev/sdb1 image.img mapfile

    In the hope that it will save someone else's data; I'll keep you updated on whether it actually helped me once ddrescue's work is over.

    Regards,

    Guillaume HUET


    Last edit: Guillaume Huet 2017-12-15
  • maximus57

    maximus57 - 2017-12-15

    Wow, that is very creative! I like the way you think! I am obviously not going to go through it step by step to verify it, but in theory it should work. You even dealt with Excel not liking hex numbers (been there, done that, Excel sucks for hex). I did see that you wondered about the difference between offset and fulloffset. I believe that offset is the offset within the partition, and fulloffset is the offset from the beginning of the disk. The reason they were the same for you is that you are imaging a partition. If you were to use the whole disk as the source and supply the partition offset manually, then you would use fulloffset.

    FYI, there is a reason that I always recommend using the full disk as the source and supplying partition offsets manually. The disk could degrade during recovery, and the OS may no longer be able to figure out the partition information. That means you would no longer be able to select the partition as the source, which can complicate things, as you then have to do some fancy work to shift the log to make it work with the whole disk as the source (I think I did that once in Excel).
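
    For the record, that shifting is easy to script; a rough Python sketch, where PART_OFFSET is a hypothetical partition start in bytes (start sector from fdisk times the sector size):

      PART_OFFSET = 0x100000                 # example value only: 1 MiB partition start

      # add the partition offset to every position in a partition-based mapfile
      with open("mapfile") as src, open("mapfile_disk", "w") as dst:
          for line in src:
              fields = line.split()
              if line.startswith("#") or not fields:
                  dst.write(line)            # comments and blank lines pass through
              elif len(fields) == 2:         # the "current_pos current_status" line
                  dst.write("0x%08X %s\n" % (int(fields[0], 16) + PART_OFFSET, fields[1]))
              else:                          # "pos size status" data lines
                  dst.write("0x%08X %s %s\n" %
                            (int(fields[0], 16) + PART_OFFSET, fields[1], fields[2]))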

  • maximus57

    maximus57 - 2017-12-15

    I would like to add that there is a case where this won't work for certain files. It is possible for a file's cluster information to be too large to fit in its MFT record; it is then located (or continued, I can't remember) in a separate inode that points to a file containing that data. That is not supported in my software, as it was too complicated at the time and I did not need it for my purposes. I believe there was a reference to this someplace in this discussion area, possibly a case involving the MFT itself. This issue is most likely to happen with larger files and heavy fragmentation.

  • Guillaume Huet

    Guillaume Huet - 2018-03-06

    Hello, I'm back after more than 2 months of recovery!

    Thanks to your wonderful tool, I was able to recover about 95% of the useful files without spending a few years trying to recover the useless ones!

    I can confirm that the technique explained in my previous post actually works. I will add a few informational notes about what to do at the end of the rescue, for future readers following my post.

    First of all, I discovered the concept of sparse files during the recovery; I will explain a bit what they are and why they can help. A sparse file is a file that is shown on a partition as being a certain size but contains mostly zero-filled areas. For this reason it is stored on the disk as just the non-zero data plus metadata listing the zero-filled areas. A disk image at the beginning of a recovery is basically just a file full of zeros, the size of the rescued media. For this reason, if it is stored on a sparse-enabled file system (NTFS, ext2 and up), it will only use a small part of the available space on the recovery media.
    If the full disk is to be rescued, at the end the image file will actually be the size of the disk; but in the case of the recovery of the MFT alone, the actual stored size of the recovered image is tiny. The same goes if you rescue only a part of the disk, for example by following the instructions above. You can read more about sparse files here: https://en.wikipedia.org/wiki/Sparse_file
    This means two important things:

    • First of all, you don't need a recovery media bigger than the rescued one if you don't need to recover all the data (you only need about 10% more than the size of the data you actually want to recover), if and only if your recovery media is formatted with a file system that supports sparse files.

    • Secondly, when you are copying the image file, don't forget to use sparse-aware tools; otherwise the dumb copy will not only be very slow, because it will read every zero byte of the input sparse file, but will also result in the copy being stored as a non-sparse file, actually taking a lot of space on the media. On Windows with NTFS, you can download the sparse.zip tool at the end of this page to do so: http://www.flexhex.com/docs/articles/sparse-files.phtml . On Linux, cp is supposed to be sparse-aware, but you can force it with "cp --sparse=always file1 file1_sparsed" just to be sure; a sketch of the principle follows below.
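
    To give an idea of what such a sparse-aware copy does under the hood, here is a minimal Python sketch (real tools detect holes more efficiently, e.g. with SEEK_HOLE, but the principle is the same):

      BLOCK = 1024 * 1024                    # compare and copy 1 MiB at a time

      with open("image.img", "rb") as src, open("copy.img", "wb") as dst:
          zero = bytes(BLOCK)
          while True:
              chunk = src.read(BLOCK)
              if not chunk:
                  break
              if chunk == zero[:len(chunk)]:
                  dst.seek(len(chunk), 1)    # skip ahead: leaves a hole instead of zeros
              else:
                  dst.write(chunk)
          dst.truncate()                     # fix the final size if the file ends in a hole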

    With that said, here's what you need to do once the rescue with your custom mapfile is either finished or the remaining sectors are all dead:

    • Stop the rescue with Ctrl+C if it's still running, then properly shut down the PC with the rescued and the recovery media. If, like me, it has been working non-stop for a few months, I advise letting the PC and the disks rest for a little while before continuing the recovery.

    • For your own information, use the ddru_ntfsfindbad tool with the actual mapfile to list the files that are either not recovered or only partially recovered.

    • Make a copy of the recovered image file: you'll have to open the rescued image as read-write, and you don't want to mess up a file that took you a few months to create ;). You can make the copy on the recovery media if you have enough room (resulting in about 220% of the recovered data), but just in case the second media fails, I advise you to copy the image to a third disk if you have one available. Remember to copy to a disk formatted with a sparse-enabled file system, using a sparse-aware tool.

    • Mount this third disk in Windows, then download and install OSFMount ( https://www.osforensics.com/tools/mount-disk-images.html ); it allows you to mount images as read-write inside Windows, whereas most usual tools like VCD or DaemonTools will only mount them read-only.

    • Mount the image file with OSFMount, for example as drive letter Z:, and untick "Read-only drive".

    • Open a command prompt and enter "chkdsk /r Z:" (if Z: is your mount point). The disk check will take long, because a lot of the files listed in the MFT are actually not in the image.

    • After the command finishes, with a lot of expected errors, run it a second time; it should be a lot quicker and report no errors this time.

    • After the second check-disk you should be able to explore the disk. To avoid errors once again, I advise you to unmount the image and remount it read-only before continuing.

    • This time, browse the image to copy everything useful that you actually rescued. Remember that although all the files from the initial disk appear with their full reported size, most of them are actually only filled with zeros and their data is lost. Use the output from ddru_ntfsfindbad to remember which files are lost or damaged; I advise you not to copy the lost files, and to try to open the partially recovered files before copying them. This will keep you from thinking a file is safe just because you see it listed when you browse the rescued media. The partially recovered files, depending on what kind of data they contain, can be unreadable, partially readable or completely readable; it's up to you to evaluate what you want to keep. For example, jpg images will have their colors desynchronized, but picture recovery software might be able to retrieve the original picture with just a few dark spots in some places.

    Regards,

    Guillaume HUET

  • Guillaume Huet

    Guillaume Huet - 2018-03-06

    Scott,

    First of all, thank you for your tool, which was very useful to me!

    Thanks for your explanation of the difference between offset and fulloffset, it is clearer now.
    Thanks for your clarification of why a full disk as source is better. I was too far into the recovery with the partition alone to risk changing to the full disk, but after your comment I was a bit afraid of losing the MBR during the recovery; I guess I was lucky this didn't happen. I think the whole process described above will still work with a full-disk recovery, by interchanging the offset and fulloffset, being careful when calling ddru_ntfsfindbad, and selecting the correct partition before mounting with OSFMount.

    Thank you for your remark concerning very large or heavily fragmented files, this didn't apply to my case but it can be useful to know for anyone following the previous post to check!

    This adventure was rather unpleasant from a data-loss point of view (as said in the previous post, only about 5% lost at the end), but very interesting from a file-system understanding point of view. To be honest, before last December, even though I formatted disks quite often, I wasn't at all aware of what "disk formatting" meant; I thought it was something like a software change in the disk controller to allow for different ways of writing the data. I've now learned the full process that happens in a drive when you access a file on an NTFS-formatted file system, and I'm guessing that diving as deep into any other file system specification would reveal as many wonders of ingenuity making everything work together.

    Regards,

    Guillaume HUET

  • maximus57

    maximus57 - 2018-03-08

    Hi Guillaume,

    Congratulations on your (95%) success! I even learned something that I did not know about using cp to copy sparse files. You did a very good job of explaining things.

    And thank you very much for your donation :)

    Regards,
    Scott

