
"NTFS Change Journal" support to identify file modifications faster and more reliable

Help
2014-01-21
2014-12-03
  • Jens Bornemann

    Jens Bornemann - 2014-01-21

    Hi,

    this approach might also be interesting for other operating systems and file systems that have journals.
    I was surprised that I couldn't find that "idea/enhancement" in the forum already (or did I miss it?).

    I know the primary focus of snapraid is on "static" data, but why not use journal information to identify added, deleted and modified files (and directories, in the future?) instead of scanning the entire disk?

    Today I identified 2 silent corruptions that were actually none: the application/OS changed the file content and "restored" the original last-modified date. That's really bad behavior, but out of my control. Right now, the only option I have is to exclude this data from snapraid...

    I couldn't find the Microsoft NTFS Change Journal API right now, but it's public and should be there... somewhere ;-)
    If the journals don't contain the logged changes anymore, snapraid could still fall back to the scanning approach.
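    The fallback check described above might look roughly like the following. This is a minimal Windows C sketch of the idea, not SnapRAID code; `saved_journal_id` and `saved_usn` are hypothetical values assumed to be persisted from the previous run.

    ```c
    /* Sketch (not SnapRAID code): decide whether the NTFS change journal can
     * be replayed, or whether a full disk scan is required. Windows-only. */
    #include <windows.h>
    #include <winioctl.h>

    int journal_usable(HANDLE vol, DWORDLONG saved_journal_id, USN saved_usn)
    {
        USN_JOURNAL_DATA jd;
        DWORD ret;

        /* Ask the volume for the current journal state. */
        if (!DeviceIoControl(vol, FSCTL_QUERY_USN_JOURNAL,
                             NULL, 0, &jd, sizeof(jd), &ret, NULL))
            return 0; /* no journal at all -> full scan */

        /* If the journal was deleted/recreated, or our saved position was
         * already purged from it, the change history is incomplete. */
        if (jd.UsnJournalID != saved_journal_id || saved_usn < jd.FirstUsn)
            return 0; /* incomplete history -> full scan */

        return 1; /* safe to replay records from saved_usn up to jd.NextUsn */
    }
    ```

    The volume handle would come from `CreateFile` on a path like `\\.\C:`, which typically requires administrator rights.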

    Let me know your thoughts.

    Cheers,
    Jens.

     
    • Jens Bornemann

      Jens Bornemann - 2014-01-21

      Ok, I missed that one: https://sourceforge.net/p/snapraid/discussion/1677233/thread/716be5d9/#b54f - but that was more of a question; Ludwig had the same expectation there: faster than a file scan...
      I'm more interested in catching silent data changes that aren't corruptions...

       
    • Jens Bornemann

      Jens Bornemann - 2014-01-23

      Found this demo accessing the NTFS journal, including source code: http://www.codeproject.com/Articles/11594/Eyes-on-NTFS

       
  • Jens Bornemann

    Jens Bornemann - 2014-09-28

    Also some C code on MSDN: http://msdn.microsoft.com/en-us/library/windows/desktop/aa365736%28v=vs.85%29.aspx
    I'll check how to integrate it into snapraid over the next couple of days, as in my SR config 95% of the entire sync time goes into scanning the data drives' folders ;-)

     
  • Andrea Mazzoleni

    Hi Jens,

    Yep. Reading the journal is for sure an interesting approach!

    Anyway, my next major TODO entry for SnapRAID will be to use multiple threads to scan the disks for changes. This should already provide some kind of improvement. I prefer to do this first, as it's something that will work everywhere.

    But feel free to experiment :)

    Ciao,
    Andrea

     
  • Jens Bornemann

    Jens Bornemann - 2014-09-29

    Hi Andrea,

    you mean a thread per disk scan? I'm sure that will improve the total scan time.
    I was a bit surprised about the disk scanning time I see with SR in my environment (Windows 2012 on an HP MicroServer N54L), and before coding a test tool I thought: why not google for an existing one? I found https://sourceforge.net/projects/dirtree/. I thought I could test multi-threaded scanning even on a single disk (though I did not really expect any improvement) with that tool, but it was single-threaded as well...

    Anyway, I have an interesting finding: though the CPU utilization for scanning a sample disk of some 40k directories with 400k files was pretty much the same between SR and dirtree, dirtree completed the scan perceived 10 times faster...
    Are you doing much more than just scanning the disk and retrieving all folder and file details? I cannot really explain that huge difference in scanning performance between SR and dirtree.

    Regarding the NTFS journal I found these old but really nice articles from 2009 for my reading before going through the API:

    Keeping an Eye on Your NTFS Drives: the Windows 2000 Change Journal Explained
    http://www.microsoft.com/msj/0999/journal/journal.aspx

    Keeping an Eye on Your NTFS Drives, Part II: Building a Change Journal Application
    http://www.microsoft.com/msj/1099/journal2/journal2.aspx

    Looks more complex than I initially expected - haha ;-)

    Cheers,
    Jens.

     
  • Andrea Mazzoleni

    Hi Jens,

    Yes. SnapRAID does something more than a normal directory listing. It also has to gather information about the physical location of the files on the disk.

    Anyway, likely some optimizations for Windows are possible.

    Could you please try this special 7.0 version at: http://snapraid.sourceforge.net/alpha/

    First try a "snapraid diff" and then a "snapraid --test-force-order-dir diff".
    The diff command has no risk involved as it's read only, so, it's safe to try.

    Is it faster than the normal SnapRAID?

    Ciao,
    Andrea

     
  • Jens Bornemann

    Jens Bornemann - 2014-10-01

    Hi Andrea,

    cool, quick alpha binaries - I had to test them today :-)
    Here are the even better stats from my tests (avg of two executions per diff test):

    1. SR 6.3 diff: 7:23 Minutes
    2. SR 7.0 diff: 3:51 Minutes
    3. SR 7.0 --test-force-order-dir diff: 0:36 Minutes !!!

    The standard diff performance of 7.0 is already hugely improved, but with --test-force-order-dir it's over 12 times faster compared to 6.3!
    Now, I did not test sync with 7.0, as 3 files on my snapshot disks have an unexpected 0 size that required the --force-zero option for a complete diff execution. As 6.3 reports no differences, I'm a bit cautious though...

    Anyway, the new option improves scanning like hell!
    Great job!!!
    Once I can use it in production, I'll monitor general sync throughput changes (if any ;-)!

    Cheers,
    Jens.

     
  • Andrea Mazzoleni

    Hi Jens,

    Thanks for the prompt report. Now I know I'm working in the right direction :)

    Do you have some more info about those three files with 0 size? Are they normal files? I suppose they are not really 0 size. What is their real size?
    This is somewhat unexpected...

    In case you are interested, the reason for these timings is that 6.3 has to make two slow calls for each file, to read the inode and the physical address.
    7.0 uses a different, fast way to read the inode, which almost halves the time. The --test-force-order-dir option also removes the need for the physical address, and then it becomes really fast.

    The plan is to be more selective about the physical address, as it's needed only when a file is seen for the first time. So, in theory, the super-fast speed is the goal for 7.0 :)
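    The selective approach described here can be sketched as follows. This is a portable illustration of the idea, not SnapRAID code: `query_physical_offset` and `known_inode` are hypothetical stubs standing in for the expensive per-file query and the saved state from the previous sync.

    ```c
    /* Sketch: pay the slow physical-offset query only for files whose
     * inode was not present in the previous state. */
    #include <stdio.h>
    #include <stdint.h>

    static int slow_calls; /* counts expensive per-file queries */

    /* Stub for the expensive call (e.g. retrieval pointers on Windows). */
    static uint64_t query_physical_offset(uint64_t inode)
    {
        ++slow_calls;
        return inode * 4096; /* dummy value */
    }

    /* Stub: was this inode already in the state saved by the last sync? */
    static int known_inode(uint64_t inode)
    {
        return inode < 1000; /* pretend inodes below 1000 are known */
    }

    static void scan_file(uint64_t inode)
    {
        if (!known_inode(inode))
            query_physical_offset(inode); /* only new files pay the cost */
    }

    int main(void)
    {
        uint64_t i;
        for (i = 0; i < 1010; ++i) /* 1000 known files, 10 new ones */
            scan_file(i);
        printf("%d\n", slow_calls); /* prints 10: one slow call per new file */
        return 0;
    }
    ```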

    Thanks,
    Andrea

     
  • Jens Bornemann

    Jens Bornemann - 2014-10-01

    Hi Andrea,

    regarding the 3 files, I'm also surprised. One is actually 0 bytes, one is 154 bytes and the last one is 52,445,871 bytes. They have only one thing in common: they had open write file handles during the NTFS shadow creation followed by the SR sync. But with SR 7.0 it's the first time I've encountered this warning:
    "The file 'C:/yadda' has unexpected zero size! If this an expected state
    you can 'diff' anyway usinge 'snapraid --force-zero diff'
    Instead, it's possible that after a kernel crash this file was lost,
    and you can use 'snapraid --filter yadda fix' to recover it.
    "

    I've been using disk volume shadow copies as SR data disks for almost a year; really stable...

    Cheers,
    Jens.

     
  • Jens Bornemann

    Jens Bornemann - 2014-10-03

    Hi Andrea,

    quick update on SR 7 alpha running on VSS snapshot disks:
    I'm using commands like mklink /d "F:\$RECYCLE.BIN\snapraid\" "\\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy35\" to mount the shadow copy on a path and use "F:\$RECYCLE.BIN\snapraid\" as a data disk. Now I executed an SR 7 sync and got the same zero error. After adding --force-zero I received many read errors on one of the files:
    Unexpected size change at file 'F:/$RECYCLE.BIN/snapraid/Save.tv & Co/Sick-Beard/mr-orange_Sick-Beard.git/Logs/sickbeard.log' from 79942318 to 79943426.
    WARNING! You cannot modify files during a sync.

    That's really strange, because the data on the mount point is a shadow copy, and that data cannot change.
    Is SR 7 resolving mount points somehow, even at the data disk root level?

    Cheers,
    Jens.

     
  • Andrea Mazzoleni

    Hi Jens,

    The reason for this behavior is the Windows filesystem cache. The new way of reading directories uses it, and that cache may return information that is up to five minutes old.

    Anyway, I've now changed SnapRAID to ensure that any modified file uses the old method to get updated information. This should be a good compromise: the fast method for static files, and the slow one for dynamic files.

    The same applies to reading the physical offsets of files. It's now done only for files that really need it. So, now it should always be fast, without the need for the --test-force-order-dir option.

    Please let me know how it behaves.

    I've also added a new undocumented option --test-force-scan-winfind, to force the use of the old directory listing on all files.

    Available at: http://snapraid.sourceforge.net/alpha/

    Ciao,
    Andrea

     
  • Jens Bornemann

    Jens Bornemann - 2014-10-04

    Hi Andrea,

    thx for the quickly updated alpha. I've tested the performance with and without the --test-force-scan-winfind option:

    1. snapraid diff: 41 seconds, but with 3 warnings "WARNING! Detected uncached size change for file 'yadda' It's better if you run SnapRAID without other processes running." (same files as mentioned earlier)
    2. snapraid --test-force-scan-winfind diff: 250 seconds with no warnings

    But I'm really surprised by the "Windows filesystem cache" behavior for disk shadow mounts.
    I'll check whether different mounting methods prevent it, or whether other ways exist to flush the cache...

    Anyway, in my scenario, getting the new directory scanning approach to work in the end would be fast enough, and NTFS journal scanning might not be required (though I'm still interested in getting it tested ;-).

    Regarding the NTFS journal: it knows only about filenames (no path names) and File Reference Numbers. My question for you: are these the same ones you are using in SR?
    The reason I'm asking is that resolving FRNs to path names is not that straightforward and requires maintaining a local "database", which SR is already doing...
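    The path-resolution problem mentioned here can be illustrated with a small sketch. This is not SnapRAID or journal code: each journal record carries only a name, its own File Reference Number (FRN) and the parent directory's FRN, so a local FRN map must be walked up to the root. The table, FRN values and names below are made up.

    ```c
    /* Sketch: resolve a full path from FRN/parent-FRN records. */
    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    struct entry {
        uint64_t frn;        /* file reference number */
        uint64_t parent_frn; /* 0 marks the volume root here */
        const char *name;
    };

    /* Hypothetical local "database" built from journal/MFT enumeration. */
    static const struct entry table[] = {
        { 5, 0, "" },            /* root */
        { 7, 5, "photos" },
        { 9, 7, "2014" },
        { 11, 9, "img001.jpg" },
    };

    static const struct entry *lookup(uint64_t frn)
    {
        size_t i;
        for (i = 0; i < sizeof(table) / sizeof(table[0]); ++i)
            if (table[i].frn == frn)
                return &table[i];
        return NULL;
    }

    /* Walk parent FRNs up to the root, then append names on the way back. */
    static void resolve(uint64_t frn, char *out, size_t len)
    {
        const struct entry *e = lookup(frn);
        if (!e || e->parent_frn == 0) {
            out[0] = '\0';
            return;
        }
        resolve(e->parent_frn, out, len);
        strncat(out, "/", len - strlen(out) - 1);
        strncat(out, e->name, len - strlen(out) - 1);
    }

    int main(void)
    {
        char path[256] = "";
        resolve(11, path, sizeof(path));
        printf("%s\n", path); /* prints /photos/2014/img001.jpg */
        return 0;
    }
    ```

    A real implementation would use a hash table instead of a linear scan, since a volume can hold millions of entries.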

    Cheers,
    Jens.

     
  • Jens Bornemann

    Jens Bornemann - 2014-10-04

    Hi Andrea,

    I've now tested the SR 7 alpha sync on my system and it looks stable, though the "uncached" warnings are a little strange, as they don't disappear even after running sync...

    Cheers,
    Jens.

     
  • Andrea Mazzoleni

    Hi Jens,

    Don't worry about those "uncached" warnings. They are present just to confirm that the problem was the Windows cache. And indeed it is.
    I will remove them in the final release.

    About this cache problem: it's a general one, and not related to shadow copies. The only official reference to it is in the MSDN FindFirstFile doc, which says:

    In rare cases or on a heavily loaded system, file attribute information
    on NTFS file systems may not be current at the time this function is called.
    

    Unfortunately, in Windows there are a lot of ways to list directories, but none that reads all the info quickly :( SnapRAID has to do some kind of magic to get it.

    Anyway, I think that we got a good solution now. Really thanks for your tests!

    About your question: yes, File Reference Numbers are what SnapRAID uses as inode info. Getting the list of FRNs of the modified and new files would allow SnapRAID to avoid scanning the directories.

    Ciao,
    Andrea

     
  • Jens Bornemann

    Jens Bornemann - 2014-10-05

    Hi Andrea,

    cool, I will check on the NTFS journal. It will be a replay of all changes, including file & folder renames, modifications, creates and deletes. And it should fall back to regular scanning when the journal is not available or is incomplete since the last run...
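    The replay idea can be sketched as a dispatch over the reason flags of each journal record. This is an illustration, not SnapRAID code: the `R_*` flags below are stand-ins, not the real `USN_REASON_*` values from winioctl.h, and the record stream is hypothetical.

    ```c
    /* Sketch: replay journal records into state updates since the last sync. */
    #include <stdio.h>

    enum reason { /* stand-ins for the USN_REASON_* bit flags */
        R_CREATE = 1 << 0,
        R_DELETE = 1 << 1,
        R_MODIFY = 1 << 2,
        R_RENAME = 1 << 3,
    };

    static int creates, deletes, modifies, renames;

    static void replay_record(unsigned reasons)
    {
        /* A single record can carry several reasons at once. */
        if (reasons & R_CREATE) ++creates;   /* add file to state */
        if (reasons & R_DELETE) ++deletes;   /* drop file from state */
        if (reasons & R_MODIFY) ++modifies;  /* mark for re-hash on next sync */
        if (reasons & R_RENAME) ++renames;   /* move entry; data untouched */
    }

    int main(void)
    {
        /* hypothetical record stream since the last sync */
        unsigned records[] = { R_CREATE, R_MODIFY, R_RENAME | R_MODIFY, R_DELETE };
        size_t i;
        for (i = 0; i < sizeof(records) / sizeof(records[0]); ++i)
            replay_record(records[i]);
        printf("%d %d %d %d\n", creates, deletes, modifies, renames);
        /* prints 1 1 2 1 */
        return 0;
    }
    ```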

    Cheers,
    Jens.

     
  • Jens Bornemann

    Jens Bornemann - 2014-10-22

    Hi Andrea,

    I've reviewed the scan code of SR and have to say it looks really good.
    Also, the Win API for the NTFS journal is nice and very easy to use & compile.

    Let's see how far I get :-)

    Cheers,
    Jens.

     

    Last edit: Jens Bornemann 2014-10-22
  • Jens Bornemann

    Jens Bornemann - 2014-11-22

    Hi Andrea,

    I'm testing some code changes to dir handling in SR, on the content data and tommy's structures, to support the NTFS journal changes in scan. First I tested storing every dir in the content file (not only the empty dirs). I also added the dir inode (remember my first feature request about restoring folder modify dates - haha ;-). But to fully support journal folder renames and moves within a disk, I really think the flat file/dir structure should be changed to a tree, representing the same hierarchy as exists on the file system. I'm a little unsure about that bigger change in terms of possible performance impacts, memory consumption and general stability, though I don't expect significant performance or memory impacts.

    I really want to contribute (at least NTFS) journal support to SR and your feedback on that would really help me finding the right direction.

    Once my forked SR git repo works as I expect it and I'm confident with code changes, will push it to the right place for your review/feedback.

    Cheers,
    Jens.

     
  • Andrea Mazzoleni

    Hi Jens,

    My recommendation is to keep the changes small. The smaller the changeset, the more likely it will be integrated.

    Big changes need a lot of work to get stable. Better to do one small step at a time. For example, restoring the dir timestamp can be a separate feature that can be integrated a lot more easily by itself, and it's surely a good starting point.

    About the NTFS journal, the first thing you can do is check for any speed improvement compared to the new fast dir scanning of 7.0. Obviously, using the journal must be faster to justify its integration.

    About changing the internal data structure, hmmmm. It's the kind of big change I would avoid. But maybe that's just because I'm missing its potential advantage.

    Anyway, I'm really interested to see your work :)

    Thanks,
    Andrea

     
  • Jens Bornemann

    Jens Bornemann - 2014-11-29

    Hi Andrea,

    I'll try to keep the changes small, but organizing them in aligned branches is something I need to do ;-)
    I think the changes look good; I will push to my fork soon.

    Quick question about state.c (line 4577/8) / method state_filter: why are you using filter_dir and not filter_path for each dir? I was testing my changes, tried to fix only one directory with "-f /test/", and realized that directories on all disks were being restored.

    Great job with new version and your support - as always!

    Cheers,
    Jens.

     
  • Andrea Mazzoleni

    Hi Jens,

    The difference between filter_dir() and filter_path() is to tell whether the passed string is a dir or a file.

    We have different exclusion rules for them, so the filter has to know which it is.

    Not sure about restoring all dirs. It's true that part is only for empty dirs, but likely you modified that.

    Ciao,
    Andrea

     
  • Jens Bornemann

    Jens Bornemann - 2014-12-03

    Hi Andrea,

    thx - I will review it again later (not that important right now).
    Now, though I've made many tests and good progress, I realized I'm fishing too much in the unknown: Win API, journal, SR database extensions, new methods here and there...
    So, today I decided I need test cases for all NTFS changes that get traced back by iterating the journal, before completing the journal support in SR. But first I'll clean up my work and push it before starting on the journal test and validation tool.

    Brief summary of my changes (off the top of my head ;-):

    - 'R'/"Dir" tags replace 'r'/"dir" with mtime(ns) and inode support
    - 'S'/"Symlink" and 'A'/"Hardlink" replace link tags with mtime(ns) and inode support (though I think I might not need that anymore, as only symlinks have their own mtime and inode -> that why I need a test tool testing all cases)
    - rename emptydir methods and comments to dir (all dirs)
    - added mtime restore for dir (succeeded also with symlinks, but hardlinks probably don't support... for good reasons?!?...)
    - and probably some more...
    

    I'm trying to define consistent commits, as all-in-one might be confusing even for me.
    (Hey, my first git commit/work ever ;-)

    Cheers,
    Jens.

     

