Menu

#1225 Use NTFS transactions to save files when possible on Windows Vista and later

Won't_Implement
open
nobody
scite (88)
3
2019-08-05
2018-07-17
-
No

Hi,

SciTE currently attempts to save files via overwriting the previous file. This has two problems:

  • It is not atomic, meaning that write failures can lose data.
  • It loses metadata such as extended attributes used by the Window Subsystem for Linux.

It would be great if SciTE attemped to use transactions (CreateFileTransacted(..., OPEN_ALWAYS, ...)) to save files instead. If transactions are impossible (e.g. FAT32) then the old method can be used. However, if other problems occur (insufficient privileges, insufficinet resources, conflicting transactions, etc.) the user should be prompted as to what to do (e.g. retry/replace/cancel).

This would allow for the entire file write to be made atomic, and it would also allow Windows users to edit Linux files without losing metadata, which is very handy.

Note that there is a trade-off in that the disk space requirement would increase, because the old file must also be kept on disk by the system until the write completes. However, I don't think this should be a problem—if the write fails due to lack of space, the user can always delete the existing file if he's really willing to lose the data.

Thank you!

Discussion

  • Neil Hodgson

    Neil Hodgson - 2018-07-17

    This appears to be a complex feature to implement with a need to fall back to the current implementation on failure. Microsoft now recommends not using transactional NTFS https://docs.microsoft.com/en-us/windows/desktop/FileIO/deprecation-of-txf

    If someone produces a good implementation that uses transactions as an option, and handles failures well, it can probably go in.

     
  • -

    - - 2018-07-17

    I actually implemented it in a couple hours; it's simpler than you'd think. All it requires is calling CreateTransaction, CreateFileTransacted, CommitTransaction to manage the transaction object. The only part that needs a bit of attention is figuring out what errors to fail on, and what errors to ignore or retry on, but right now I'm just showing an error message and asking the user to decide. The part I haven't implemented yet is making it optional, but I can add that too if you're interested in my implementation.
    Do you have any requirements or guidelines for contributing code? Both in terms of the code itself as well as anything related to the contributor name, license, etc.

     
    • Neil Hodgson

      Neil Hodgson - 2018-07-18

      For this, Windows types shouldn't be exposed on non-Windows platforms. Ideally, FILE* would continue to be used and the save should work in either background thread mode or synchronous mode.

      There are something like 7 different locations where files are opened for writing in SciTE (find in files ".Open(") , not including Lua scripts or subprocesses. Do all of these get changed to have transactions or just document files? There are more instances of fclose as sometimes paths diverge after opening.

      Search for "save.deletes.first" for an example of an option setting that is located in the file saving code.

      The license is included in License.txt so you should understand that your code will be made available under that license. A human name is better but weird names, even "-", will be accepted.

      There is some coding style information at https://www.scintilla.org/SciCoding.html but also try to make the code similar to the current code.

       
      • -

        - - 2018-07-18

        Okay cool, thanks. We can indeed keep using FILE*as long as the C runtime has a way to shove a file handle into it (e.g. fdopen + _open_osfhandle); this is the case on MSVC where my current implementation works, though I haven't checked on MinGW. However I will probably need to add an abstraction layer (class File?) over it as the transasction handle is also required, and managed separately at the moment.

        There are something like 7 different locations where files are opened for writing in SciTE (find in files ".Open(") , not including Lua scripts or subprocesses. Do all of these get changed to have transactions or just document files? There are more instances of fclose as sometimes paths diverge after opening.

        Well it's up to us, and currently I'm only doing this in one location I think, but I think it makes sense to do it for all locations so I'll probably do that. There's no harm really. (The divergence shouldn't be an issue.)

        Thanks for the pointers!

        Also, two things to note:

        1. There is a separate bug (feature?) that may get changed (for the better?), unless I explicitly make the code mimic the old behavior. Namely, currently, SciTE fails to save files that are marked with "hidden" or "system" attributes, because it seems fopen calls CreateFile with CREATE_NEW, which fails with hidden or system files (described here). I don't really see the point of this since it's just annoying to not be able to save hidden or system files, but if this is a problem, let me know so I can explicitly add back code to mimic the old behavior. (We can't let it keep using CREATE_ALWAYS since that will overwrite the same file metadata we're trying to preserve.)

        2. Files with Linux metadata (the LXATTRB extended-attribute) have separate timestamps from the Windows timestamps. Since I'm keeping the metadata as-is, those timestamps will not be updated unless I add extra code to update them. Do you care one way or the other what the behavior is here?

         
        • Neil Hodgson

          Neil Hodgson - 2018-07-19

          Adding a new type is what I hoped to avoid as it extends the scope of this over the other platforms instead of being a tidy tiny change. The name 'File' is insufficiently specific for something that appears currently to be a file in the process of being written.

          All features should work identically when built with MinGW or MSVC.

          Having different behaviour with respect to system and hidden files depending on the state of this option is a complication so should be avoided.

          To avoid the potential for unexpected behaviour and failures for current users, the new option should default to off. Microsoft's deprecation of transactional files means that they should not be the default path.

          Extra user prompts should avoided unless really needed. Abort/Retry/Fail dialogs, in particular, are widely loathed and difficult to justify as they take control away from the user.

          Updating extra attributes doesn't appear necessary initially. It could be looked at in the future.

           
          • -

            - - 2018-07-19

            Okay thanks, I'll see what I can do.

             
          • -

            - - 2018-07-21

            I wrote and sent you the code for transactions, and I also had some more thoughts:

            There is something else that could be done to keep the file metadata without needing transactions: simply don't ask for fopen to truncate the file if it exists. Instead, truncate the file manually yourself. Unfortunately, fopen doesn't actually support this in pure C, but POSIX open does (or at least CreateFile does, in any case), if used along with chsize/ftruncate.

            Currently, only the transactional code path does this, but this could actually be an independent option from enabling transactions -- the two are rather independent. Transactions would still be useful to maintain file integrity, but they wouldn't be necessary just for preserving metadata.

            Any thoughts on this? It should be pretty easy for me to do this with my new code given that I had to introduce a File class already.

             
          • -

            - - 2018-07-21

            Actually, I just updated the patch I sent you to accommodate this possibility.
            It also fixes a couple bugs I found in my previous version.

             
  • Neil Hodgson

    Neil Hodgson - 2018-07-21

    Just attach the patch to a message on this tracker. That allows anyone to examine and comment on the changes and to understand what was done in the future.

     
    • -

      - - 2018-07-22

      Okay sure, I've attached my latest patch. It applies to the following commit on master:

      7/14/2018 6:33:11 PM
      Feature [feature-requests:#1222]. German translation for desktop file.
      

      You can apply it with

      patch -p0 -i "0001-Transactions-support.patch"
      
       
  • Neil Hodgson

    Neil Hodgson - 2018-07-25

    That's a very large patch. My initial opinion is that the benefits aren't worth the added complexity so I posted to the mailing list to see what others think.

    It is likely that a smaller and less intrusive patch could be implemented that performed a transactional rename after using the SciTE 4.1.0 code to successfully save to a temporary name.

     
    • -

      - - 2018-07-25

      Neil, renaming would completely defeat the foremost issue I was trying to address here, which was preserving metadata so that people could use SciTE to edit WSL files.

      Your response makes me really sad in all honesty. I spent an entire day writing this in a way that ensures it's extremely non-intrusive, and you just judged it based on its size rather than based on the actual content.

      Could you please look through the actual changes? You should see that the changes in almost all the files are extremely mundane and superficial -- they're just changing fwrite to f.write, fprintf to f.printf, FILE * to File, etc...

      The only nontrivial portion is File::File()... that's the part that you really need to worry about, and I'm happy to guide you through it if it's confusing. (Or through the other parts of the code for that matter.)

       
      • Neil Hodgson

        Neil Hodgson - 2018-07-26

        I was dismayed by the amount of effort in the patch. When I asked "Do all of these get changed...?" it was to determine the scope and not a request. Over half of the patch is changing the export code in an uninteresting way. Its also changed file reading with no apparent benefit.

        My hope was that the feature could be added by extending the code that opens files for write with a smaller extension at file close time. Maybe 50 lines added to opening and 15 at close. Contributors could continue writing file output code using standard C / C++ libraries and not need to understand a new File class.

        The code does not compile with Visual C++ 2017 15.7.5. Most of the changes are trivial additions of headers (cassert and stdexcept in File.cxx and utility in Utf8_16.cxx) but there is some more complexity with narrow versus wide characters and _WSTDIO_DEFINED. SciTE's assumptions are that file paths are either wide or narrow depending on platform but that all reading or writing is by char except for the explicit code in Utf8_16.

        There are several places in File where wchar_t[] arguments are allowed conditionally but these are never called. It looks like another project's code was copied and pasted in without concern for which methods are actually needed in SciTE. This is also reflected in the KTM class where the 'A' form CreateFileTransactedA is not needed.

        Windows uses file system tunneling to preserve some file attributes over file saves. Its possible that this isn't currently applied with WSL but there may also be ways to indicate it should be done.

        Please don't work on implementing things mentioned in this reply. Its much more worthwhile defining the core requirements and understanding the scope of the issue.

         
        • -

          - - 2018-07-26

          You know, I wish you could give me a chance and ask me if there are any benefits to my approach. Like I said, I'm more than happy to explain everything I've done. Instead, you start out by criticizing the length of the diff, claim there are no benefits without giving me a chance to explain anything, and demand things that as far as I know are outright impossible (more on this below) or based on wishes that wouldn't work out the way you'd hope.

          When I asked "Do all of these get changed...?" it was to determine the scope and not a request.

          Well then why did you not turn down the idea when I replied by saying I could do this for other files as well? I thought you would want consistency in your project (in this case with respect to file I/O) so that's why I replied that yes, we could change all of them, there's no harm. I already tried to do everything you asked that I felt could be done, and if you'd said something about this right then, I would've avoided changing these. Instead you don't say anything and make it sound like this is a huge issue. I could easily revert these in 5 minutes.

          Over half of the patch is changing the export code in an uninteresting way. Its also changed file reading with no apparent benefit. [...] Contributors could continue writing file output code using standard C / C++ libraries and not need to understand a new File class.

          See above -- you could've just asked me not to do this when I said I was going to, but you didn't have an issue with it. Personally I think you've gained a lot, and lost nothing: you've gained a layer of indirection over FILE* that should make future additions to the I/O code much easier than it previously was, while keeping I/O interface consistent across the codebase. Somehow, you seem to think you've lost something (lost what?) and gained nothing? I can't say I understand it, but in any case, this is far more trivial to revert than to criticize or argue about. I could probably revert this in 5 minutes.

          My hope was that the feature could be added by extending the code that opens files for write with a smaller extension at file close time. Maybe 50 lines added to opening and 15 at close.

          You keep making these wishes as if there was a better solution I'm refusing to implement. As I've been trying to say, I don't think this could be done satisfactorily. Insisting on this would make the code even more brittle and difficult to maintain than it already is. It's literally impossible to shove extra handles and destructors into FILE*, so you'd have to keep the transaction HANDLE somewhere else separately, and remember to manually CommitTransaction and CloseHandle every time you fclose, leading to brittle, duplicated code. Next time you add another diverging path for file closure, you'll have to remember to commit the transaction again. Moreover, if in the future you decide you need keep even more extra data alongside a FILE*, you'll still have the same exactly problem, only growing more painful every time. I solved this for you once and for all by introducing a File class that not only solves this problem, but which you can extend from now on if needed.

          Moreover, your unjustified insistence on using C's FILE*directly rather than proper C++ RAII is not a "good" thing. It's awful practice in C++. I've been trying to not criticize your code, because I really appreciate your project and I understand the language and projects both evolve, but the practices you insist on have had negative consequences and made your code difficult to extend (as is evident here). Your decision to stick with explicit resource management in C has resulted, for example, in SciTE having bugs where it forgets to close files that is has opened, making it impossible for me to unmount volumes after I open and close files on them while the editor is open. It's not trivial to to track down such leaks (I don't even know where this particular one is), and even if you do,they'll keep being introduced as the code evolves if you insist on manually fcloseing everything you fopen. That's exactly why C++ invented RAII. Now, since I needed to attach extra data to a FILE* anyway, the logical way to do that was with a File wrapper, which, as a side effect, also gave us an opportunity to lay a foundation for make your code robust against file-management problems like this. But since I understood you preferred the C API, I avoided radical changes (like using C++ exceptions), and kept the code otherwise identical to the C version aside from the necessary superficial changes. Seeing f.write and f.close instead of fwrite and fclose is not going to be a significant hurdle to you or anyone else trying to contribute to your codebase, and the use of a File class with a proper constructor and destructor only makes your code more robust.

          There are several places in File where wchar_t[] arguments are allowed conditionally but these are never called. It looks like another project's code was copied and pasted in without concern for which methods are actually needed in SciTE. This is also reflected in the KTM class where the 'A' form CreateFileTransactedA is not needed.

          The code does not compile with Visual C++ 2017 15.7.5. Most of the changes are trivial additions of headers (cassert and stdexcept in File.cxx and utility in Utf8_16.cxx) but there is some more complexity with narrow versus wide characters and _WSTDIO_DEFINED. SciTE's assumptions are that file paths are either wide or narrow depending on platform but that all reading or writing is by char except for the explicit code in Utf8_16.

          With respect to VS2017: I'm sorry, I just don't have VS2017 installed. I did the best I could with VS2015.

          With respect to char vs. wchar_t and otherwise: _WSTDIO_DEFINED is how MS's own headers decide which API to use on Windows, but if you don't like them, these are extremely trivial, last-minute polishing issues that I'd be 100% happy to address in a heartbeat once we've dealt with the major obstacles. I don't understand the point of raising these minor complaints when we you have issues with the heart of the code itself.

          Windows uses file system tunneling to preserve some file attributes over file saves. Its possible that this isn't currently applied with WSL but there may also be ways to indicate it should be done.

          I'm indeed quite familiar with NTFS tunneling, and I agree it would be nice if WSL could make our life easier via tunneling, but I've never seen any indication that this was the case, or even planned to be. If I'm wrong, please let me know. But otherwise it's really energy-draining to read criticism asking for the literally impossible.

          Also: tunneling is a last-resort hack to help users avoid losing data. It has never been foolproof or robust: it will fail if there is too long a delay between when the file is replaced, or if too many files are replaced in too short a timespan. Put another way, it's meant and designed for address the fact that programs are careless about how they handle user files, not as a substitute for proper file handling. So even if there was a way to make tunneling address this problem (which I hope there will be, but which I'm not aware of there currently being), it still wouldn't be a proper fix.

          Please don't work on implementing things mentioned in this reply.

          Well, they'd probably take me < 10 minutes to fix, but okay...

          Its much more worthwhile defining the core requirements and understanding the scope of the issue.

          By all means, please do? I thought we had done this already. I laid out my own thoughts in excruciating detail before beginning the coding, and you replied with your own requirements. I followed pretty much every constraint you imposed on me with the 1 exception of keeping files to a single FILE*interface, which I couldn't do (see above). So now I'm very confused that you claim we haven't had the requirements and scope laid out yet, but if we haven't, then sure, we can keep doing so. Let me know when you need my input?

           

          Last edit: - 2018-07-26
          • Neil Hodgson

            Neil Hodgson - 2018-07-27

            You know, I wish you could give me a chance and ask me if there are any benefits to my approach.

            You could have documented any benefits.

            Well then why did you not turn down the idea when I replied by saying I could do this for other files as well?

            It is not my responsibility to stop you doing things.

            See above -- you could've just asked me not to do this when I said I was going to, but you didn't have an issue with it.

            I did not ask you to do this. You decided to. Please stop blaming me for your actions.

            You keep making these wishes as if there was a better solution I'm refusing to implement.

            If there is no better solution then I am content with the current state of SciTE. Sometimes, a new feature simply isn't worthwhile.

            Moreover, your unjustified insistence on using C's FILE*directly rather than proper C++ RAII is not a "good" thing.

            SciTE previously used more non-standard classes including its own string class. These made it more difficult for people to contribute as they had to learn more. Language standard types are more familiar and likely to cause fewer errors.

            A class for holding open files might be a reasonable idea but that would be a separate issue. Changing one aspect of current behaviour should not impact other code excessively.

            Your decision to stick with explicit resource management in C has resulted, for example, in SciTE having bugs where it forgets to close files that is has opened, making it impossible for me to unmount volumes after I open and close files on them while the editor is open.

            This is most likely from the Win32 file chooser holding a reference to its most recently viewed directory. If its something else then it should be reported.

            With respect to VS2017: I'm sorry, I just don't have VS2017 installed. I did the best I could with VS2015.

            SciTE no longer support VS2015 - it should fail when trying to include string_view.

            I'm indeed quite familiar with NTFS tunneling, and I agree it would be nice if WSL could make our life easier via tunneling, but I've never seen any indication that this was the case, or even planned to be. If I'm wrong, please let me know. But otherwise it's really energy-draining to read criticism asking for the literally impossible

            It is likely that, if Microsoft are serious about WSL, then they will address this in the future. They are certainly aware of the issue: "Many editors will also strip the EAs when saving an existing file, again making the file unusable in WSL"

            At this point, it appears that it is not possible to implement the desired functionality with a small isolated patch so I would like to reject the feature request. Its possible that this could be reviewed later particularly if Microsoft make this easier.

             
            • -

              - - 2018-07-27

              You could have documented any benefits.

              Really now? You say this while literally entirely missing and/or ignoring the giant paragraph where I just documented for you how tunneling is a brittle, unreliable, last-resort hack that was meant to save users rather than relied upon by programs? And you literally started with complaining about the long patch length before considering the content, and you're telling me I should've added more content?

              It is not my responsibility to stop you doing things. I did not ask you to do this. You decided to. Please stop blaming me for your actions.

              It is entirely 100% your responsibility accept or reject proposed changes, and it is what you were already doing for everything else. The whole "defining the requirements and scope" thing you were eager to teach me about. And like I said: it was still trivial for me to revert those changes, nothing that couldn't be done in 5-10 minutes.

              SciTE no longer support VS2015 - it should fail when trying to include string_view.

              OK, so... what? I'm lying? No, I'm already well aware string_view is missing. It's impossible not to notice it when compiling your project. That's why I wrote my own compatible version for 2015 just so I could contribute to your project without making you take on the burden of supporting VS2015.

              It is likely that, if Microsoft are serious about WSL, then they will address this in the future. They are certainly aware of the issue: "Many editors will also strip the EAs when saving an existing file, again making the file unusable in WSL"

              I already explained how this was a brittle and unreliable hack to save user data that was never meant to remove programmers's responsibility to write editors that handled file metadata correctly. It's not even necessarily enabled; people may disable tunneling systemwide due to interactions with other programs, while still expecting their editors to handle files correctly. Also, FYI, preserving EAs would not preserve alternate streams, which WSL also claims to use in some situations, and which it is unlikely to ever tunnel given how it never did so before WSL.

              At this point I don't care to contribute back to your project though. The changes work for me, and you can worry about the rest of your users.

               

              Last edit: - 2018-07-27
  • Neil Hodgson

    Neil Hodgson - 2018-07-28
    • Group: Completed --> Won't_Implement
     

Log in to post a comment.

MongoDB Logo MongoDB