Menu

#239 MD5Sum a buffer and file for "file not saved" warning check

closed-fixed
nobody
None
5
2008-09-03
2008-08-27
Alan Ezust
No

When closing a dirty buffer, or
doing a "reload" of a dirty buffer,

Instead of popping up a dialog asking confirming that changes are not saved, do a comparison of the Md5sum of the buffer and what is on the disk, and if they match, don't show any dialog.

Discussion

  • Kevin Hunter Kesling

    Logged In: YES
    user_id=1271235
    Originator: NO

    Och, had not seen this ticket yet when responding to 2068307. Sorry 'about that. Since the other one is closed, I'll repaste my thoughts here:

    > Doing an actual check to see if what's on disk is what is in the
    > buffer could impact performance.

    Eh, without waxing philosophic, I think this is a red herring. The time spent waiting for the user will be no more than sitting and waiting for the user to decide what to with the dialog box. Here is a quick test, as an example:

    $ dd if=/dev/urandom of=./test bs=4M count=1

    $ md5sum test # will return "immediately"
    4f88c9488573eb188fe72837723fbba6 test

    $ time md5sum test # to be specific about it
    4f88c9488573eb188fe72837723fbba6 test

    real 0m0.055s
    user 0m0.040s
    sys 0m0.000s

    I *rarely* edit 4M files. Do others? For kicks, repeating with a 50M random file took 0.188s, almost instant. (Remember, you only need to do the md5sum when you open the file. Save that result and at save time, the check will be against what is in RAM, so no waiting to read from the disk.)

    Their could be a problem with opening and closing a large number of files, as with the sessions plugin. A test just now with md5sum over 50 different 4M random files took 0.791 seconds, less than 4/5s.

    $ for ((i=0; i<50; ++i)); do
    > dd if=/dev/urandom of=./test$i bs=4M count=1
    > done

    $ time for ((i=0; i<50; ++i)); do
    > md5sum test$i
    > done

    I would argue that the performance concern (in this case) is a non-issue.

     
  • Alan Ezust

    Alan Ezust - 2008-08-27

    Logged In: YES
    user_id=935841
    Originator: YES

    Java has built-in MD5 support, so I can try it out and see how it works.
    I thought of another improvement to jEdit that this would bring:
    during auto-save, sometimes it auto-saves the file that is the same as what is on disk.
    If the auto-save check also did an MD5sum check and would skip the buffers that are not different from what is on disk,
    and mark the buffer as "non-dirty" at the same time, that would avoid the situation where you had autosave files that are basically the same as what you already saved.

    interesting article about md5sum:
    http://www.twmacinta.com/myjava/fast_md5.php#built_in

     
  • Alan Ezust

    Alan Ezust - 2008-09-03
    • status: open --> closed-fixed
     
  • Alan Ezust

    Alan Ezust - 2008-09-03

    Logged In: YES
    user_id=935841
    Originator: YES

    committed rev# 13508

     
  • Alan Ezust

    Alan Ezust - 2008-09-03

    Logged In: YES
    user_id=935841
    Originator: YES

    I am testing it now, and it does seem perhaps like there may be a performance impact.
    But it is an option, switched off by default, so people won't notice it unless they use it.

    It is checking every time setDirty(true) is called on a non-new buffer. Which may be necessary to get the proper
    notifications sent to interested plugins that want to know which buffers are dirty at any given time.

    Perhaps it doesn't need to check that often, or perhaps we should use the fast MD5 implementation linked to in the previous comment.
    What do you think?

     
  • Anthony Keeley

    Anthony Keeley - 2008-09-03

    Logged In: YES
    user_id=1883940
    Originator: NO

    I skimmed over the fast md5sum implementation and it sounded interesting. I would advise that before taking a step in that direction, that more scientific tests be conducted, so that you wouldn't spend all that time on the new implementation, just to find out that it wasn't really worth the effort.

    Also, I liked your suggestion about md5sum and autosave. Is it feasible? What does everyone else think?

     
  • Kevin Hunter Kesling

    Logged In: YES
    user_id=1271235
    Originator: NO

    > I am testing it now, and it does seem perhaps like there may
    > be a performance impact.

    Hmm. I'm surprised by that. This end-user's use case is that it should only be an issue when closing a buffer or quitting jEdit. On the other hand, I now understand that it's a state that needs to be maintained continuously, as for plugins, and keeping the MVC model in sync. So ...

    A couple of optimization ideas:

    I assume you have meta information about each open buffer, such as an md5 hash, file size, date of last modification, encoding, text mode, etc. If the filesize and buffersize don't match, there's no need to run md5hash. They're clearly different.

    Perhaps you could add a delay of some sort? Only checking at predefined times, such as after at least x (10) seconds of inactivity, when switching buffers, text areas, or views, when quitting.

    If a buffer is not active, or hasn't been active, then there's no need to check it; it is the same state it was when last checked. My test of 50 files was more for opening or closing jEdit than thinking someone would edit 50 files at the exact same time.

    > It is checking every time setDirty(true) is called on a non-new
    > buffer. Which may be necessary to get the proper notifications
    > sent to interested plugins that want to know which buffers are
    > dirty at any given time.

    Hmm, is communication internally not asynchronous? I had supposed that the model stored some of this meta information about each buffer, and then a thin shim layer would respond as necessary to any (plugin) requests. Again, I'm not a jEdit code guru, but from a software engineering perspective this would seem to be in order. Thoughts?

    > Perhaps it doesn't need to check that often, or perhaps we
    > should use the fast MD5 implementation linked to in the previous
    > comment. What do you think?

    I am actually skeptical of using the faster md5sum implementation at this stage of development. I suggest that we attempt to optimize the use cases first, such as above. After we suss out when we do and don't need to use md5summation, then we can swap the better algorithm in place.

     
  • Alan Ezust

    Alan Ezust - 2008-09-03

    Logged In: YES
    user_id=935841
    Originator: YES

    duh. of course, only do MD5 when buffersizes are different!! excellent idea.
    Committed revision 13511.
    I doubt anyone will notice a performance impact now!

     
  • Alan Ezust

    Alan Ezust - 2008-09-03

    Logged In: YES
    user_id=935841
    Originator: YES

    i mean when buffersizes are *not* different, of course. :-)

     
  • Kevin Hunter Kesling

    Logged In: YES
    user_id=1271235
    Originator: NO

    One of those rare case where you mean "Do as I do! Not as I say!"

    Count it. :-)

     
  • Kevin Hunter Kesling

    Logged In: YES
    user_id=1271235
    Originator: NO

    Hmm, while I check this out, I note the wording in Utilities->Global Options->Saving & Backup is

    "Use MD5 hash calculations to check if isDirty()"

    Suggest clarifying that for the end-user. Something along the lines of:

    "Don't confirm close if open buffer matches disk contents"
    with a tooltip of "If you've edited a buffer, but the contents are still the same as on disk, don't ask to save."

    Or something like that.

     
  • Alan Ezust

    Alan Ezust - 2008-09-03

    Logged In: YES
    user_id=935841
    Originator: YES

    it doesn't check what is on disk - it checks the MD5 of the buffer at the time it was loaded.
    Here is the new wording:

    options.save-back.useMD5forDirtyCalculation=Don't show "file not saved" if buffer is the same as original
    options.save-back.useMD5forDirtyCalculation.tooltip=Use MD5 hash to calculate if changed buffers are really the same as they were at loading time.

     

Log in to post a comment.