#430 Compact Remove&Insert Edits into a Replace Edit

closed-accepted
None
5
2012-07-18
2012-06-02
Thomas Meyer
No

This helps to save some memory for the search&replace-all case as only one instead of two objects are stored in memory.

Discussion

  • The folding shouldn't be done outside compound edits because such edits
    should be undo/redoable individually.

    Also, if a candidate Edit is undoClearDirty or redoClearDirty, it
    shouldn't be folded since the identity is significant.

    The folding should be delayed until the candidates become not last one
    since the last Edit maybe updated via getMergeEdit().

    Would you please revise the patch?

     
    • assigned_to: nobody --> k_satoda
     
  • Thomas Meyer
    Thomas Meyer
    2012-06-06

    New patch on top of 3528619

     
  • I have looked through the patch. Please revise the patch for the
    following comments. Please assume "It looks like ..." for each comment.

    "The folding should be delayed until ..." in my previous comment still
    apply.

    At if(lastElement == undoClearDirty) in checkIfReplacement(), we should
    check more; lastElement == redoClearDirty, newElement == ...

    Replace can't be undoClearDirty nor redoClearDirty. Thus the check in
    undo() and redo() is unnecessary. Putting assert() may be reasonable.

    The return value of Replace.undo() and redo() looks wrong. Please verify
    that they return the same value with the case not folded.

    The added field mgr in CompoundEdit is redundant. You can get
    UndoManager after casting the folding Edits.

    The semantics of the return value of checkIfReplacement() is vague.
    How about returning null if it can't fold? In that case the name can
    be getReplaceFromRemoveInsert() and the type can be just Replace.
    This change will also remove rcEdit which looks unnecessary.

    Inconsistent spacing: edit=replacement;

    Strange empty line at the beginning of dropLastElement() body.

     
  • Thomas Meyer
    Thomas Meyer
    2012-06-07

    Hi,

    1.) "The folding should be delayed until ..." in my previous comment still
    apply."

    As long as only the sequence Remove+Insert is compacted into a replace, this is not necessary.

    2.) "At if(lastElement == undoClearDirty) in checkIfReplacement(), we should
    check more; lastElement == redoClearDirty, newElement == ..."

    These other conditions cannot happen.

    3.) "Replace can't be undoClearDirty nor redoClearDirty. Thus the check in
    undo() and redo() is unnecessary. Putting assert() may be reasonable."

    Replace can become a redoClearFirst candidate:
    1. Create a new File
    2. Enter "1234567890123"
    3. Save
    4. Replace "123" -> "abc"
    5. Save -> Replace becomes redosClearDirty
    6. Undo
    7. Redo -> Assertion hit.

    4.) "The return value of Replace.undo() and redo() looks wrong. Please verify
    that they return the same value with the case not folded."

    Fixed.

    5.) "The added field mgr in CompoundEdit is redundant. You can get
    UndoManager after casting the folding Edits."

    Fixed

    6.) "The semantics of the return value of checkIfReplacement() is vague.
    How about returning null if it can't fold? In that case the name can
    be getReplaceFromRemoveInsert() and the type can be just Replace.
    This change will also remove rcEdit which looks unnecessary."

    Fixed.

    7.) "Inconsistent spacing: edit=replacement;"

    Fixed.

    8.) "Strange empty line at the beginning of dropLastElement() body."

    Fixed.

     
  • > 1.) "The folding should be delayed until ..."
    > As long as only the sequence Remove+Insert is compacted into a replace,
    > this is not necessary.

    With the current implementation, I'm afraid of every contentInserted()
    after a folding will create a new instance of Insert and increase memory
    usage.

    Now I found the problem can be solved by an easier way; considering
    Replace as a merge target in contentInserted() as same as Insert.

    > 2.) ... we should check more; lastElement == redoClearDirty,
    > These other conditions cannot happen.

    Ah, I understood. Then some comments and/or assertions will be enough to
    connect the code and the comment just above it.

    > 3.) "Replace can't be undoClearDirty nor redoClearDirty.
    > Replace can become a redoClearFirst candidate:

    Thank you for the steps to hit the assertion. I understood.

    Now I wonder why every Edit has to do the check in it. If the check can
    be moved into UndoManager.undo/redo(), it seems more straightforward. If
    you or someone find problems to do that, I'll move them as another
    separate commit so that this patch won't be bothered by this issue.

     
  • >> As long as only the sequence Remove+Insert is compacted into a replace,
    >> this is not necessary.
    >
    > With the current implementation, I'm afraid of every contentInserted()
    > after a folding will create a new instance of Insert and increase memory
    > usage.

    Sorry I just realized that I was wrong. The waste will be merely one
    Insert not merged. Now I agree it is not necessary in favor of code
    simplicity.

     
  • Please consider to describe the purpose (optimize the memory usage for
    large number of replace) of the folding as appropriate comments.

    Same type checks and casts are repeated at caller and inside of
    getReplaceFromRemoveInsert(). They should not be repeated for both code
    quality and performance.

     
  • Thomas Meyer
    Thomas Meyer
    2012-06-10

    Updated patch attached. Please have a look at it and tell me what do you think of it?

    with kind regards
    thomas

     
  • Thomas Meyer
    Thomas Meyer
    2012-06-28

    Another approach to compress n Replace into a CompressedReplace. It works great for my test case... But I don't know how this will scale in general case. what do you think. Measurements, log message and change entry to follow.

     
  • Your new approach (v5) assumes that the most of offsets of occurrences
    are in a equal distance. But I don't see any reason to assume that
    happens to be.

    In contrast, many replace with same contents on arbitrary offsets likely
    happens with replace-all. So compressing them into an array of offsets
    sounds reasonable.

    I think we should finish v4 as a certain step, and separate further
    compression as another patch.

    Could you please update v4 for the latest trunk, and give a measure,
    change entry, and log message?

     
  • Thomas Meyer
    Thomas Meyer
    2012-07-02

    New patch attached.

    CHANGES.txt:
    Fix high memory usage for large search&replace all operations

    Log message:
    In a CompoundEdit compact a sequence of Remove&Insert Edits into a
    Replace Edit; further compact consecutive Replace Edits into a
    CompressedReplace Edit object.

    Meassurment:

    The IntegerArray uses 64MB of storage in the one and only
    CompressedReplace Edit object for 13.560.000 replacements. (4 Bytes *
    13560000 no. of replacements = 54.240.000 Bytes) = 64MB (because of
    quadratic allocation behaviour of IntegerArray).
    Before this patch the memory usage of all Remove&Insert Edit object
    for 13.560.000 replacements was:
    - Remove objects = 24 Bytes * 13560000 = 325.440.000 Bytes
    - Insert objects = 24 Bytes * 13560000 = 325.440.000 Bytes
    so in total 650.880.000 Bytes = 620 MB.

     
  • Thomas Meyer
    Thomas Meyer
    2012-07-02

    meassurment results

     
    Attachments
  • I applied v6 on r21923 and tested. After a large search&replace, undo
    just undo only the first occurrence. It seems not working. I don't see
    the cause so far. Does it work for you?

    Besides that, I found some whitespace issues in the patch:
    - Some added lines starts with a space followed by tabs. The spaces
    should be removed.
    - There are some unnecessary blank lines after opening brackets. I found
    at getCompressedReplaceFromReplaceReplace() method, and class
    CompressedReplace.
    Please take care next time.

     
  • Thomas Meyer
    Thomas Meyer
    2012-07-05

    hi, sorry, some untested late changes in getCompressedReplaceFromReplaceReplace broke the undo. corrected patch attached.

     
    • status: open --> closed-accepted
     
  • Accepted v7 as r21939. Thank you.