I have looked through the patch. Please revise the patch for the
following comments. Please assume "It looks like ..." for each comment.
"The folding should be delayed until ..." in my previous comment still
apply.
At if(lastElement == undoClearDirty) in checkIfReplacement(), we should
check more; lastElement == redoClearDirty, newElement == ...
Replace can't be undoClearDirty nor redoClearDirty. Thus the check in
undo() and redo() is unnecessary. Putting assert() may be reasonable.
The return value of Replace.undo() and redo() looks wrong. Please verify
that they return the same value with the case not folded.
The added field mgr in CompoundEdit is redundant. You can get
UndoManager after casting the folding Edits.
The semantics of the return value of checkIfReplacement() is vague.
How about returning null if it can't fold? In that case the name can
be getReplaceFromRemoveInsert() and the type can be just Replace.
This change will also remove rcEdit which looks unnecessary.
Inconsistent spacing: edit=replacement;
Strange empty line at the beginning of dropLastElement() body.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
1.) "The folding should be delayed until ..." in my previous comment still
apply."
As long as only the sequence Remove+Insert is compacted into a replace, this is not necessary.
2.) "At if(lastElement == undoClearDirty) in checkIfReplacement(), we should
check more; lastElement == redoClearDirty, newElement == ..."
These other conditions cannot happen.
3.) "Replace can't be undoClearDirty nor redoClearDirty. Thus the check in
undo() and redo() is unnecessary. Putting assert() may be reasonable."
Replace can become a redoClearFirst candidate:
1. Create a new File
2. Enter "1234567890123"
3. Save
4. Replace "123" -> "abc"
5. Save -> Replace becomes redosClearDirty
6. Undo
7. Redo -> Assertion hit.
4.) "The return value of Replace.undo() and redo() looks wrong. Please verify
that they return the same value with the case not folded."
Fixed.
5.) "The added field mgr in CompoundEdit is redundant. You can get
UndoManager after casting the folding Edits."
Fixed
6.) "The semantics of the return value of checkIfReplacement() is vague.
How about returning null if it can't fold? In that case the name can
be getReplaceFromRemoveInsert() and the type can be just Replace.
This change will also remove rcEdit which looks unnecessary."
Fixed.
7.) "Inconsistent spacing: edit=replacement;"
Fixed.
8.) "Strange empty line at the beginning of dropLastElement() body."
Fixed.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
> 1.) "The folding should be delayed until ..."
> As long as only the sequence Remove+Insert is compacted into a replace,
> this is not necessary.
With the current implementation, I'm afraid of every contentInserted()
after a folding will create a new instance of Insert and increase memory
usage.
Now I found the problem can be solved by an easier way; considering
Replace as a merge target in contentInserted() as same as Insert.
> 2.) ... we should check more; lastElement == redoClearDirty,
> These other conditions cannot happen.
Ah, I understood. Then some comments and/or assertions will be enough to
connect the code and the comment just above it.
> 3.) "Replace can't be undoClearDirty nor redoClearDirty.
> Replace can become a redoClearFirst candidate:
Thank you for the steps to hit the assertion. I understood.
Now I wonder why every Edit has to do the check in it. If the check can
be moved into UndoManager.undo/redo(), it seems more straightforward. If
you or someone find problems to do that, I'll move them as another
separate commit so that this patch won't be bothered by this issue.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
>> As long as only the sequence Remove+Insert is compacted into a replace,
>> this is not necessary.
>
> With the current implementation, I'm afraid of every contentInserted()
> after a folding will create a new instance of Insert and increase memory
> usage.
Sorry I just realized that I was wrong. The waste will be merely one
Insert not merged. Now I agree it is not necessary in favor of code
simplicity.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Please consider to describe the purpose (optimize the memory usage for
large number of replace) of the folding as appropriate comments.
Same type checks and casts are repeated at caller and inside of
getReplaceFromRemoveInsert(). They should not be repeated for both code
quality and performance.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Another approach to compress n Replace into a CompressedReplace. It works great for my test case... But I don't know how this will scale in general case. what do you think. Measurements, log message and change entry to follow.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Your new approach (v5) assumes that the most of offsets of occurrences
are in a equal distance. But I don't see any reason to assume that
happens to be.
In contrast, many replace with same contents on arbitrary offsets likely
happens with replace-all. So compressing them into an array of offsets
sounds reasonable.
I think we should finish v4 as a certain step, and separate further
compression as another patch.
Could you please update v4 for the latest trunk, and give a measure,
change entry, and log message?
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
CHANGES.txt:
Fix high memory usage for large search&replace all operations
Log message:
In a CompoundEdit compact a sequence of Remove&Insert Edits into a
Replace Edit; further compact consecutive Replace Edits into a
CompressedReplace Edit object.
Meassurment:
The IntegerArray uses 64MB of storage in the one and only
CompressedReplace Edit object for 13.560.000 replacements. (4 Bytes *
13560000 no. of replacements = 54.240.000 Bytes) = 64MB (because of
quadratic allocation behaviour of IntegerArray).
Before this patch the memory usage of all Remove&Insert Edit object
for 13.560.000 replacements was:
- Remove objects = 24 Bytes * 13560000 = 325.440.000 Bytes
- Insert objects = 24 Bytes * 13560000 = 325.440.000 Bytes
so in total 650.880.000 Bytes = 620 MB.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I applied v6 on r21923 and tested. After a large search&replace, undo
just undo only the first occurrence. It seems not working. I don't see
the cause so far. Does it work for you?
Besides that, I found some whitespace issues in the patch:
- Some added lines starts with a space followed by tabs. The spaces
should be removed.
- There are some unnecessary blank lines after opening brackets. I found
at getCompressedReplaceFromReplaceReplace() method, and class
CompressedReplace.
Please take care next time.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
compaction patch
The folding shouldn't be done outside compound edits because such edits
should be undo/redoable individually.
Also, if a candidate Edit is undoClearDirty or redoClearDirty, it
shouldn't be folded since the identity is significant.
The folding should be delayed until the candidates become not last one
since the last Edit maybe updated via getMergeEdit().
Would you please revise the patch?
Patch on top of 3528619
New patch on top of 3528619
I have looked through the patch. Please revise the patch for the
following comments. Please assume "It looks like ..." for each comment.
"The folding should be delayed until ..." in my previous comment still
apply.
At if(lastElement == undoClearDirty) in checkIfReplacement(), we should
check more; lastElement == redoClearDirty, newElement == ...
Replace can't be undoClearDirty nor redoClearDirty. Thus the check in
undo() and redo() is unnecessary. Putting assert() may be reasonable.
The return value of Replace.undo() and redo() looks wrong. Please verify
that they return the same value with the case not folded.
The added field mgr in CompoundEdit is redundant. You can get
UndoManager after casting the folding Edits.
The semantics of the return value of checkIfReplacement() is vague.
How about returning null if it can't fold? In that case the name can
be getReplaceFromRemoveInsert() and the type can be just Replace.
This change will also remove rcEdit which looks unnecessary.
Inconsistent spacing: edit=replacement;
Strange empty line at the beginning of dropLastElement() body.
Hi,
1.) "The folding should be delayed until ..." in my previous comment still
apply."
As long as only the sequence Remove+Insert is compacted into a replace, this is not necessary.
2.) "At if(lastElement == undoClearDirty) in checkIfReplacement(), we should
check more; lastElement == redoClearDirty, newElement == ..."
These other conditions cannot happen.
3.) "Replace can't be undoClearDirty nor redoClearDirty. Thus the check in
undo() and redo() is unnecessary. Putting assert() may be reasonable."
Replace can become a redoClearFirst candidate:
1. Create a new File
2. Enter "1234567890123"
3. Save
4. Replace "123" -> "abc"
5. Save -> Replace becomes redosClearDirty
6. Undo
7. Redo -> Assertion hit.
4.) "The return value of Replace.undo() and redo() looks wrong. Please verify
that they return the same value with the case not folded."
Fixed.
5.) "The added field mgr in CompoundEdit is redundant. You can get
UndoManager after casting the folding Edits."
Fixed
6.) "The semantics of the return value of checkIfReplacement() is vague.
How about returning null if it can't fold? In that case the name can
be getReplaceFromRemoveInsert() and the type can be just Replace.
This change will also remove rcEdit which looks unnecessary."
Fixed.
7.) "Inconsistent spacing: edit=replacement;"
Fixed.
8.) "Strange empty line at the beginning of dropLastElement() body."
Fixed.
v3
> 1.) "The folding should be delayed until ..."
> As long as only the sequence Remove+Insert is compacted into a replace,
> this is not necessary.
With the current implementation, I'm afraid of every contentInserted()
after a folding will create a new instance of Insert and increase memory
usage.
Now I found the problem can be solved by an easier way; considering
Replace as a merge target in contentInserted() as same as Insert.
> 2.) ... we should check more; lastElement == redoClearDirty,
> These other conditions cannot happen.
Ah, I understood. Then some comments and/or assertions will be enough to
connect the code and the comment just above it.
> 3.) "Replace can't be undoClearDirty nor redoClearDirty.
> Replace can become a redoClearFirst candidate:
Thank you for the steps to hit the assertion. I understood.
Now I wonder why every Edit has to do the check in it. If the check can
be moved into UndoManager.undo/redo(), it seems more straightforward. If
you or someone find problems to do that, I'll move them as another
separate commit so that this patch won't be bothered by this issue.
>> As long as only the sequence Remove+Insert is compacted into a replace,
>> this is not necessary.
>
> With the current implementation, I'm afraid of every contentInserted()
> after a folding will create a new instance of Insert and increase memory
> usage.
Sorry I just realized that I was wrong. The waste will be merely one
Insert not merged. Now I agree it is not necessary in favor of code
simplicity.
Please consider to describe the purpose (optimize the memory usage for
large number of replace) of the folding as appropriate comments.
Same type checks and casts are repeated at caller and inside of
getReplaceFromRemoveInsert(). They should not be repeated for both code
quality and performance.
Updated patch attached. Please have a look at it and tell me what do you think of it?
with kind regards
thomas
v4
Another approach to compress n Replace into a CompressedReplace. It works great for my test case... But I don't know how this will scale in general case. what do you think. Measurements, log message and change entry to follow.
v5 with compression
Your new approach (v5) assumes that the most of offsets of occurrences
are in a equal distance. But I don't see any reason to assume that
happens to be.
In contrast, many replace with same contents on arbitrary offsets likely
happens with replace-all. So compressing them into an array of offsets
sounds reasonable.
I think we should finish v4 as a certain step, and separate further
compression as another patch.
Could you please update v4 for the latest trunk, and give a measure,
change entry, and log message?
New patch attached.
CHANGES.txt:
Fix high memory usage for large search&replace all operations
Log message:
In a CompoundEdit compact a sequence of Remove&Insert Edits into a
Replace Edit; further compact consecutive Replace Edits into a
CompressedReplace Edit object.
Meassurment:
The IntegerArray uses 64MB of storage in the one and only
CompressedReplace Edit object for 13.560.000 replacements. (4 Bytes *
13560000 no. of replacements = 54.240.000 Bytes) = 64MB (because of
quadratic allocation behaviour of IntegerArray).
Before this patch the memory usage of all Remove&Insert Edit object
for 13.560.000 replacements was:
- Remove objects = 24 Bytes * 13560000 = 325.440.000 Bytes
- Insert objects = 24 Bytes * 13560000 = 325.440.000 Bytes
so in total 650.880.000 Bytes = 620 MB.
v6 using IntegerArray
meassurment results
I applied v6 on r21923 and tested. After a large search&replace, undo
just undo only the first occurrence. It seems not working. I don't see
the cause so far. Does it work for you?
Besides that, I found some whitespace issues in the patch:
- Some added lines starts with a space followed by tabs. The spaces
should be removed.
- There are some unnecessary blank lines after opening brackets. I found
at getCompressedReplaceFromReplaceReplace() method, and class
CompressedReplace.
Please take care next time.
hi, sorry, some untested late changes in getCompressedReplaceFromReplaceReplace broke the undo. corrected patch attached.
v7
Accepted v7 as r21939. Thank you.