Learn how easy it is to sync an existing GitHub or Google Code repo to a SourceForge project! See Demo


#426 Deduplicate Strings in UndoManager for big search&replace

Thomas Meyer

The UndoManager.contentRemoved() method is called by the JEditBuffer.remove() method and is supplied with a new string object for every call. In large search&replace-all scenarios this can quickly fill up memory as a reference is hold by the created CompoundEdit object, so these objects cannot get garbage collected. Use the KillRing, that is called anyway, to check for the existence of an equal String object. Sadly this doesn't work for whitespace replaces as the KillRing doesn't seem to store these.
For the test file in patch 3527797 (file: jedit-slow-search-replace-small.txt.zip) and a search-and-replace-all from "," to ", " this saves about 72MB of String objects for this simple search&replace-all case.


  • Thomas Meyer
    Thomas Meyer

    New patch against 3531538

  • Would you please give measures of this patch and unconditional use of
    String.intern() ? If their difference is not so much, I think the
    latter will be preferable in favor of less code complexity.

  • Thomas Meyer
    Thomas Meyer

    Hi k_satoda,

    I'm not sure if this is the right thing todo. Example:
    1.) Given a long running jEdit session, days or month, the string literal pool will get filled up with every search and replace action. These literal will still exists after the corresponding buffer is closed and will never get deleted.
    2.) Especially when you use as a search argument a regex and as a replacement a beanshell script, every replacement string may be different. this will quickly fill up the string literal pool, which will never get purged.

    So I think to only internal white spaces strings is a good compromise.

  • Did you miss a recent thread on jedit-devel?
    In short, a good implementation of String.intern() hold the String in
    pool as weak references and the instances will be collected as same as
    all other objects, and JDK provides such implementation.

    So I think it worth a measure.

  • Thomas Meyer
    Thomas Meyer

    Yes, okay. I didn't know that. Okay. updated patch attached. The saving still exists with String.intern() for a file with 2.260.000 changes from "," to ", ".

  • Thomas Meyer
    Thomas Meyer

    results before and after patch

    • assigned_to: nobody --> k_satoda
  • Could you please measure the time, too?

    Some articles about String.intern() says it can be a CPU bottle neck. If
    it is the case, we should try UniquePool (attached in the below
    mentioned thread), or using KillRing as you originally proposed.

    For the patch, is the move of getLastEdit() relevant to this issue? If
    it is, please explain what was wrong. If not, please drop from the
    patch. If it is just a code clean up without changing behavior, I'll
    commit it in a separate revision before or after the patch.

  • Thomas Meyer
    Thomas Meyer

    1.) Mesurment results:
    with patch:
    18:14:15 [AWT-EventQueue-0] [debug] SearchAndReplace: Call= 1 Diff= 22093582292 Avg=22093582292
    18:17:54 [AWT-EventQueue-0] [debug] SearchAndReplace: Call= 2 Diff= 18648002863 Avg=20370792577
    18:18:24 [AWT-EventQueue-0] [debug] SearchAndReplace: Call= 3 Diff= 17526573108 Avg=19422719421

    without patch:
    18:21:44 [AWT-EventQueue-0] [debug] SearchAndReplace: Call= 1 Diff= 22909061134 Avg=22909061134
    18:22:57 [AWT-EventQueue-0] [debug] SearchAndReplace: Call= 3 Diff= 14757049554 Avg=17632463808
    18:22:12 [AWT-EventQueue-0] [debug] SearchAndReplace: Call= 2 Diff= 15231280738 Avg=19070170936

    2.) getLastEdit()
    no, these are just code clean ups, as getLastEdit() is only needed when clearDirty is set.

    • summary: Use KillRing to deduplicate objects in UndoManager --> Deduplicate Strings in UndoManager for big search&replace
    • status: open --> closed-accepted
  • Accepted as r21884.

    The code cleanup part was accepted as r21883.