#62 CharSequence StreamedSource should not require finalization

open-fixed
Code (51)
5
2012-12-27
2012-12-20
Soul-Burn
No

The StreamedSource class declares a non-trivial finalizer, causing it to be handled specially by the garbage collector.
Since the StreamedSource object eventually holds references to the original Source (through a series of objects), it can keep very large and complex data structures reachable until it has been finalized. A copying GC (such as the collectors used by HotSpot) then has to copy these data structures, causing extreme slowdowns when parsing many documents. In my test case, collections took about 5 seconds with the finalizer and under 100ms without it.

Using the finalizer makes sense for real streamed sources but makes no sense when the input is text. This occurs, for example, when calling getAttributeValue on an attribute that includes a '&' character.

My proposed solution is as follows:
Remove the finalizer from StreamedSource.
Create an object, with a finalizer, which would be created only in the autoclose case (replacing that variable maybe).
StreamedSource will have an optional reference to this object.
This object will hold a reference to StreamedSource.
This object will call what the original finalizer called, i.e. "automatic close", or even just the close method.

This way, cases which don't require finalization, such as when using a text input, won't cause unneeded GC handling.
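
The proposal above could be sketched roughly as follows. This is only a minimal illustration, not Jericho's actual code: the class names SketchStreamedSource and AutoCloseGuard, the two constructors, and the hasFinalizableGuard and isClosed helpers are all hypothetical and exist only to show the shape of the idea.

```java
// Hypothetical sketch: none of these names come from the Jericho code base.

/** Stand-in for StreamedSource. Note that it declares no finalize() method. */
final class SketchStreamedSource implements java.io.Closeable {
    private final java.io.Closeable resource; // null when the input is in-memory text
    private final AutoCloseGuard guard;       // null unless automatic close is needed
    private boolean closed;

    /** Text input: nothing to release, so no finalizable object is created. */
    SketchStreamedSource(CharSequence text) {
        this.resource = null;
        this.guard = null;
    }

    /** Stream input: attach a guard so the stream is eventually closed. */
    SketchStreamedSource(java.io.Reader reader) {
        this.resource = reader;
        this.guard = new AutoCloseGuard(this);
    }

    boolean hasFinalizableGuard() { return guard != null; }

    boolean isClosed() { return closed; }

    @Override
    public void close() {
        if (closed) return;
        closed = true;
        try {
            if (resource != null) resource.close();
        } catch (java.io.IOException e) {
            throw new java.io.UncheckedIOException(e);
        }
    }
}

/** The only class with a finalizer; instantiated solely in the auto-close case. */
final class AutoCloseGuard {
    private final SketchStreamedSource owner;

    AutoCloseGuard(SketchStreamedSource owner) { this.owner = owner; }

    @Override
    @SuppressWarnings({"deprecation", "removal"}) // finalize() is deprecated in newer JDKs
    protected void finalize() throws Throwable {
        try {
            owner.close(); // same work the original finalizer would have done
        } finally {
            super.finalize();
        }
    }
}

final class GuardSketchDemo {
    public static void main(String[] args) {
        SketchStreamedSource text = new SketchStreamedSource("a &amp; b");
        System.out.println("text input has guard: " + text.hasFinalizableGuard());

        SketchStreamedSource stream =
                new SketchStreamedSource(new java.io.StringReader("<p>x</p>"));
        System.out.println("stream input has guard: " + stream.hasFinalizableGuard());
        stream.close();
    }
}
```

With this layout the text-input instance is an ordinary object as far as the collector is concerned, and only the stream-backed instance pays the finalization cost.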

Discussion

  • Martin Jericho
    2012-12-26

    Hi soul-burn,

    Thanks for your bug report, and sorry for the delayed response; I was on holiday!

    I'll implement your suggested fix within the next day or so. I wasn't aware that non-trivial finalizers had such a performance impact, so thanks for making me aware of it.

    Regards
    Martin

     
  • Martin Jericho
    2012-12-26

    • status: open --> open-accepted
     
  • Martin Jericho
    2012-12-27

    • status: open-accepted --> open-fixed