The StreamedSource class declares a non-trivial finalizer, causing it to be handled specially by the garbage collector.
Since the StreamedSource object eventually holds a reference to the original Source (through a chain of intermediate objects), it can keep very large and complex data structures reachable until it has been finalized. A copying GC (such as the collectors used by HotSpot) then has to copy these data structures, causing extreme slowdowns when parsing many documents. In my test case, the difference was between 5-second collections with the finalizer and collections of under 100 ms without it.
Using the finalizer makes sense for real streamed sources but makes no sense when the input is text. This occurs, for example, when calling getAttributeValue on an attribute that includes a '&' character.
My proposed solution is as follows:
Remove the finalizer from StreamedSource.
Create a separate object with a finalizer, instantiated only in the autoclose case (possibly replacing that variable).
StreamedSource will hold an optional reference to this object.
This object will hold a reference to the StreamedSource.
This object's finalizer will call what the original finalizer called, i.e., the "automatic close", or even just the close method.
This way, cases which don't require finalization, such as text input, won't incur unnecessary GC handling.
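
A rough sketch of what I have in mind is below. It is not the library's actual code: SketchStreamedSource, AutoCloser, hasFinalizableHelper and isClosed are made-up names for illustration, and the constructors are simplified assumptions about how text input and stream input are told apart. The point is only that the finalizer lives on a small helper object that exists solely when automatic close is wanted.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for StreamedSource; names and constructors are assumptions.
final class SketchStreamedSource implements Closeable {
    private final CharSequence text;      // non-null only for in-memory text input
    private final Reader reader;          // non-null only for real stream input
    private final AutoCloser autoCloser;  // non-null only when automatic close is requested
    private boolean closed;

    // Text input: there is nothing to release, so no finalizable object is created.
    SketchStreamedSource(CharSequence text) {
        this.text = text;
        this.reader = null;
        this.autoCloser = null;
    }

    // Stream input: the finalizable helper is created only in the auto-close case.
    SketchStreamedSource(Reader reader, boolean autoClose) {
        this.text = null;
        this.reader = reader;
        this.autoCloser = autoClose ? new AutoCloser(this) : null;
    }

    boolean hasFinalizableHelper() { return autoCloser != null; }

    boolean isClosed() { return closed; }

    // Idempotent close; wraps IOException so callers need no checked handling.
    @Override
    public void close() {
        if (closed) return;
        closed = true;
        if (reader == null) return;
        try {
            reader.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The only class that declares a finalizer. It holds a reference back to
    // the source and simply calls close() when it is finalized.
    private static final class AutoCloser {
        private final SketchStreamedSource owner;

        AutoCloser(SketchStreamedSource owner) { this.owner = owner; }

        @Override
        @SuppressWarnings({"deprecation", "removal"}) // finalize() is deprecated since Java 9
        protected void finalize() throws Throwable {
            try {
                owner.close();
            } finally {
                super.finalize();
            }
        }
    }
}
```

With this layout, a source built from text is an ordinary object as far as the collector is concerned. A source built with automatic close still goes through finalization, together with everything it references, which matches the current behaviour for that case.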