I am using Saxon 7.6 and noticed a few things that may be of interest to the Saxon developer(s).
Hopefully they have not been made obsolete by the more recent 7.8 version...



Standard tree:


I found that the created DocumentInfo keeps a dependency on the parser and ContentEmitter classes
that were originally used to create the tree.
This dependency can be removed by making DefaultNodeFactory a separate top-level class instead of a nested one.

Since DefaultNodeFactory is declared inside the TreeBuilder class and an instance of DefaultNodeFactory is passed to
the DocumentInfo instance, the TreeBuilder can no longer be garbage collected once the tree has been created.
I do not know exactly why, but this also prevents the parser, and therefore the content handler (ContentEmitter),
from being garbage collected (it must have something to do with the document locator).
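
The retention effect can be illustrated with a small sketch. The class names below (Builder, InnerFactory, StandaloneFactory) are hypothetical stand-ins and not Saxon's actual classes. The sketch only assumes that the factory is a non-static inner class that uses its enclosing instance; such an instance carries a hidden reference to the enclosing object, whereas a static nested or top-level class does not.

```java
import java.lang.reflect.Field;

// Hypothetical stand-in for a builder that owns a large buffer used only while parsing.
class Builder {
    final char[] parseBuffer = new char[1 << 20];

    // Non-static inner class: each instance keeps a hidden reference to its Builder.
    class InnerFactory {
        int bufferLength() { return parseBuffer.length; } // uses the enclosing instance
    }

    // Static nested class: no link back to any Builder instance.
    static class StandaloneFactory {
    }

    InnerFactory newInnerFactory() { return new InnerFactory(); }
}

class OuterReferenceDemo {
    // Returns true if the given class declares a field whose type is Builder,
    // including the compiler-generated field for the enclosing instance.
    static boolean holdsBuilder(Class<?> c) {
        for (Field f : c.getDeclaredFields()) {
            if (f.getType() == Builder.class) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("inner class holds Builder: "
                + holdsBuilder(Builder.InnerFactory.class));
        System.out.println("static nested class holds Builder: "
                + holdsBuilder(Builder.StandaloneFactory.class));
    }
}
```

As long as anything reachable keeps the InnerFactory instance alive, the Builder and its buffer stay reachable too. Whether this is the complete explanation for the retained parser in Saxon is not verified here.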


This is a problem if there are large text nodes (e.g. base64-encoded binary data) in the XML document:
ContentEmitter contains a buffer that is sized according to the largest text node, and
this buffer is no longer needed after the tree has been created.


Tiny tree:


Shouldn't the TinyDocumentImpl condense function be called somewhere in endDocument of TinyTreeBuilder?

Why is the char buffer not condensed in TinyDocumentImpl? There is always a chance that only a few
more characters are needed after the character buffer has been enlarged by doubling its size. The result is
that the "tiny" tree is not as small as the name suggests, especially if there is a lot of text content in the
XML document.
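
To make the concern concrete, here is a minimal sketch of a buffer that grows by doubling and can be trimmed once building is finished. TextBuffer and its methods are hypothetical names for illustration only; this is not the TinyDocumentImpl code.

```java
import java.util.Arrays;

// Hypothetical growable character buffer that doubles its capacity on overflow.
class TextBuffer {
    private char[] chars;
    private int used = 0;

    TextBuffer(int initialCapacity) {
        chars = new char[Math.max(1, initialCapacity)];
    }

    // Appends length characters from src, growing the array if necessary.
    void append(char[] src, int start, int length) {
        if (used + length > chars.length) {
            int newCapacity = Math.max(chars.length * 2, used + length);
            chars = Arrays.copyOf(chars, newCapacity);
        }
        System.arraycopy(src, start, chars, used, length);
        used += length;
    }

    // Shrinks the backing array to exactly the number of characters in use.
    void condense() {
        if (used < chars.length) {
            chars = Arrays.copyOf(chars, used);
        }
    }

    int capacity() { return chars.length; }

    int length() { return used; }
}
```

With an initial capacity of 4000, appending 4001 characters doubles the array to 8000 entries, so nearly half of it is unused until condense() is called, for example at the end of the document.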


Perhaps the following heuristic could be used to determine a reasonable start value for the character buffer size
instead of the fixed value of 4000:

If the system ID is known, find out the file size. Reduce this size by some fixed percentage to account for element, attribute
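
A rough sketch of this heuristic might look as follows. BufferSizing, its method names and the markupFraction parameter are hypothetical and chosen for illustration; the actual percentage would have to be tuned. The sketch also assumes that the file size in bytes is an acceptable approximation of the character count, which depends on the encoding.

```java
import java.io.File;

// Hypothetical helper that estimates an initial character buffer size from a file size.
class BufferSizing {
    static final int DEFAULT_SIZE = 4000; // the fixed start value used as a fallback

    // markupFraction: assumed share of the file that is not text content (0.0 to 1.0).
    static int initialSize(long fileSizeBytes, double markupFraction) {
        if (fileSizeBytes <= 0) {
            return DEFAULT_SIZE; // size unknown or file empty
        }
        long estimate = (long) (fileSizeBytes * (1.0 - markupFraction));
        // Never go below the default, and stay within a safe array length.
        long clamped = Math.max(DEFAULT_SIZE, Math.min(estimate, Integer.MAX_VALUE - 8));
        return (int) clamped;
    }

    // File.length() returns 0 if the file does not exist, which falls back to the default.
    static int initialSizeForFile(File file, double markupFraction) {
        return initialSize(file.length(), markupFraction);
    }
}
```

If the estimate turns out too high, a final trim of the buffer would still release the unused part.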


Is it possible to remove "final" from the TinyDocumentImpl class? I can think of a lot of possibilities for customising
this class.



Best regards