I am using Saxon 7.6 and noticed a few things that may be of interest to the Saxon developer(s).
Hopefully they have not been made obsolete by the more recent 7.8 version...

I found a dependency from the created DocumentInfo to the parser and ContentEmitter
classes that were originally used to create the tree.
This dependency can be removed by making DefaultNodeFactory a distinct (top-level) class instead of a nested one.
Because DefaultNodeFactory is declared inside the TreeBuilder class and an instance
of DefaultNodeFactory is passed to the DocumentInfo instance, TreeBuilder can no longer
be garbage collected after the tree has been created.
I do not know exactly why, but this also prevents the parser, and therefore the content handler (ContentEmitter),
from being garbage collected (it must have something to do with the document locator).
This is a problem if the XML document contains large text nodes (e.g. base64-encoded binary data),
because ContentEmitter holds a buffer that is sized according to the largest text node, and
this buffer is no longer needed once the tree has been created.
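
The retention effect described above can be illustrated outside Saxon. The following is a minimal sketch, not Saxon code: `Builder`, `InnerFactory` and `NestedFactory` are hypothetical stand-ins for TreeBuilder and DefaultNodeFactory. It assumes the factory is a non-static inner class that touches state of its enclosing instance, in which case every factory instance carries a hidden reference to the builder and keeps it (and whatever it references) reachable.

```java
import java.lang.reflect.Field;

class NestedClassRetention {

    /** Hypothetical stand-in for a builder that owns a large, temporary buffer. */
    static class Builder {
        char[] buffer = new char[1 << 20];

        /** Inner (non-static) class: each instance keeps a hidden reference to its Builder. */
        class InnerFactory {
            int bufferLength() {
                return buffer.length; // reads state of the enclosing Builder instance
            }
        }

        /** Static nested class: holds no reference to any Builder instance. */
        static class NestedFactory {
            String makeNode(String name) {
                return "<" + name + "/>";
            }
        }
    }

    /** Returns true if 'type' declares a field (synthetic ones included) of type 'owner'. */
    static boolean holdsReferenceTo(Class<?> type, Class<?> owner) {
        for (Field f : type.getDeclaredFields()) {
            if (f.getType() == owner) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("inner factory references Builder:  "
                + holdsReferenceTo(Builder.InnerFactory.class, Builder.class));
        System.out.println("nested factory references Builder: "
                + holdsReferenceTo(Builder.NestedFactory.class, Builder.class));
    }
}
```

As long as an `InnerFactory` instance is reachable (for example from a long-lived document object), its `Builder` and the builder's buffer stay reachable too. A static nested or top-level factory does not have this link.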

Shouldn't the TinyDocumentImpl condense function be called somewhere in endDocument of TinyTreeBuilder?
Why is the char buffer not condensed in TinyDocumentImpl? There is always the chance that only a few
more characters are needed after the character buffer has been enlarged by doubling its size. The result is
that the "tiny" tree is not as small as the name suggests, especially if there is a lot of text content in the
document.
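
To make the doubling problem concrete, here is a minimal sketch of a character buffer that grows by doubling and can be trimmed afterwards. `GrowableCharBuffer` and its methods are hypothetical and only illustrate the idea; they are not the TinyDocumentImpl implementation, whose internals may differ.

```java
import java.util.Arrays;

class GrowableCharBuffer {
    private char[] chars;
    private int used;

    GrowableCharBuffer(int initialCapacity) {
        chars = new char[Math.max(1, initialCapacity)];
    }

    /** Appends characters, doubling the capacity (at least) when the buffer is full. */
    void append(char[] src, int start, int length) {
        if (used + length > chars.length) {
            int newCapacity = Math.max(chars.length * 2, used + length);
            chars = Arrays.copyOf(chars, newCapacity);
        }
        System.arraycopy(src, start, chars, used, length);
        used += length;
    }

    /** Trims the backing array to the number of characters actually stored. */
    void condense() {
        if (used < chars.length) {
            chars = Arrays.copyOf(chars, used);
        }
    }

    int capacity() {
        return chars.length;
    }

    int length() {
        return used;
    }
}
```

With an initial capacity of 4000, appending 4001 characters leaves a backing array of 8000 characters, of which almost half is unused until `condense()` is called. Calling it once at the end of tree construction costs one extra array copy.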

Perhaps the following heuristic could be used to determine a reasonable initial size for the character
buffer instead of using a fixed 4000:
If the system id is known, find out the file size.
Reduce this size by some fixed percentage to account for
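
The heuristic could look roughly like the sketch below. Everything in it is an assumption made for illustration: the class and method names are hypothetical, the 25% reduction is an arbitrary placeholder for the "fixed percentage", and only `file:` system ids are handled. It also treats bytes as characters, which overestimates for multi-byte encodings.

```java
import java.io.File;
import java.net.URI;

class InitialBufferSize {
    /** Fallback when the file size cannot be determined (the current fixed value). */
    static final int DEFAULT_SIZE = 4000;

    /** Assumed share of the file that ends up as character content (placeholder value). */
    static final double TEXT_FRACTION = 0.75;

    /** Estimates the initial buffer size from a file size in bytes. */
    static int estimate(long fileSizeBytes) {
        if (fileSizeBytes <= 0) {
            return DEFAULT_SIZE;
        }
        long estimated = (long) (fileSizeBytes * TEXT_FRACTION);
        // Clamp to a valid array size.
        return (int) Math.min(Math.max(estimated, 1), Integer.MAX_VALUE - 8);
    }

    /** Estimates the initial buffer size from a system id, falling back to the default. */
    static int estimateFromSystemId(String systemId) {
        if (systemId == null) {
            return DEFAULT_SIZE;
        }
        try {
            URI uri = URI.create(systemId);
            if (!"file".equalsIgnoreCase(uri.getScheme())) {
                return DEFAULT_SIZE; // size of non-file resources is not known up front
            }
            // File.length() returns 0 for a missing file, which maps to the default.
            return estimate(new File(uri).length());
        } catch (IllegalArgumentException e) {
            return DEFAULT_SIZE; // malformed or unsupported URI
        }
    }
}
```

An overestimate from such a heuristic matters less if the buffer is condensed once the tree is complete.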

Is it possible to remove "final" from the TinyDocumentImpl class?
I can think of a lot of possibilities for customisation