From: firstname.lastname@example.org [mailto:email@example.com] On Behalf Of Rucker, Karsten
Sent: 19 December 2003 05:17
Subject: [saxon] Standard Tree, Tiny Tree
I am using Saxon 7.6 and have noticed a few things that may be of interest to the Saxon developer(s).
I hope they have not been superseded by the current 7.8 release.
I found a dependency from the created DocumentInfo to the parser and ContentEmitter classes
that were originally used to build the tree.
This dependency can be removed by making DefaultNodeFactory a separate class instead of an inner one.
Because DefaultNodeFactory is declared inside the TreeBuilder class, and an instance of DefaultNodeFactory is passed to
the DocumentInfo instance, the TreeBuilder can no longer be garbage collected after the tree has been created.
I do not know exactly why, but this also prevents the parser, and therefore the content handler (ContentEmitter),
from being garbage collected (presumably it has something to do with the document locator).
This is a problem if there are large text nodes (e.g. base64-encoded binary data) in the XML document,
because ContentEmitter contains a buffer that is sized according to the largest text node, and
this buffer is no longer needed once the tree has been created.
Thanks for this analysis. The problems are easily fixed once identified! Firstly, the DefaultNodeFactory should be a static inner class, as it does not make any use of its containing class. Secondly, there is no need for the DocumentImpl to retain a reference to the node factory. It's used only by the code that expands a simplified stylesheet (in LiteralResultElement#makeStylesheet), and in the two places this is called, the StyleNodeFactory is already available in the calling routine so it can be passed as a parameter.
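A minimal sketch of the two fixes described in the reply, using hypothetical, heavily simplified classes (NodeFactorySketch, TreeBuilder, Document and expandSimplifiedStylesheet are illustrative names, not Saxon's actual API). A non-static inner class instance is tied to an enclosing instance, so declaring the factory static removes any link back to the builder. Separately, the document no longer stores a factory: code that needs one later, in the spirit of LiteralResultElement#makeStylesheet, takes it as a parameter.

```java
// Hypothetical, simplified classes illustrating the fix; not Saxon's actual API.
public class NodeFactorySketch {

    interface NodeFactory {
        String makeElement(String name);
    }

    static class TreeBuilder {
        // Stands in for parser/emitter state that should become garbage after the build.
        private final char[] textBuffer = new char[4000];

        // Static nested class: its instances carry no hidden reference to a
        // TreeBuilder, so a retained factory cannot keep the builder alive.
        static class DefaultNodeFactory implements NodeFactory {
            public String makeElement(String name) {
                return "<" + name + "/>";
            }
        }

        Document build(String rootName) {
            NodeFactory factory = new DefaultNodeFactory();
            // The document keeps only the tree content, not the factory.
            return new Document(factory.makeElement(rootName));
        }
    }

    static class Document {
        final String root;

        Document(String root) {
            this.root = root;
        }
    }

    // Code that needs a factory later (compare makeStylesheet) receives it as a parameter.
    static String expandSimplifiedStylesheet(Document doc, NodeFactory factory) {
        return factory.makeElement("xsl:stylesheet") + doc.root;
    }

    public static void main(String[] args) {
        Document doc = new TreeBuilder().build("html");
        NodeFactory factory = new TreeBuilder.DefaultNodeFactory();
        System.out.println(expandSimplifiedStylesheet(doc, factory)); // prints <xsl:stylesheet/><html/>
    }
}
```

Once build() returns, nothing reachable from the Document refers to the TreeBuilder, so the builder and its buffer become eligible for collection.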
Shouldn't TinyDocumentImpl's condense() method be called somewhere in TinyTreeBuilder's endDocument()?
Yes, I don't know why this isn't done, and will add a call. Currently condense() is called only for temporary trees.
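To illustrate the proposed change, here is a sketch in which the builder's endDocument() trims the document's node arrays once parsing is complete. The class names and the single nodeKind array are hypothetical simplifications; the real TinyDocumentImpl holds several parallel arrays, and its condense() may differ in detail.

```java
import java.util.Arrays;

// Hypothetical, heavily simplified stand-ins for TinyDocumentImpl/TinyTreeBuilder.
public class CondenseOnEndDocument {

    static class TinyDocument {
        int[] nodeKind = new int[100]; // grows by doubling as nodes are added
        int numberOfNodes = 0;

        void addNode(int kind) {
            if (numberOfNodes == nodeKind.length) {
                nodeKind = Arrays.copyOf(nodeKind, Math.max(1, nodeKind.length * 2));
            }
            nodeKind[numberOfNodes++] = kind;
        }

        // Trim the array to the number of slots actually used.
        void condense() {
            if (nodeKind.length > numberOfNodes) {
                nodeKind = Arrays.copyOf(nodeKind, numberOfNodes);
            }
        }
    }

    static class TinyTreeBuilder {
        final TinyDocument doc = new TinyDocument();

        void startElement() {
            doc.addNode(1);
        }

        // Condense once the whole document has been seen, not only for temporary trees.
        void endDocument() {
            doc.condense();
        }
    }

    public static void main(String[] args) {
        TinyTreeBuilder builder = new TinyTreeBuilder();
        for (int i = 0; i < 101; i++) builder.startElement();
        System.out.println(builder.doc.nodeKind.length); // 200: doubled when node 101 arrived
        builder.endDocument();
        System.out.println(builder.doc.nodeKind.length); // 101 after condensing
    }
}
```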
Why is the char buffer not condensed in TinyDocumentImpl? It may well be that only a few
more characters were needed when the character buffer was enlarged by doubling its size. The result is
that the "tiny" tree is not as small as the name suggests, especially if there is a lot of text content in the
An oversight, I will add this.
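The waste described in the question is easy to see with numbers: with the fixed starting size of 4000, a document with 4001 characters of text doubles the buffer to 8000, leaving almost half of it unused. The sketch below uses a hypothetical CharBufferCondense class, not Saxon's own buffer class, to show doubling growth followed by a condense step that trims the array to the characters actually used.

```java
import java.util.Arrays;

// Hypothetical growable character buffer; illustrates doubling plus condense.
public class CharBufferCondense {
    char[] chars;
    int used;

    CharBufferCondense(int initialSize) {
        chars = new char[initialSize];
    }

    void append(CharSequence s) {
        int needed = used + s.length();
        if (needed > chars.length) {
            int newSize = Math.max(1, chars.length);
            while (newSize < needed) newSize *= 2; // doubling growth
            chars = Arrays.copyOf(chars, newSize);
        }
        for (int i = 0; i < s.length(); i++) chars[used++] = s.charAt(i);
    }

    // Release the unused tail once no more text will be added.
    void condense() {
        if (chars.length > used) chars = Arrays.copyOf(chars, used);
    }

    public static void main(String[] args) {
        CharBufferCondense buf = new CharBufferCondense(4000);
        buf.append("x".repeat(4001)); // one character too many forces a doubling
        System.out.println(buf.chars.length); // 8000
        buf.condense();
        System.out.println(buf.chars.length); // 4001
    }
}
```

Condensing costs one extra array copy per document, which is usually cheap compared with keeping up to half the buffer unused for the lifetime of the tree.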
Perhaps the following heuristic could be used to determine a reasonable initial size for the character buffer
instead of a fixed 4000:
If the system ID is known, find out the file size. Reduce this size by some fixed percentage to account for element, attribute
I did actually have something like this in at one time (hence the methods allowing the caller to specify the initial size parameters), but I think it got too difficult to pass it through all the layers.
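A sketch of the suggested heuristic, assuming the system ID is a file: URI and that markup makes up a caller-supplied percentage of the file (the 30% used in the comments and the method names are illustrative assumptions, not values from the thread). Anything that cannot be measured cheaply falls back to the existing default of 4000.

```java
import java.io.File;
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical sketch of the heuristic; method names, the markup percentage and
// the restriction to file: URIs are assumptions, not Saxon behaviour.
public class BufferSizeHeuristic {

    static final int DEFAULT_SIZE = 4000; // the current fixed starting size

    // Estimate text content as the file size minus a fixed markup percentage
    // (e.g. 30), never going below the default and never overflowing an int.
    static int estimateFromLength(long fileBytes, int markupPercent) {
        if (fileBytes <= 0) return DEFAULT_SIZE;
        long estimate = fileBytes * (100 - markupPercent) / 100;
        long clamped = Math.min(estimate, Integer.MAX_VALUE - 8);
        return (int) Math.max(DEFAULT_SIZE, clamped);
    }

    // Only local files can be measured cheaply; anything else uses the default.
    static int initialCharBufferSize(String systemId, int markupPercent) {
        if (systemId == null) return DEFAULT_SIZE;
        try {
            URI uri = new URI(systemId);
            if (!"file".equalsIgnoreCase(uri.getScheme())) return DEFAULT_SIZE;
            // File.length() returns 0 if the file does not exist.
            return estimateFromLength(new File(uri).length(), markupPercent);
        } catch (URISyntaxException | IllegalArgumentException e) {
            return DEFAULT_SIZE;
        }
    }
}
```

File length is measured in bytes, so for multi-byte encodings the estimate overshoots the character count. That is harmless for an initial capacity, especially if the buffer is condensed at the end. The difficulty mentioned in the reply remains: the size has to be threaded through every layer between the caller and the tree builder.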
Is it possible to remove "final" from the TinyDocumentImpl class? I can think of a lot of possibilities for customisation
It's very hard to come up with a consistent policy on this. My general rule has been that a class should either be designed for subclassing, in which case it should provide getters and setters for all its private data, and validate all calls on methods, or it should prevent subclassing by being final. But I haven't followed this consistently. Certainly anyone who subclasses TinyDocumentImpl can do untold damage, and in those circumstances, I don't think it's too bad a thing if they have to modify the "final" declaration so they indicate that they know what they are doing.
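The two policies in the reply can be contrasted in a short hypothetical example (FinalTree, ExtensibleTree and CustomTree are illustrative names, not Saxon classes). The first class simply forbids subclassing. The second is designed for it: its private state is reachable only through accessors that validate every call, so a subclass can customise behaviour without corrupting the data.

```java
// Hypothetical illustration of the two policies described above; not Saxon code.
public class SubclassingPolicy {

    // Policy 1: not designed for extension, so subclassing is prevented.
    static final class FinalTree {
        private final int[] nodes;

        FinalTree(int[] nodes) {
            this.nodes = nodes.clone();
        }

        int size() {
            return nodes.length;
        }
    }

    // Policy 2: designed for extension. Private state is reachable only through
    // accessors that validate every call, so a subclass cannot corrupt it.
    static class ExtensibleTree {
        private int[] nodes = new int[0];

        protected int getNode(int index) {
            if (index < 0 || index >= nodes.length) {
                throw new IndexOutOfBoundsException("no node at " + index);
            }
            return nodes[index];
        }

        protected void setNodes(int[] newNodes) {
            if (newNodes == null) {
                throw new IllegalArgumentException("nodes must not be null");
            }
            nodes = newNodes.clone(); // defensive copy: callers cannot mutate internal state
        }

        public int size() {
            return nodes.length;
        }
    }

    // A customisation that goes only through the validated accessors.
    static class CustomTree extends ExtensibleTree {
        int firstNodeOrDefault(int fallback) {
            return size() == 0 ? fallback : getNode(0);
        }
    }
}
```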