Since Saxon 8.5.1 the Configuration object has maintained a pool of XML parsers, to reduce the cost of initializing a new parser each time a document is parsed: the initialization cost is very high in relation to the cost of parsing a small document. In fact there are two such pools, one for "source parsers" and one for "style parsers", corresponding to the -x and -y options on the command line. (The "style parser" is actually used for both XSLT stylesheets and schema documents.)
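To make the pooling idea concrete, here is a minimal Java sketch of a configuration-level pool with separate source and style pools. The class ParserPool and its borrow/giveBack methods are hypothetical names for illustration, not Saxon's API, and JAXP's SAXParserFactory stands in for whatever parser factory Saxon actually uses.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical illustration of a per-configuration parser pool; not Saxon's actual code.
class ParserPool {
    // Which pool a parser belongs to: source documents (-x) or stylesheets/schemas (-y).
    enum Kind { SOURCE, STYLE }

    private final Deque<XMLReader> sourcePool = new ArrayDeque<>();
    private final Deque<XMLReader> stylePool = new ArrayDeque<>();
    private final SAXParserFactory factory;

    ParserPool() {
        factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
    }

    private Deque<XMLReader> poolFor(Kind kind) {
        return kind == Kind.SOURCE ? sourcePool : stylePool;
    }

    // Reuse a pooled parser if one is available; otherwise pay the initialization cost.
    synchronized XMLReader borrow(Kind kind) throws ParserConfigurationException, SAXException {
        XMLReader reader = poolFor(kind).pollFirst();
        return reader != null ? reader : factory.newSAXParser().getXMLReader();
    }

    // Hand the parser back for reuse. Anything it still references (its handlers,
    // and whatever they point to) stays reachable while it sits in the pool.
    synchronized void giveBack(Kind kind, XMLReader reader) {
        poolFor(kind).addFirst(reader);
    }
}
```

The second borrow of the same kind returns the same instance, which is exactly what makes pooling cheap and also what lets a returned parser keep old objects alive.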
The problem with this approach is that the parser retains a reference to its ContentHandler, which in Saxon's case links to a Builder, which in turn holds a reference to the document that was built. A constructed document is therefore locked into memory for as long as its parser remains in the pool (or until the parser is reused to build another document). In adverse circumstances this can keep a large document in memory long after it is needed, especially as the pool is held at the Configuration level.
In Saxon 8.9.1 an attempt was made to solve this problem by setting the parser's ContentHandler and other hooks to null at the time the parser is returned to the pool after use. However, this doesn't entirely work. Firstly, the code was added for the source parser pool but not for the style parser pool. Secondly, and more significantly, the code fails to reset the LexicalHandler to null; and with some SAX parsers such as Xerces it is impossible to reset the LexicalHandler to null, because the code in Xerces tests "value instanceof LexicalHandler" and throws an exception if the test is false.
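The 8.9.1-style clean-up can be sketched as a helper that detaches the hooks from a parser before it is pooled. ParserHooks and detach are hypothetical names. Where a parser refuses null for the lexical-handler property, this sketch assumes that installing a stateless no-op handler (SAX's DefaultHandler2) is an acceptable substitute; that fallback is an illustration only, not what Saxon did.

```java
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper: detach every callback from a parser before it goes back into a pool.
final class ParserHooks {
    static final String LEXICAL_HANDLER = "http://xml.org/sax/properties/lexical-handler";

    // A stateless no-op handler that holds no reference to any Builder or tree.
    private static final DefaultHandler2 NO_OP = new DefaultHandler2();

    private ParserHooks() {}

    static void detach(XMLReader reader) {
        reader.setContentHandler(null);
        reader.setDTDHandler(null);
        reader.setErrorHandler(null);
        reader.setEntityResolver(null);
        try {
            reader.setProperty(LEXICAL_HANDLER, null);
        } catch (SAXNotRecognizedException e) {
            // The parser has no lexical-handler property, so it cannot be holding one.
        } catch (SAXNotSupportedException e) {
            // Null was refused (e.g. an "instanceof LexicalHandler" check):
            // assumed workaround, replace the old handler with the stateless no-op one.
            try {
                reader.setProperty(LEXICAL_HANDLER, NO_OP);
            } catch (SAXNotRecognizedException | SAXNotSupportedException ignored) {
                // Give up: the previous lexical handler may still be referenced.
            }
        }
    }
}
```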
I am therefore making a new attempt to solve the problem by breaking the link between the Builder and the constructed tree. This is done by adding a new method, builder.reset(), which is called after builder.getCurrentNode() and sets the builder's currentNode to null. In the source clearance (for Saxon 9.1) builder.reset() will be called on all relevant paths; in the patch for Saxon 9.0 (to be placed in Subversion) it will be called only on the two or three most important paths.
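A toy model of the builder.reset() idea follows. ToyNode and ToyBuilder are hypothetical and far simpler than Saxon's Builder; the point is only the order of operations: fetch the result with getCurrentNode(), then call reset(), after which a pooled parser that still reaches the builder through its ContentHandler no longer reaches the tree.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy tree node, standing in for a constructed document.
final class ToyNode {
    final String name;
    final List<ToyNode> children = new ArrayList<>();

    ToyNode(String name) { this.name = name; }
}

// Hypothetical, much simplified stand-in for a tree builder; not Saxon's actual Builder.
final class ToyBuilder {
    private ToyNode currentNode;

    void startDocument() { currentNode = new ToyNode("#document"); }

    void addChild(String name) { currentNode.children.add(new ToyNode(name)); }

    ToyNode getCurrentNode() { return currentNode; }

    // Drop the builder's reference to the finished tree.
    void reset() { currentNode = null; }

    // Typical call sequence: take the result first, then break the link.
    static ToyNode build(ToyBuilder builder) {
        builder.startDocument();
        builder.addChild("doc");
        ToyNode result = builder.getCurrentNode();
        builder.reset(); // the tree is now reachable only through 'result'
        return result;
    }
}
```

The caller still gets the complete tree; only the builder's (and hence the pooled parser's) path to it is cut.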
A workaround is to avoid using the parser pool. If the application consistently supplies input documents using a SAXSource whose XMLReader is initialized to a user-created parser, then the parser pool will never be populated. Note that to do this consistently, you will need a user-written URIResolver, so that documents fetched by URI, and not only the principal input, are also supplied as such SAXSources.
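A sketch of the workaround using only JAXP classes: a resolver that wraps every resolved URI in a SAXSource with a freshly created XMLReader. FreshParserResolver and sourceFor are hypothetical names, and creating a new parser per document deliberately gives up the pooling savings in exchange for not retaining trees.

```java
import java.net.URI;
import java.net.URISyntaxException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.sax.SAXSource;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical resolver: every document it resolves gets a fresh, application-created
// parser, so the processor never has to take one from its own pool.
final class FreshParserResolver implements URIResolver {
    private final SAXParserFactory factory;

    FreshParserResolver() {
        factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
    }

    // Wrap a system ID in a SAXSource whose XMLReader is a brand-new parser.
    SAXSource sourceFor(String systemId) throws TransformerException {
        try {
            XMLReader reader = factory.newSAXParser().getXMLReader();
            return new SAXSource(reader, new InputSource(systemId));
        } catch (ParserConfigurationException | SAXException e) {
            throw new TransformerException(e);
        }
    }

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        try {
            // Resolve a relative href against the base URI, if one is given.
            URI uri = (base == null || base.isEmpty()) ? new URI(href) : new URI(base).resolve(href);
            return sourceFor(uri.toString());
        } catch (URISyntaxException e) {
            throw new TransformerException(e);
        }
    }
}
```

Such a resolver would typically be registered with TransformerFactory.setURIResolver() (used for xsl:include and xsl:import, and as the default for document()) and, where needed, Transformer.setURIResolver(), with the principal input passed as sourceFor(...). Whether this reaches every path on which Saxon loads documents is not checked by this sketch.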
Fixed in 9.0.0.5