#454 Performance parsing many documents

Michael Kay

There is a performance regression between Saxon 8.5.1
and Saxon 8.6.1 that shows up in a run where many small
documents are parsed. Saxon 8.5.1 introduced a
substantial improvement in this area by caching the XML
parser and reusing it. This improvement was largely
lost in 8.6.1. This was caused by correction of a bug:
in 8.5.1, when a parser was reused, its ErrorHandler
was also reused, which could cause error output to be
written to the wrong destination. In 8.6.1 a new
ErrorHandler is created each time the parser is reused.
It turns out that creating a StandardErrorHandler is an
expensive operation because it eagerly creates a
PrintWriter that is almost never used.
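
To make the reuse pattern concrete, here is a minimal sketch, not Saxon's actual code, of a cached parser that gets a fresh handler on every parse. The class ReusedParserSketch, its handlersCreated counter, and the use of DefaultHandler in place of StandardErrorHandler are all assumptions made for illustration; only the standard JAXP/SAX calls are real APIs.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration: one parser is cached and reused, but a new
// handler is installed for each document, as described for 8.6.1.
class ReusedParserSketch {
    private XMLReader cachedReader;   // created once, then reused
    int handlersCreated = 0;          // counts per-parse handler allocations

    void parse(String xml) {
        try {
            if (cachedReader == null) {
                SAXParserFactory factory = SAXParserFactory.newInstance();
                factory.setNamespaceAware(true);
                cachedReader = factory.newSAXParser().getXMLReader();
            }
            // A fresh handler per parse keeps error output from leaking to
            // a previous caller's destination, so its constructor cost is
            // paid once per document.
            DefaultHandler handler = new DefaultHandler();
            handlersCreated++;
            cachedReader.setContentHandler(handler);
            cachedReader.setErrorHandler(handler);
            cachedReader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Because the handler is constructed once per document, anything expensive in its constructor is multiplied by the number of documents parsed, which is why the cost matters most for many small documents.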

To fix the problem, in
net.sf.saxon.StandardErrorHandler, (a) change the
declaration of "private Writer errorOutput" so that it
has no initial value, and (b) in method reportError,
where the errorOutput is used, change the code to read:

    if (errorOutput == null) {
        errorOutput = new PrintWriter(...);
    }
    String errcat = (isFatal ? "Fatal error" : "Error");
    errorOutput.write(errcat + " reported by XML parser: " + e.getMessage() + '\n');
    errorOutput.write(" URL: " + e.getSystemId() + '\n');
    errorOutput.write(" Line: " + e.getLineNumber() + '\n');
    errorOutput.write(" Column: " + e.getColumnNumber() + '\n');
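
The lazy-initialization idea can be sketched as a small self-contained class. This is an illustration under stated assumptions, not the patched Saxon source: the class name LazyErrorReporter, the isWriterAllocated and setErrorOutput helpers, the plain-argument reportError signature, and the choice of System.err as the default destination are all hypothetical.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical stand-in for an error handler whose writer is created
// only when the first error is actually reported.
class LazyErrorReporter {
    private Writer errorOutput;   // deliberately has no initial value

    void setErrorOutput(Writer writer) {
        errorOutput = writer;
    }

    boolean isWriterAllocated() {
        return errorOutput != null;
    }

    void reportError(String message, String systemId,
                     int line, int column, boolean isFatal) {
        try {
            if (errorOutput == null) {
                // Assumption: default to standard error when no
                // destination has been supplied.
                errorOutput = new PrintWriter(System.err);
            }
            String errcat = (isFatal ? "Fatal error" : "Error");
            errorOutput.write(errcat + " reported by XML parser: " + message + '\n');
            errorOutput.write(" URL: " + systemId + '\n');
            errorOutput.write(" Line: " + line + '\n');
            errorOutput.write(" Column: " + column + '\n');
            errorOutput.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Constructing a LazyErrorReporter allocates no writer at all, so a handler that never reports an error costs almost nothing to create.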

This change can double the throughput of a workload
that is dominated by parsing of small documents.

Michael Kay

