This is not exactly the answer to your question, but in my experience with
a 10 MB XML document both DOM and JDOM are likely to be a major bottleneck:
they need a lot of memory and they consume CPU time through garbage
collection.

When we want to keep large XML in memory, we usually keep it as a byte array
of the gzipped InputStream. This representation is compact (at least smaller
than the original stream, often much smaller), and reparsing it into SAX
events is cheap in terms of CPU time (we currently use the tiny, fast Piccolo
parser and avoid validation for these internal XMLs). BTW, on most real-world
PCs parsing a gzipped XML to SAX is much faster than parsing the raw XML,
i.e. the overhead for decompression is outweighed by the saved hard disk
access. Details of course depend on CPU power and hard drive speed.
SAX events fed into Saxon as a SAXSource will cause Saxon to build its
(somewhat more efficient) internal tree structure. But tree structures need
storage in any case. If you can do any of your processing of the XML input
in a preprocessing step on the SAX events (e.g. remove parts of the input),
this can be done extremely efficiently with the XMLFilter API.
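
As a rough sketch of such a preprocessing step: the filter below removes
every element with a given local name, together with its subtree, before the
events reach the transformer. The names SkipElementFilter and
transformFiltered are invented for this example. It goes through the
standard JAXP TransformerFactory with an identity transform; I am assuming
Saxon is picked up as the JAXP implementation when it is configured, and a
real stylesheet would be supplied via newTransformer(Source) instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: suppresses element events and character data of skipped subtrees.
class SkipElementFilter extends XMLFilterImpl {
    private final String skippedName;
    private int depth = 0; // greater than 0 while inside a skipped subtree

    SkipElementFilter(XMLReader parent, String skippedName) {
        super(parent);
        this.skippedName = skippedName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (depth > 0 || skippedName.equals(localName)) {
            depth++; // entering, or nesting deeper into, a skipped subtree
        } else {
            super.startElement(uri, localName, qName, atts);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (depth > 0) {
            depth--; // leaving one level of the skipped subtree
        } else {
            super.endElement(uri, localName, qName);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (depth == 0) {
            super.characters(ch, start, length);
        }
    }

    /** Parses xml, drops the named elements, and serializes the rest via an identity transform. */
    static String transformFiltered(String xml, String skippedName) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed so localName is populated
        XMLReader filter = new SkipElementFilter(factory.newSAXParser().getXMLReader(), skippedName);
        SAXSource source = new SAXSource(filter, new InputSource(new StringReader(xml)));
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(source, new StreamResult(out));
        return out.toString();
    }
}
```

Because the filter works on the event stream, the removed parts never reach
the tree builder, so they cost neither memory nor transformation time.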
In our applications we didn't find a measurable performance gain by
moving the above architecture to the Saxon internal representation. But
this will depend on the details.
Maybe these "business secrets" of ours are helpful to you.
To boost XSLT performance I want to generate a
transformer-friendly XML tree. Is there any API or example showing how to
generate the Saxon internal tree?
(I have a module that produces XML in any format:
XML text stream/DOM/JDOM. The XML is then translated into HTML via XSLT. For
fairly large XML streams (10 MB) XSLT becomes a huge performance bottleneck.
I would like to change the output format to Saxon-specific memory structures
if that will help to boost performance.)
Thank you in advance,