Hi Andy,
 
This is not exactly an answer to your question, but in my experience, with a 10 MB XML document both DOM and JDOM are likely to be a major bottleneck: they need a lot of memory and they consume CPU through garbage collection overhead.
 
When we want to keep large XML in memory, we usually keep it as a byte array holding the gzipped InputStream. This representation is compact (at least smaller than the original stream, often much smaller), and reparsing it into SAX events is cheap in terms of CPU time (we currently use the tiny, fast Piccolo parser and avoid validation for these internal XMLs). BTW, on most real-world PCs parsing a gzipped XML file to SAX is much faster than parsing the raw XML, i.e. the overhead of decompression is outweighed by the saved hard disk access. Details of course depend on CPU power and hard drive speed.
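
A minimal sketch of the gzipped byte array idea, under some assumptions: the class name GzippedXml and its methods are made up for illustration, and the parser is whatever JAXP's SAXParserFactory returns by default rather than Piccolo.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical helper: holds one XML document in memory as gzipped bytes.
final class GzippedXml {
    private final byte[] compressed;

    // Reads the raw XML stream once and keeps only the gzipped bytes.
    GzippedXml(InputStream rawXml) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = rawXml.read(chunk)) != -1) {
                gzip.write(chunk, 0, n);
            }
        } // closing the gzip stream writes the gzip trailer into the buffer
        this.compressed = buffer.toByteArray();
    }

    int compressedSize() {
        return compressed.length;
    }

    // Decompresses on the fly and replays the document as SAX events.
    void replay(ContentHandler handler)
            throws IOException, SAXException, ParserConfigurationException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false); // no validation for internal XML
        XMLReader reader = factory.newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        try (InputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            reader.parse(new InputSource(in));
        }
    }
}
```

Each call to replay() parses the document again from the compressed bytes, so no tree is held between uses; only the handler you pass in decides what gets built.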
 
SAX events fed into Saxon as a SAXSource will cause Saxon to build its (somewhat more efficient) internal tree structure. But tree structures need storage in any case. If you can do any of your processing of the XML input in a preprocessing step on the SAX events (e.g. removing parts of the input), this can be done extremely efficiently with the XMLFilter API.
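
A minimal sketch of such a preprocessing filter, under some assumptions: the class ElementDropFilter and the method transformWithout are made-up names, the transformer is an identity transform obtained through plain JAXP (TransformerFactory.newInstance() returns Saxon only if Saxon is the configured processor), and a real stylesheet would be passed to newTransformer() instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: removes every element with the given local name,
// including its whole subtree, before the transformer sees the events.
final class ElementDropFilter extends XMLFilterImpl {
    private final String dropLocalName;
    private int depth = 0; // > 0 while inside a dropped subtree

    ElementDropFilter(XMLReader parent, String dropLocalName) {
        super(parent);
        this.dropLocalName = dropLocalName;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        if (depth > 0 || localName.equals(dropLocalName)) {
            depth++; // swallow the event
        } else {
            super.startElement(uri, localName, qName, atts);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        if (depth > 0) {
            depth--; // closing tag of a swallowed element
        } else {
            super.endElement(uri, localName, qName);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        if (depth == 0) {
            super.characters(ch, start, length);
        }
    }

    // Parses xml, drops the named elements, and serializes the rest.
    static String transformWithout(String xml, String dropLocalName)
            throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed so localName is populated
        XMLReader parser = factory.newSAXParser().getXMLReader();
        XMLReader filtered = new ElementDropFilter(parser, dropLocalName);

        // Identity transform; pass a stylesheet Source here for real XSLT.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        transformer.transform(
            new SAXSource(filtered, new InputSource(new StringReader(xml))),
            new StreamResult(out));
        return out.toString();
    }
}
```

Because the filter sits between the parser and the SAXSource, the dropped subtrees never reach the transformer and so never take up space in its tree.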
 
In our applications we didn't find a measurable performance gain from moving the above architecture to the Saxon internal representation. But this will depend on the details.
 
Well, maybe these "business secrets" of ours are helpful to you.
 
Best regards,
Frank
-----Original Message-----
From: Andy Malakov [mailto:amalakov@ptc.com]
Sent: Tuesday, April 29, 2003 11:51 PM
To: saxon-help@lists.sourceforge.net
Subject: [saxon] How to create Saxon representation of XML tree

Hello All,
 
To boost XSLT performance I want to generate a transformer-friendly XML tree. Is there any API or example showing how to generate Saxon's internal tree?
 
(I have a module that produces XML in any format: XML text stream/DOM/JDOM. The XML is then translated into HTML via XSLT. For fairly large XML streams (10 MB), XSLT becomes a huge performance bottleneck. I would like to change the output format to Saxon-specific memory structures if that will help boost performance.)
 
Thank you in advance,
Andy Malakov