If you're failing with an out-of-memory error on a 16MB document, the most likely explanation is that you didn't allocate enough memory to the Java VM: use the -Xmx option on the Java command line.
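To check that an -Xmx setting has actually taken effect, a tiny program like the sketch below can be run with the same java command line as the application. It only reports `Runtime.getRuntime().maxMemory()`, the heap ceiling the JVM will try to use; the class name `HeapCheck` and the 512m value in the usage note are illustrative, not anything from Saxon.

```java
public class HeapCheck {
    // Returns the maximum heap size, in megabytes, that this JVM will attempt to use.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```

Running it as `java -Xmx512m HeapCheck` should report a figure at or somewhat below 512, since some JVMs exclude part of the heap from this value.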
 
The option BINDING_MODE_DEFERRED in XQJ is designed to give extra control when you are using an XQuery processor that is able to run in streaming mode. Saxon 9.0 does not have any support for streaming mode in XQuery, though it does in XSLT: see http://www.saxonica.com/documentation/sourcedocs/serial.html
However, processing a 16MB document shouldn't need streaming.
 
There will in fact be some support for streaming in XQuery in the next Saxon release (Saxon-SA only), but I had overlooked the possibility of controlling it through this XQJ parameter. I'll take another look at it to see if this is possible.
 
Michael Kay
http://www.saxonica.com/
 
 


From: saxon-help-bounces@lists.sourceforge.net [mailto:saxon-help-bounces@lists.sourceforge.net] On Behalf Of Ryad Ben-El-Kezadri
Sent: 19 March 2008 13:01
To: saxon-help@lists.sourceforge.net
Subject: [saxon] Problem in parsing not so large XML files with Saxon/XQJ

Dear All,

I'm looking for fast APIs with XQuery support to process large XML files.
I've followed the instructions in the tutorial "XQJ Tutorial Part XI: Processing Large Inputs"
available at: http://www.xquery.com/tutorials/xqj_tutorial/processing-large-inputs.html
which explains how XQJ makes this possible using BINDING_MODE_DEFERRED together with bindDocument.

The problem is that Saxon 9 throws an exception
at the bindDocument call,
that is to say before the executeQuery call!

Here is the error stack trace:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

at net.sf.saxon.tinytree.TinyTree.ensureAttributeCapacity(TinyTree.java:250)
at net.sf.saxon.tinytree.TinyTree.addAttribute(TinyTree.java:538)
at net.sf.saxon.tinytree.TinyBuilder.attribute(TinyBuilder.java:258)
at net.sf.saxon.event.ReceivingContentHandler.startElement(ReceivingContentHandler.java:361)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(AbstractSAXParser.java:501)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(XMLNSDocumentScannerImpl.java:400)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2740)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:645)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:508)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:807)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:107)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:354)
at net.sf.saxon.event.Sender.send(Sender.java:164)
at net.sf.saxon.event.Sender.send(Sender.java:49)
at net.sf.saxon.Configuration.buildDocument(Configuration.java:2748)
at net.sf.saxon.xqj.SaxonXQDataFactory.createItemFromDocument(SaxonXQDataFactory.java:387)
at net.sf.saxon.xqj.SaxonXQDynamicContext.bindDocument(SaxonXQDynamicContext.java:66)
at app6.Main6.main(Main6.java:36)

In my source code, I'm just running an XQuery ("/trace/mac:state[@instant='913689']") that matches the last XML elements of a 16MB XML file:
///////////////SOURCE CODE//////////////////////
import javax.xml.transform.sax.SAXSource;
import javax.xml.xquery.*;
import org.xml.sax.InputSource;
import net.sf.saxon.xqj.SaxonXQDataSource;

String result = null;
XQDataSource ds = new SaxonXQDataSource();
XQConnection xqc = ds.getConnection();

XQStaticContext xqsc = xqc.getStaticContext(); // copy of the connection's static context
xqsc.declareNamespace("mac", "http://example.org/mac");
xqsc.setBindingMode(XQConstants.BINDING_MODE_DEFERRED); // defer binding so the processor may stream the input
xqc.setStaticContext(xqsc); // make the changes effective

XQPreparedExpression xqpe = xqc.prepareExpression("/trace/mac:state[@instant='913689']");

// bind the 16MB file as the context item: the OutOfMemoryError is thrown here
SAXSource saxSource = new SAXSource(new InputSource("ns2.30-s1-N30-A1-debXML0-durXML1-db-P50-p0.1-deb0.0-dur1-RTSaa0-nr1000Mb-phyGenericCard-.xml"));
xqpe.bindDocument(XQConstants.CONTEXT_ITEM, saxSource, xqc.createDocumentType());

System.out.println("Never reach here!!!!!!");

XQResultSequence xqs = xqpe.executeQuery();

while (xqs.next()) {
    result = xqs.getItemAsString(null);
    System.out.println(result);
}
xqc.close(); // also closes the prepared expression and its result sequence
///////////////////see:http://rp.lip6.fr/~kezadri/xquery/Main6xqj.java for complete source code///////////////////////////////

The XML file basically looks like a flat tree:

//////////////XML FILE///////////
<?xml version='1.0'?>
<trace xmlns:mac='http://example.org/mac' xmlns:phy='http://example.org/phy' xmlns:mob='http://example.org/mob'>
    <mac:bo    instant='24' nid='5' evtType='2' pastslots='0' totalslots='0' cw='31'>START_BO</mac:bo>
    <mac:state instant='24' nid='5' duration='50' evtType='2' expired='1' ifs='1'>CS_DEFER</mac:state>
    ...
    ...
    <mac:state instant='913689' nid='29' duration='192' evtType='3' type='0'>CS_PHY</mac:state>
    <mac:state instant='913689' nid='30' duration='192' evtType='3' type='2'>CS_PHY</mac:state>
</trace>
///////////////////////////////////////////////////
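How much memory an in-memory tree needs depends largely on the number of elements and attributes rather than on the byte size alone, and the stack trace above fails while growing the tree's attribute storage. As a rough diagnostic, a single StAX pass like the sketch below counts elements and attributes without building a tree. It uses only the JDK's javax.xml.stream API; the class name `NodeCounter` and the inline two-element sample (copied from the excerpt above) are illustrative.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class NodeCounter {
    // Streams through the document and returns {elementCount, attributeCount}.
    // No tree is built, so memory use stays roughly constant regardless of file size.
    static long[] count(XMLStreamReader r) throws XMLStreamException {
        long elements = 0, attributes = 0;
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT) {
                elements++;
                attributes += r.getAttributeCount(); // namespace declarations are not counted
            }
        }
        r.close();
        return new long[] {elements, attributes};
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<trace xmlns:mac='http://example.org/mac'>"
            + "<mac:bo instant='24' nid='5' evtType='2' pastslots='0' totalslots='0' cw='31'>START_BO</mac:bo>"
            + "<mac:state instant='24' nid='5' duration='50' evtType='2' expired='1' ifs='1'>CS_DEFER</mac:state>"
            + "</trace>";
        XMLStreamReader r = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xml));
        long[] c = count(r);
        System.out.println(c[0] + " elements, " + c[1] + " attributes");
    }
}
```

For the real file, the StringReader would be replaced by a FileInputStream on the XML file.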

The zipped XML file (800KB) is available at:
http://rp.lip6.fr/~kezadri/xquery/ns2.30-s1-N30-A1-debXML0-durXML1-db-P50-p0.1-deb0.0-dur1-RTSaa0-nr1000Mb-phyGenericCard-.zip

The code also fails (again in Saxon's TinyTree) when I use an XMLStreamReader instead of a SAXSource:
///////////////////////
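import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

FileInputStream in = new FileInputStream("ns2.30-s1-N30-A1-debXML0-durXML1-db-P50-p0.1-deb0.0-dur1-RTSaa0-nr1000Mb-phyGenericCard-.xml");
XMLInputFactory f = XMLInputFactory.newInstance();
XMLStreamReader sr = f.createXMLStreamReader(in);
// sr is then bound with bindDocument in place of the SAXSource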
////////////////////////

I'm reporting this behaviour of Saxon because, from the tutorial and from a user's perspective, one would expect this code to work.
Furthermore, since the XML elements are processed one after the other, one would expect only a few XML elements to be held in memory at a time,
so such an exception could be avoided.
This is indeed the case when I use the "StreamingPathFilter" feature of the Nux high-level API with a StAX (Woodstox) or SAX parser:
in that setup, Saxon 8 handles my XML file successfully.
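The expectation that only the current element needs to be in memory can be illustrated outside XQuery with a plain StAX pull loop. The sketch below is neither Nux's StreamingPathFilter nor anything Saxon or XQJ provides: it hand-codes the path /trace/mac:state[@instant='913689'] using only the JDK's javax.xml.stream API, and assumes (as in the file excerpt) that mac:state elements contain only text. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StateFilter {
    static final String MAC_NS = "http://example.org/mac";

    // Returns the text of every mac:state child of the root <trace> element whose
    // @instant equals the given value. Only the element being read is held in memory.
    static List<String> find(XMLStreamReader r, String instant) throws XMLStreamException {
        List<String> hits = new ArrayList<>();
        int depth = 0;
        boolean rootIsTrace = false;
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
                if (depth == 1) {
                    String ns = r.getNamespaceURI();
                    rootIsTrace = "trace".equals(r.getLocalName()) && (ns == null || ns.isEmpty());
                } else if (depth == 2 && rootIsTrace
                        && MAC_NS.equals(r.getNamespaceURI())
                        && "state".equals(r.getLocalName())
                        && instant.equals(r.getAttributeValue(null, "instant"))) {
                    // getElementText() reads text-only content and stops on the END_ELEMENT,
                    // so that END_ELEMENT is never seen by the loop: adjust depth here.
                    hits.add(r.getElementText());
                    depth--;
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
        r.close();
        return hits;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<trace xmlns:mac='http://example.org/mac'>"
            + "<mac:state instant='24' nid='5'>CS_DEFER</mac:state>"
            + "<mac:state instant='913689' nid='29'>CS_PHY</mac:state>"
            + "</trace>";
        XMLStreamReader r = XMLInputFactory.newInstance()
            .createXMLStreamReader(new StringReader(xml));
        System.out.println(find(r, "913689")); // prints [CS_PHY]
    }
}
```

The trade-off is that the path is fixed in Java code, so this gives up the generality of XQuery in exchange for constant memory use.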

To conclude, my first question is:
Does the XQJ exception come from
a) my code
b) Saxon code
c) neither a) nor b): there is no problem (Saxon-B simply doesn't do that) and there is nothing to be done about it

Finally, could you give me some hints on how to speed up my XQuery processing with Saxon-B?

Thanks for your help,
Ryad

PS: For people interested in examples, I've put my XQuery Java files at:

http://rp.lip6.fr/~kezadri/xquery/Main5NuxSax.java
http://rp.lip6.fr/~kezadri/xquery/Main5NuxStax.java
http://rp.lip6.fr/~kezadri/xquery/Main6xqj.java