Dear All,

I'm looking for fast APIs with XQuery support to process large XML files.
I've followed the instructions of the tutorial "XQJ Tutorial Part XI: Processing Large Inputs"
available at: http://www.xquery.com/tutorials/xqj_tutorial/processing-large-inputs.html
explaining how XQJ makes this possible thanks to BINDING_MODE_DEFERRED + bindDocument.

The problem is that Saxon 9 throws an exception
at the bindDocument call,
that is to say, before the executeQuery call!

Here is the error stack trace:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

at net.sf.saxon.tinytree.TinyTree.ensureAttributeCapacity(TinyTree.java:250)
at net.sf.saxon.tinytree.TinyTree.addAttribute(TinyTree.java:538)
at net.sf.saxon.tinytree.TinyBuilder.attribute(TinyBuilder.java:258)
at net.sf.saxon.event.ReceivingContentHandler.startElement(ReceivingContentHandler.java:361)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(AbstractSAXParser.java:501)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(XMLNSDocumentScannerImpl.java:400)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2740)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:645)
at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:140)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:508)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:807)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:107)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:354)
at net.sf.saxon.event.Sender.send(Sender.java:164)
at net.sf.saxon.event.Sender.send(Sender.java:49)
at net.sf.saxon.Configuration.buildDocument(Configuration.java:2748)
at net.sf.saxon.xqj.SaxonXQDataFactory.createItemFromDocument(SaxonXQDataFactory.java:387)
at net.sf.saxon.xqj.SaxonXQDynamicContext.bindDocument(SaxonXQDynamicContext.java:66)
at app6.Main6.main(Main6.java:36)

In my source code, I'm just running an XQuery ("/trace/mac:state[@instant='913689']") that matches the last elements of a 16 MB XML file:
///////////////SOURCE CODE//////////////////////
String result = null;
XQDataSource ds = new SaxonXQDataSource();
XQConnection xqc = ds.getConnection();
           
XQStaticContext xqsc = xqc.getStaticContext();// get the static context so the namespace and binding mode can be set
xqsc.declareNamespace("mac", "http://example.org/mac");
xqsc.setBindingMode(XQConstants.BINDING_MODE_DEFERRED);
xqc.setStaticContext(xqsc);// make the changes effective

XQPreparedExpression xqpe = xqc.prepareExpression("/trace/mac:state[@instant='913689']");           
             
SAXSource saxSource = new SAXSource(new InputSource("ns2.30-s1-N30-A1-debXML0-durXML1-db-P50-p0.1-deb0.0-dur1-RTSaa0-nr1000Mb-phyGenericCard-.xml"));
xqpe.bindDocument(XQConstants.CONTEXT_ITEM, saxSource, xqc.createDocumentType());
         
System.out.println("Never reach here!!!!!!");
           
XQResultSequence xqs = xqpe.executeQuery();

while (xqs.next()) {
    result = xqs.getItemAsString(null);
    System.out.println(result);
}
///////////////////see:http://rp.lip6.fr/~kezadri/xquery/Main6xqj.java for complete source code///////////////////////////////

The XML file basically looks like a flat tree:

//////////////XML FILE///////////
<?xml version='1.0'?>
<trace xmlns:mac='http://example.org/mac' xmlns:phy='http://example.org/phy' xmlns:mob='http://example.org/mob'>
    <mac:bo    instant='24' nid='5' evtType='2' pastslots='0' totalslots='0' cw='31'>START_BO</mac:bo>
    <mac:state instant='24' nid='5' duration='50' evtType='2' expired='1' ifs='1'>CS_DEFER</mac:state>
    ...
    ...
    <mac:state instant='913689' nid='29' duration='192' evtType='3' type='0'>CS_PHY</mac:state>
    <mac:state instant='913689' nid='30' duration='192' evtType='3' type='2'>CS_PHY</mac:state>
</trace>
///////////////////////////////////////////////////

The zipped XML file (800 KB) is available at:
http://rp.lip6.fr/~kezadri/xquery/ns2.30-s1-N30-A1-debXML0-durXML1-db-P50-p0.1-deb0.0-dur1-RTSaa0-nr1000Mb-phyGenericCard-.zip

The code also fails (again in Saxon's TinyTree) when I use an XMLStreamReader instead of a SAXSource:
///////////////////////
FileInputStream in = new FileInputStream("ns2.30-s1-N30-A1-debXML0-durXML1-db-P50-p0.1-deb0.0-dur1-RTSaa0-nr1000Mb-phyGenericCard-.xml");
XMLInputFactory f = XMLInputFactory.newInstance();
XMLStreamReader sr = f.createXMLStreamReader(in);
////////////////////////

I am reporting this behaviour of Saxon because, going by the tutorial and from a user's perspective, one would expect this code to work.
Furthermore, as the XML elements are processed one after the other, one would expect only a few XML elements to be held in memory at a time,
so such an exception could be avoided.
This is the case when I use the "StreamingPathFilter" feature of the Nux high-level API with a StAX (Woodstox) or a SAX parser.
In this case, Saxon 8 handles my XML file successfully.
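
To illustrate what I mean by processing the elements one after the other, here is a minimal sketch that uses only the JDK's StAX API (no Saxon, no Nux, and no XQuery). The class name StreamingStateFilter and the method findStates are made up for this example; the namespace URI and the attribute names are taken from my file. It only approximates my query: it matches mac:state at any depth rather than strictly /trace/mac:state, which makes no difference for a flat file like mine.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Saxon, XQJ or Nux.
class StreamingStateFilter {
    static final String MAC_NS = "http://example.org/mac";

    // Returns "nid=<nid> <text>" for every mac:state element whose
    // 'instant' attribute equals the given value. Only the current
    // parser event and the list of hits are kept in memory.
    static List<String> findStates(InputStream in, String instant) throws XMLStreamException {
        List<String> hits = new ArrayList<>();
        XMLStreamReader sr = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            while (sr.hasNext()) {
                if (sr.next() == XMLStreamConstants.START_ELEMENT
                        && "state".equals(sr.getLocalName())
                        && MAC_NS.equals(sr.getNamespaceURI())
                        && instant.equals(sr.getAttributeValue(null, "instant"))) {
                    String nid = sr.getAttributeValue(null, "nid");
                    // getElementText() consumes the text up to the matching end tag.
                    hits.add("nid=" + nid + " " + sr.getElementText());
                }
            }
        } finally {
            sr.close();
        }
        return hits;
    }

    public static void main(String[] args) throws Exception {
        // Tiny stand-in for the real 16 MB trace file.
        String xml = "<trace xmlns:mac='http://example.org/mac'>"
                + "<mac:state instant='24' nid='5'>CS_DEFER</mac:state>"
                + "<mac:state instant='913689' nid='29'>CS_PHY</mac:state>"
                + "<mac:state instant='913689' nid='30'>CS_PHY</mac:state>"
                + "</trace>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        for (String hit : findStates(in, "913689")) {
            System.out.println(hit);
        }
    }
}
```

With a loop like this, memory use should not grow with the size of the input, which is the behaviour I was hoping to get from the deferred binding mode.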

To conclude, my first question is:
Does the XQJ exception come from
a) my code
b) saxon code
c) neither a) nor b): there is no problem (Saxon-B simply doesn't do that) and there is nothing to be done about it

Finally, could I get some hints on how to speed up my XQuery processing with Saxon-B?

Thanks for your help,
Ryad

PS: For people interested in examples, I've put my XQuery Java files at:

http://rp.lip6.fr/~kezadri/xquery/Main5NuxSax.java
http://rp.lip6.fr/~kezadri/xquery/Main5NuxStax.java
http://rp.lip6.fr/~kezadri/xquery/Main6xqj.java