>I would advise you to try StAX (XMLEventReader if I'm correct),
it's much more intuitive to use than SAX and still doesn't imply
building a tree.
You may have read something into the original description of the problem that I haven't: it's not clear to me from the original post that a streamed solution is possible. If only one evaluate() is being done for each source document, then you would be right (though I personally find SAX easier to use than StAX).
 
Michael Kay


From: Shridhar Venkatraman [mailto:shridhar@neemtree.com]
Sent: 11 April 2009 15:41
To: Mailing list for the SAXON XSLT and XQuery processor
Subject: Re: [saxon] Speeding up getting attributes in Java

Hi,

Here the incoming data is dynamic (and the structure will get more complex than the DESET shown). Based on dynamically changing XML configurations, we need to extract arbitrary data from these arbitrary incoming messages and do some arbitrary work on them.

We are trying not to create any data structures other than XML ones, and to use just XPath and node walking to make this happen.

We need to make this CEP application coast above 1000 tps on one CPU, and will ratchet down to StAX and Java data structures if we cannot get there.

Regards
sV
-----------------------------

Nikita Zinoviev wrote:
If you're still not satisfied with Michael's approach,
I would advise you to try StAX (XMLEventReader if I'm correct),
it's much more intuitive to use than SAX and still doesn't imply 
building a tree.
Also, in "event" mode, when you find a startElement event for an 
interesting element,
you can access its attributes in a convenient way, much as you would 
for a tree node.
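
A minimal sketch of the StAX approach described above, using only the JDK's javax.xml.stream API. The class and method names (StaxDeReader, readDeValues) are made up for illustration, and it assumes the only thing wanted from each DE element is its NAME and VALUE attributes, which may not match the real configuration-driven requirements.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: pulls NAME -> VALUE pairs out of DE elements
// without building a tree.
final class StaxDeReader {
    private static final QName NAME = new QName("NAME");
    private static final QName VALUE = new QName("VALUE");

    // Returns one entry per DE element that has both attributes, in document order.
    static Map<String, String> readDeValues(String xml) {
        Map<String, String> values = new LinkedHashMap<>();
        try {
            XMLEventReader reader =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    XMLEvent event = reader.nextEvent();
                    if (!event.isStartElement()) {
                        continue; // only start tags carry attributes
                    }
                    StartElement start = event.asStartElement();
                    if (!"DE".equals(start.getName().getLocalPart())) {
                        continue; // skip DESET and anything else
                    }
                    Attribute name = start.getAttributeByName(NAME);
                    Attribute value = start.getAttributeByName(VALUE);
                    if (name != null && value != null) {
                        values.put(name.getValue(), value.getValue());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Rethrown unchecked to keep the sketch short.
            throw new IllegalStateException("Malformed XML message", e);
        }
        return values;
    }
}
```

Calling StaxDeReader.readDeValues(message) on the DESET sample would give a map keyed by the NAME attribute. Whether this beats the tree-based route depends on how many lookups are done per message.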

But to my mind it would be cool if you could improve speed, say, 5 times 
with Saxon-related tricks alone. It would be great to have another proof 
that a very nice high-level approach is also very effective!

Nikita Zinoviev

Michael Kay wrote:
  
There's no need to wrap the Saxon NodeInfo as a DOM Node just to 
discover an attribute value.
 
Try:
 
private String getNodeAttr(NodeInfo node, String attr) {
    return Navigator.getAttributeValue(node, "", attr);
}
 
If you're getting the same attribute repeatedly, then it would be even 
more efficient to convert the attribute name to an integer fingerprint 
once, and then use the fingerprint repeatedly via
 
node.getAttributeValue(fingerprint)
 
Alternatively, why not get the attribute value in your XPath code 
rather than in your Java code?
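
A rough sketch of that last suggestion, selecting the attribute in the XPath expression itself so that no per-node attribute lookup is needed in Java. It uses the JDK's built-in JAXP XPath and DOM rather than Saxon so that it is self-contained. The class name AttributeViaXPath, the method acctType, and the hard-coded ACCTTYPE lookup are assumptions for illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: the XPath expression returns the attribute value directly.
final class AttributeViaXPath {
    // Compiled once and reused. XPathExpression is not thread-safe, so a real
    // multi-threaded consumer would need one instance per thread.
    private static final XPathExpression ACCT_TYPE;

    static {
        try {
            ACCT_TYPE = XPathFactory.newInstance().newXPath()
                    .compile("/DESET/DE[@NAME='ACCTTYPE']/@VALUE");
        } catch (XPathExpressionException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Returns the VALUE of the ACCTTYPE element, or "" if there is none.
    static String acctType(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Asking for STRING yields the string value of the first selected node.
            return (String) ACCT_TYPE.evaluate(doc, XPathConstants.STRING);
        } catch (Exception e) {
            // Rethrown unchecked to keep the sketch short.
            throw new IllegalStateException("Could not evaluate XPath", e);
        }
    }
}
```

With Saxon the same expression could presumably be evaluated against the document already built in the original post, instead of a DOM.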
 
Michael Kay
http://www.saxonica.com/


    ------------------------------------------------------------------------
    *From:* Shridhar Venkatraman [mailto:shridhar@neemtree.com]
    *Sent:* 05 April 2009 08:34
    *To:* saxon-help@lists.sourceforge.net
    *Subject:* [saxon] Speeding up getting attributes in Java

    Hi,

    We are using Java to parse xml messages coming at us from a queue.

    The XML is simple in structure;

        <DESET  SOURCE="TRANSFORM" VERSION="01" TIMESTAMP="2008-10-16 10:12:52.0" ERROR="0">
            <DE NAME = "TRANSACTIONID" TYPE = "LONG"   VALUE = "1"/>
            <DE NAME = "ACCTTYPE"      TYPE = "STRING" VALUE = "01"/>
            <!-- around 50 instances of 'DE' -->
        </DESET>


    We load it into a document this way;

          byte[] mybyte = txnstatsxml.trim().getBytes();
          java.io.ByteArrayInputStream bais = new java.io.ByteArrayInputStream(mybyte);
          InputSource is = new InputSource(bais);
          SAXSource ss = new SAXSource(is);
          doctxnstats = ((net.sf.saxon.xpath.XPathEvaluator) xpath).setSource(ss);

    We then use compiled expressions to search for nodes like so;

        n = (List) expr_DESET_DE.evaluate(doctxnstats, XPathConstants.NODESET);


    When we find the nodes we look for attributes by calling this
    helper method like so;

        private String getNodeAttr(NodeInfo node, String attr) {
            try {
                return NodeOverNodeInfo.wrap(node).getAttributes().getNamedItem(attr).getNodeValue();
            } catch (Exception e) {
                // Any failure (e.g. attribute not present) falls through to return null.
            }
            return null;
        }

    The above method is eating 30% of the CPU used by our message
    consumer (the other 70% includes MQ interfaces, some SQL inserts and
    related commits) according to the profiler. It is invoked very
    often. Is there a way to speed up this attribute 'getting'? These
    messages come at us at over 500 a second.

    Regards
    sV
    ------------------------------------------------------------------------

------------------------------------------------------------------------


_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
saxon-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/saxon-help 
  
    

