Mike,

 

Without any optimizations, and using my recursive XQuery, I was able to finish the processing within a minute, which is amazing. Good job, Mike. I was running it on Saxon-B, on Linux with the 32-bit limitation, JDK 1.5. Last Friday I recommended Saxon-SA to my company; as I said, we are still at the investigation stage, seeing whether XQuery can meet our requirements.

 

I am going to do more testing to see how many elements (similar to my Model Year) it can process before running out of memory, to check the other requirements we have here, and to try the things you suggested in your last mail.

 

I have a quick question. I was looking through the Saxon code to find out how StaticQueryContext compiles XQuery (it seems complicated :-)). Is it possible to find out which elements of the input XML a given XQuery is interested in? As you suggested, I need to strip down my XML using some technique.

 

Thanks,

Srinivas Kusunam

 


From: saxon-help-bounces@lists.sourceforge.net [mailto:saxon-help-bounces@lists.sourceforge.net] On Behalf Of Michael Kay
Sent: Friday, August 25, 2006 3:09 PM
To: 'Mailing list for SAXON XSLT queries'
Subject: Re: [saxon] Saxon Grouping Extension Function (Xquery)

 

My first thought was that you would be best off doing this using a sort, followed by group-adjacent. The group-adjacent functionality is available in XSLT, but not in XQuery, even with the saxon:for-each-group extension. In XQuery, you would have to implement the group-adjacent logic using a recursive scan, which imposes its own stresses with this kind of data volume. On reflection, however, I don't think the sort would use any less memory than for-each-group.
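
For illustration only, the sort followed by group-adjacent could be sketched in XSLT 2.0 roughly as below. This is a sketch under assumptions, not a tested solution for this data: it assumes the Title/ModelYear structure used in the example further down, that every Title has exactly one ModelYear (group-adjacent raises an error otherwise), and that selecting //Title from the document root is acceptable. The names "sorted" and "distributions" are made up for the example.

    <xsl:stylesheet version="2.0"
                    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <xsl:template match="/">
        <!-- Sort the Title elements by ModelYear so that equal keys become adjacent -->
        <xsl:variable name="sorted" as="element(Title)*">
          <xsl:perform-sort select="//Title">
            <xsl:sort select="ModelYear"/>
          </xsl:perform-sort>
        </xsl:variable>

        <!-- Hypothetical wrapper element so the result is a well-formed document -->
        <distributions>
          <!-- Each run of adjacent Titles with the same ModelYear forms one group -->
          <xsl:for-each-group select="$sorted" group-adjacent="ModelYear">
            <distribution>
              <value><xsl:value-of select="current-grouping-key()"/></value>
              <count><xsl:value-of select="count(current-group())"/></count>
            </distribution>
          </xsl:for-each-group>
        </distributions>
      </xsl:template>

    </xsl:stylesheet>

Note that the sorted sequence is itself held in memory, so this form is not expected to be any lighter than grouping directly.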

 

In XSLT, the logic is simply

 

<xsl:for-each-group select="Title" group-by="ModelYear">
    <distribution>
        <value><xsl:value-of select="current-grouping-key()"/></value>
        <count><xsl:value-of select="count(current-group())"/></count>
    </distribution>
</xsl:for-each-group>

 

and I would suggest you try that first.

 

First check that you can actually load the document into memory (e.g. by running a query such as count(//*)). If that fails then you're going to have to do something to reduce its size by pre-filtering. If it does load into memory, then the above code adds a requirement to hold a hash table containing one entry for each key value, mapped to a list containing object references to the nodes with that key. That's likely to be much smaller than the tree itself.

 

You can of course write the above using the XQuery saxon:for-each-group construct if you really want, but I'm not sure why you would want to: a standard XSLT solution seems better than a non-standard XQuery one.

 

(There's a fairly easy Saxon optimization I could implement to detect that the only thing you are doing with the group is to count its members: but before implementing an optimization, I have to ask how many people would benefit from it).

 

The performance of this should be fine so long as you don't run out of memory.

 

Michael Kay

http://www.saxonica.com/

 
