> I have a quick question. I was investigating the Saxon code to find out how the StaticQueryContext compiles XQuery (it seems complicated).
 
Yes, it's complicated! There are six main phases of processing:
 
* parsing, which constructs an abstract syntax tree (a tree of Expression objects)
 
* binding of variable and function references to their declarations
 
* simplification, which does some very simple context-independent rewrites
 
* type checking, which checks expressions against the type checking rules, decides the static type of each expression, and generates extra code (extra nodes in the expression tree) to perform run-time checks and conversions
 
* optimization, which does more complex rewrites such as moving expressions out of loops
 
* slot allocation, which defines where variables will be stored on the local stack frame.
 
 
>Is it possible to find out which elements (from the input XML) a given XQuery is interested in? As you suggested, I need to strip down my XML using some such technique.
 
This technique is sometimes called "document projection", and is described in a paper by Amelie Marian and Jerome Simeon at http://www-db.research.bell-labs.com/user/simeon/xml_projection.pdf. It looks a very promising technique. Most of the static analysis that it requires is already done by Saxon (at least in Saxon-SA), though not all. If you've got time, I suggest you study the paper and see how much of the required information can be obtained from the existing Saxon expression tree. (Alternatively, you could work from the parsed query in XQueryX form: but you would then have to perform a lot more of the analysis yourself.)
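To make the idea concrete, here is a much simplified sketch in Python rather than in Saxon itself. It assumes the set of interesting element names has already been worked out by hand; in the paper that set comes from static analysis of the query, which this sketch does not attempt. The function name `project` and the parameter `keep_tags` are hypothetical. It streams the input and keeps only the named elements (with their subtrees) and their ancestors, so the pruned tree is all that stays in memory.

```python
import io
import xml.etree.ElementTree as ET

def project(source, keep_tags):
    """Stream `source` and return a pruned tree holding only elements whose
    tag is in `keep_tags` (with their subtrees) plus their ancestors."""
    stack = []   # entries: [element, inside_kept_subtree, needed]
    root = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            inside = elem.tag in keep_tags or (bool(stack) and stack[-1][1])
            stack.append([elem, inside, inside])
        else:
            elem, inside, needed = stack.pop()
            if not stack:
                root = elem                    # document element: always kept
            elif needed:
                stack[-1][2] = True            # an ancestor of kept content
            else:
                stack[-1][0].remove(elem)      # discard the irrelevant subtree
    return root

doc = (b"<Titles>"
       b"<Title><ModelYear>1999</ModelYear><Owner>x</Owner></Title>"
       b"<Title><ModelYear>2001</ModelYear><Color>red</Color></Title>"
       b"</Titles>")
pruned = project(io.BytesIO(doc), {"ModelYear"})
print(ET.tostring(pruned, encoding="unicode"))
```

A real projection would work from paths rather than bare element names, so that an element is kept only where the query can actually reach it.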
 
Michael Kay
http://www.saxonica.com/
 
 

 

Thanks,

Srinivas Kusunam

 


From: saxon-help-bounces@lists.sourceforge.net [mailto:saxon-help-bounces@lists.sourceforge.net] On Behalf Of Michael Kay
Sent: Friday, August 25, 2006 3:09 PM
To: 'Mailing list for SAXON XSLT queries'
Subject: Re: [saxon] Saxon Grouping Extension Function (Xquery)

 

My first thought was that you would be best off doing this using a sort, followed by group-adjacent. The group-adjacent functionality is available in XSLT, but not in XQuery, even with the saxon:for-each-group extension. In XQuery, you would have to implement the group-adjacent logic using a recursive scan, which imposes its own stresses with this kind of data volume. On reflection, however, I don't think the sort would use any less memory than for-each-group.
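As a language-neutral illustration of why the sort-based approach does not save memory, here is a small Python sketch (not XSLT or Saxon code; the function name and the sample records are hypothetical). The sort has to materialize a complete ordered copy of the input before any adjacent run can be formed.

```python
from itertools import groupby
from operator import itemgetter

def group_adjacent_after_sort(records, key):
    """Sort the records by key, then collect each run of adjacent equal keys."""
    ordered = sorted(records, key=key)   # a full sorted copy is held in memory
    return [(k, list(run)) for k, run in groupby(ordered, key=key)]

# Hypothetical (title id, model year) pairs.
records = [("a", 2001), ("b", 1999), ("c", 2001)]
print(group_adjacent_after_sort(records, key=itemgetter(1)))
```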

 

In XSLT, the logic is simply

 

<xsl:for-each-group select="Title" group-by="ModelYear">

             <distribution>

                <value><xsl:value-of select="current-grouping-key()"/></value>

                <count><xsl:value-of select="count(current-group())"/></count>

              </distribution>

</xsl:for-each-group>

 

and I would suggest you try that first.

 

First check that you can actually load the document into memory (e.g. by running a query such as count(//*)). If that fails then you're going to have to do something to reduce its size by pre-filtering. If it does load into memory, then the above code adds a requirement to hold a hash table containing one entry for each key value, mapped to a list containing object references to the nodes with that key. That's likely to be much smaller than the tree itself.
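The shape of that hash table can be sketched in Python as follows. This is an illustration of the data structure only, not Saxon's implementation; the `Titles` wrapper element and the function names are assumptions. Each key maps to a list of references to nodes that already live in the tree, so no node is copied.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def group_by_model_year(parent):
    """Map each ModelYear value to a list of references to its Title nodes."""
    groups = defaultdict(list)
    for title in parent.findall("Title"):
        groups[title.findtext("ModelYear")].append(title)   # reference, not a copy
    return groups

def distribution(groups):
    """One (value, count) pair per group, in order of first appearance."""
    return [(year, len(nodes)) for year, nodes in groups.items()]

tree = ET.fromstring(
    "<Titles>"
    "<Title><ModelYear>2001</ModelYear></Title>"
    "<Title><ModelYear>1999</ModelYear></Title>"
    "<Title><ModelYear>2001</ModelYear></Title>"
    "</Titles>"
)
print(distribution(group_by_model_year(tree)))
```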

 

You can of course write the above using the XQuery saxon:for-each-group construct if you really want, but I'm not sure why you would want to: a standard XSLT solution seems better than a non-standard XQuery one.

 

(There's a fairly easy Saxon optimization I could implement to detect that the only thing you are doing with the group is to count its members: but before implementing an optimization, I have to ask how many people would benefit from it).
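The idea behind such an optimization, sketched in Python purely as an illustration (this is not how Saxon does or would implement it, and the function name is hypothetical): when only the group sizes are needed, a running counter per key replaces the list of node references.

```python
from collections import Counter

def count_only_distribution(keys):
    """Keep one integer per key value instead of a list of group members."""
    counts = Counter()
    for key in keys:
        counts[key] += 1
    return sorted(counts.items())

print(count_only_distribution([2001, 1999, 2001, 2001, 1999]))
```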

 

The performance of this should be fine so long as you don't run out of memory.

 

Michael Kay

http://www.saxonica.com/

 
