My first thought was that you would be best off doing this using a sort, followed by group-adjacent. The group-adjacent functionality is available in XSLT, but not in XQuery, even with the saxon:for-each-group extension. In XQuery, you would have to implement the group-adjacent logic using a recursive scan, which imposes its own stresses with this kind of data volume. On reflection, however, I don't think the sort would use any less memory than for-each-group.
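To make the "sort, then group-adjacent" idea concrete, here is a small language-neutral sketch in Python (not Saxon code; the sample years are made up). Python's itertools.groupby only groups *adjacent* equal items, so it plays the role of group-adjacent, and the preceding sort is what makes equal values adjacent.

```python
from itertools import groupby


def sort_then_group_adjacent(years):
    """Count each distinct value by sorting, then scanning runs of equal adjacent values."""
    result = []
    for year, run in groupby(sorted(years)):
        # 'run' iterates over one run of equal, adjacent values
        result.append((year, sum(1 for _ in run)))
    return result


if __name__ == "__main__":
    sample = ["2004", "2006", "2004", "2005", "2006", "2004"]
    print(sort_then_group_adjacent(sample))
```

Note that the sort itself has to hold the whole sequence, which is why it is unlikely to save memory compared with hash-based grouping.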
In XSLT, the logic is simply

<xsl:for-each-group select="Title" group-by="ModelYear">
    <value><xsl:value-of select="current-grouping-key()"/></value>
    <count><xsl:value-of select="count(current-group())"/></count>
</xsl:for-each-group>

and I would suggest you try that first.


First check that you can actually load the document into memory (e.g. by running a query such as count(//*)). If that fails, then you're going to have to reduce its size by pre-filtering. If it does load into memory, then the above code adds a requirement to hold a hash table containing one entry for each key value, mapped to a list containing object references to the nodes with that key. That's likely to be much smaller than the tree itself.
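The hash table described above can be sketched in Python as follows. This is only an illustration of the data structure, not Saxon's internal implementation; plain dicts stand in for the Title nodes, and group_by_key is a hypothetical helper name. The point is that each group holds references to existing items, so the extra memory is one entry per distinct key plus one reference per item.

```python
from collections import defaultdict


def group_by_key(titles, key):
    """One hash-table entry per distinct key, mapped to a list of
    references to the items with that key (the shape of group-by)."""
    groups = defaultdict(list)
    for title in titles:
        groups[key(title)].append(title)  # appends a reference, not a copy
    return groups


if __name__ == "__main__":
    titles = [{"ModelYear": "2004"}, {"ModelYear": "2006"}, {"ModelYear": "2004"}]
    groups = group_by_key(titles, key=lambda t: t["ModelYear"])
    for year, members in groups.items():
        print(year, len(members))
```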


You can of course write the above using the XQuery saxon:for-each-group construct if you really want, but I'm not sure why you would want to: a standard XSLT solution seems better than a non-standard XQuery one.


(There's a fairly easy Saxon optimization I could implement to detect that the only thing you are doing with the group is to count its members; but before implementing an optimization, I have to ask how many people would benefit from it.)
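As a rough sketch of what such a count-only optimization would amount to (this is not an existing Saxon feature, just an illustration in Python with a hypothetical count_by_key helper): when only the group sizes are needed, each hash-table entry can hold an integer instead of a list of member references, so the table's size depends only on the number of distinct keys.

```python
from collections import Counter


def count_by_key(titles, key):
    """Count-only grouping: one integer per distinct key,
    no per-member references are retained."""
    return Counter(key(t) for t in titles)


if __name__ == "__main__":
    titles = [{"ModelYear": "2004"}, {"ModelYear": "2006"}, {"ModelYear": "2004"}]
    print(dict(count_by_key(titles, key=lambda t: t["ModelYear"])))
```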


The performance of this should be fine so long as you don't run out of memory.


Michael Kay


From: [] On Behalf Of Kusunam, Srinivas
Sent: 25 August 2006 17:35
To: Mailing list for SAXON XSLT queries
Subject: [saxon] Saxon Grouping Extension Function (Xquery)

Hi Mike,


As you know, I was asking about the "Distinct Count" requirement on the XQuery-Talk list. I would like to try it with Saxon-SA (my last hope) and see if it helps performance. I want to apply this to an input XML file of 317 MB (about 5 times that, roughly 1.6 GB, in memory).


Here is my original XQuery:


let $mdoc := doc('input.xml')/Body
let $sourModelYEAR := $mdoc/Title/ModelYear
return
    <Element name="ModelYear">
    {
          for $dvalue in fn:distinct-values($sourModelYEAR)
          let $eachcount := count($mdoc/Title[ModelYear=$dvalue])
          return (
                <value>{ $dvalue }</value>,
                <count>{ $eachcount }</count>
          )
    }
    </Element>
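For comparison, here is the shape of the original query transliterated into Python (an illustration only; the list of year strings stands in for the ModelYear nodes). If the filter $mdoc/Title[ModelYear=$dvalue] is evaluated naively, without any index the processor might build, each distinct value triggers a full rescan, i.e. roughly (number of distinct values) × (number of Titles) comparisons.

```python
def distinct_count_naive(years):
    """For each distinct value (in first-occurrence order), rescan the
    whole sequence to count matches: one full pass per distinct value."""
    result = []
    seen = set()
    for y in years:
        if y not in seen:
            seen.add(y)
            result.append((y, sum(1 for z in years if z == y)))
    return result


if __name__ == "__main__":
    print(distinct_count_naive(["2004", "2006", "2004", "2005"]))
```

A single grouping pass produces the same counts while reading the data only once.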


Here is the modified version with a grouping function:


declare function local:groups($seq as xs:string*, $s as xs:string, $count as xs:integer) {
   if (empty($seq))
   then <gp value="{$s}" count="{$count}"/>
   else
     if ($seq[1] eq $s)
     then local:groups($seq[position() > 1], $s, $count+1)
     else
       (<gp value="{$s}" count="{$count}"/>, local:groups($seq[position() > 1], $seq[1], 1))
};

let $mdoc := doc('sampleXML.xml')/Body
let $sourModelYear := $mdoc/Title/Group/ModelYear
return
    <Element name="ModelYear">
    {
          let $sortedYears :=
                for $dvalue in $sourModelYear
                order by $dvalue
                return string($dvalue)
          return
             local:groups($sortedYears[position()>1], $sortedYears[1], 1)
    }
    </Element>
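To illustrate why a recursive scan like local:groups can overflow the stack, here is a Python sketch (not XQuery, and it says nothing about how Saxon itself handles recursion). groups_recursive is a direct transliteration that uses one stack frame per item, so on large inputs it exceeds Python's default recursion limit; groups_iterative computes the same runs with a loop and constant stack depth. Both function names are hypothetical.

```python
def groups_recursive(seq, s, count):
    """Transliteration of local:groups: one call (stack frame) per item."""
    if not seq:
        return [(s, count)]
    if seq[0] == s:
        return groups_recursive(seq[1:], s, count + 1)
    return [(s, count)] + groups_recursive(seq[1:], seq[0], 1)


def groups_iterative(sorted_seq):
    """Same (value, count) runs using a loop; stack depth does not grow with input size."""
    result = []
    if not sorted_seq:
        return result
    current, count = sorted_seq[0], 1
    for value in sorted_seq[1:]:
        if value == current:
            count += 1
        else:
            result.append((current, count))
            current, count = value, 1
    result.append((current, count))
    return result


if __name__ == "__main__":
    years = sorted(["2004", "2006", "2004", "2005"])
    print(groups_iterative(years))
    print(groups_recursive(years[1:], years[0], 1))
```

The recursive version also copies the remaining sequence on every call (seq[1:]), which mirrors the $seq[position() > 1] in the XQuery unless the processor avoids the copy.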


My questions are:


1) What should I change in my original XQuery to use your saxon:for-each-group extension function? How efficient is this extension function?


I was reading about this here:


2) If I use my second (modified) XQuery, how efficient is Saxon at executing recursive functions? As I said, with other products I am getting a “StackOverflowError”.


Please let me know if you have any other suggestions for this.


            I appreciate your help and suggestions.




