Can’t see any obvious reason for this: we would need to look at the total picture. For example, how is $all declared and is this the only reference to it? It might work better if $all is a sequence of elements rather than a single document node, as it might then be possible to avoid materialising the entire variable in memory. If $all were a sequence of elements and this were the only reference, the variable would be inlined and creation of the variable contents would run in parallel with the grouping.
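To make the suggestion above concrete, here is a minimal sketch of declaring $all as a sequence of elements and referencing it only once. It is an illustration under assumptions, not the poster's actual stylesheet: the "record" mode, the select="*/*" path and the out-N.xml file names are hypothetical placeholders. It also swaps the group-starting-with pattern for group-adjacent. A position() predicate inside a pattern counts siblings, and the elements of such a sequence need not share a parent, whereas in group-adjacent position() is the item's position in the population. Whether Saxon then inlines the variable will depend on the release and on the rest of the stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical entry point; the real templates are not shown in the thread. -->
  <xsl:template match="/">

    <!-- as="element()*" makes $all a sequence of elements. Without the
         "as" attribute the variable would be a single document node
         wrapping the same content. -->
    <xsl:variable name="all" as="element()*">
      <!-- Placeholder for whatever generates the records. -->
      <xsl:apply-templates select="*/*" mode="record"/>
    </xsl:variable>

    <!-- The only reference to $all. Items 1..100000 get key 0,
         items 100001..200000 get key 1, and so on. -->
    <xsl:for-each-group select="$all"
                        group-adjacent="(position() - 1) idiv 100000">
      <!-- Here position() is the number of the current group. -->
      <xsl:result-document href="out-{position()}.xml">
        <xsl:copy-of select="current-group()"/>
      </xsl:result-document>
    </xsl:for-each-group>
  </xsl:template>

  <!-- Hypothetical record builder: copies each input record unchanged. -->
  <xsl:template match="*" mode="record">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

Note that the first chunk here holds 100000 records, whereas the group-starting-with version starts a new group at record 100000 and so puts 99999 in the first file.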
It would certainly be interesting to study what’s happening here, but for that we would need a repro.
(Remembering your cemetery report from years ago which gave a lot of useful insights into Saxon performance and resulted in quite a few internal optimizations, until it became so fast that it was no longer an interesting test case...)
Michael Kay
Sent: Friday, July 25, 2014 5:02 PM
Subject: [saxon] chunking output speed
I am assembling a pretty large file (RDF/XML, as it happens), of quite a few
hundred megabytes, and my consumer asked if I could split it up into bits. So I saved all the output in a variable and did
<xsl:for-each-group select="$all/*"
                    group-starting-with="*[(position() mod 100000) = 0]">
  <xsl:variable name="F">
    <xsl:text>out -</xsl:text>
    <xsl:value-of select="position()"/>
  </xsl:variable>
  <xsl:result-document href="{$F}">
    <xsl:copy-of select="current-group()"/>
  </xsl:result-document>
</xsl:for-each-group>
to write 100000 records at a time into output files.
Which seems innocuous enough. But it takes an inordinately long time to run. I get number 1 quickly, number 2 after a while, and it gets slower and slower...
Am I being dense? Is there a more efficient formulation?
Yes of course I could use an external splitting script, but where's the fun
in that?

Sebastian Rahtz     

Director (Research) of Academic IT

University of Oxford IT Services

13 Banbury Road, Oxford OX2 6NN. Phone +44 1865 283431

I am nothing.

I shall never be anything.

I cannot wish to be anything.

Apart from that, I have in me all the dreams of the world.
