The transformation is consuming memory because saxon:stream builds a tree in memory for each element selected by its argument expression, and the tree for the table_data element is rather large (about 1GB). To solve this using saxon:stream, the argument expression needs to select one level further down, to the row elements.

Unfortunately, though, saxon:stream creates copies of selected elements and the copies do not include any information about parent elements. So it's not possible to get attributes of parent nodes (../@name) with this approach.
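
For illustration, the row-level selection might look like the sketch below. It assumes the document URI test7.xml and the */*/table_data path used in the simple test stylesheet quoted later in this thread; the row-seen element is invented purely to give the loop some output.

```xslt
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:saxon="http://saxon.sf.net/">
   <xsl:template name="test">
       <!-- Select row rather than table_data, so each in-memory tree holds a single row -->
       <xsl:for-each select="saxon:stream(doc('test7.xml')/*/*/table_data/row)">
           <!-- The context item is a parentless copy of the row, so ../@name selects nothing here -->
           <row-seen fields="{count(field)}"/>
       </xsl:for-each>
   </xsl:template>
</xsl:stylesheet>
```

Because each copy is an ordinary in-memory tree, any expression can be used inside the loop, but nothing above the row is reachable from it.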

In principle the problem can be solved using streaming templates, like this:

<xsl:mode streamable="yes"/>

<xsl:template match="/">
  <xsl:apply-templates select="*/*/table_data"/>
</xsl:template>

<xsl:template match="table_data">
  <xsl:apply-templates select="row">
    <xsl:with-param name="table-name" select="string(@name)"/>
    <xsl:with-param name="collection-uri" select="concat('db/',parent::*/@name,'/',@name)"/>
  </xsl:apply-templates>
</xsl:template> 

<xsl:template match="row">
  <xsl:param name="table-name"/>
  <xsl:param name="collection-uri"/>
  <xsl:variable name="record">
    <xsl:element name="{$table-name}">
      <xsl:for-each select="field[@name]">
        <xsl:element name="{if(number(substring(@name,1,1))=number(substring(@name,1,1))) then concat('_',@name) else @name}">
          <xsl:value-of select="text()"/>
        </xsl:element>
      </xsl:for-each>
    </xsl:element>
  </xsl:variable>
  <xsl:variable name="id" select="
    if($table-name='cal_cyc') then concat('id-',$record/*/id,'-mnth-',$record/*/mnth,'-wk-',$record/*/wk)
    else if($record/*/id) then concat('id-',$record/*/id)
    else concat('ndx-',position())"/>
  <xsl:result-document href="{$collection-uri}/{$id}.xml">
    <xsl:copy-of select="$record"/>
  </xsl:result-document>
</xsl:template>
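
To make the templates concrete, here is a hypothetical input fragment of the shape they expect. Only table_data, row, field and the name attributes come from the stylesheet; the outer element names and all values are invented for illustration.

```xml
<!-- Hypothetical input: outer element names and all values are invented -->
<dump>
  <database name="shop">
    <table_data name="orders">
      <row>
        <field name="id">7</field>
        <field name="1st_item">widget</field>
      </row>
    </table_data>
  </database>
</dump>
```

Read without regard to streaming, the templates should turn this row into an orders element containing an id child and a _1st_item child, and write it to db/shop/orders/id-7.xml: the collection URI is built from the two name attributes, and the file name from the id field.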

The only problem is that this gives a NullPointerException (NPE) in Saxon 9.4 during the stylesheet streamability analysis.

I tracked this down to the use of select="field[@name]", and circumvented the problem by replacing this part of the code with

      <xsl:for-each select="field">
        <xsl:if test="exists(@name)">
          <xsl:element name="{if(number(substring(@name,1,1))=number(substring(@name,1,1))) then concat('_',@name) else @name}">
            <xsl:value-of select="text()"/>
          </xsl:element>
        </xsl:if>
      </xsl:for-each>

This then worked on the development branch, but when I ran it on 9.4 it produced the error:

Error at xsl:element on line 28 of test2.xsl:
  SXST0060: Expression xsl:element has more than one subexpression that reads descendants

I haven't quite got to the bottom of this, but it seems to be caused by over-pessimistic analysis of the expression

if(number(substring(@name,1,1))=number(substring(@name,1,1))) then concat('_',@name) else @name

I replaced this with the simpler equivalent

replace(@name, '^([0-9])', '_$1')

and the streaming now appears to work:

Execution time: 44.478s (44478ms)
Memory used: 213599520
NamePool contents: 94 entries in 89 chains. 6 URIs

At roughly 213MB, memory consumption is still fairly high, but I think we can say with confidence that the whole document is no longer being held in memory.
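
Putting the two workarounds together, the content of the record variable in the row template would presumably look like the fragment below. It is a sketch rather than the exact code that was run. Note that the pattern uses a capturing group so that $1 stands for the leading digit, which keeps the result the same as concat('_',@name).

```xslt
<!-- Fragment: belongs inside <xsl:variable name="record"> in the row template -->
<xsl:element name="{$table-name}">
  <!-- Plain select plus xsl:if, instead of the predicate field[@name] -->
  <xsl:for-each select="field">
    <xsl:if test="exists(@name)">
      <!-- Prefix names that start with a digit with '_' so they are valid element names -->
      <xsl:element name="{replace(@name, '^([0-9])', '_$1')}">
        <xsl:value-of select="text()"/>
      </xsl:element>
    </xsl:if>
  </xsl:for-each>
</xsl:element>
```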

I will investigate further to fix the NPE, and to determine why the conditional expression above is not being correctly analysed. I've already done a patch that improves the error message for the "more than one subexpression" situation.

saxon:stream() will eventually be replaced by facilities within the scope of the W3C streaming spec. At the moment there are still some cases where it's needed, but for this particular case, moving to streamable templates is probably the right approach because it gives you access to attributes outside the elements selected by saxon:stream.

Michael Kay
Saxonica



On 16/03/2012 22:30, Todd Gochenour wrote:
A link to the large XML document and the XSL stylesheet were sent to your email directly. The input document is just <stub/>.

On Fri, Mar 16, 2012 at 3:11 PM, Michael Kay <mike@saxonica.com> wrote:


On 16/03/2012 21:06, Todd Gochenour wrote:
Michael, here's a response from oXygen support.  I was planning on performing this same test with standalone Saxon this weekend, but they beat me to the punch.
Thanks. Can you please supply the input file and stylesheet being tested?

Michael Kay
Saxonica
 
-------------------------------------------------------------------------------
Hello Todd,

We've tested with your XSLT from the Saxon mailing list and the XML you provided via Google Docs. Saxon-EE ran out of memory when running the transformation in the command line, completely disconnected from Oxygen.

I've also tried with a very simple XSLT, but had the same result:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:saxon="http://saxon.sf.net/">
   <xsl:template name="test">
       <xsl:for-each select="saxon:stream(doc('test7.xml')/*/*/table_data)">
           <xsl:text>!</xsl:text>
       </xsl:for-each>
   </xsl:template>
</xsl:stylesheet>


With a different input file (simpler, two levels deep) the problem does not occur and the streaming works fine with Saxon-EE both in the command line and in Oxygen.

In my opinion this means it's a Saxon issue and should be forwarded to Michael Kay.

Let me know if I can be of further assistance.

Regards,
Adrian

Adrian Buza
oXygen XML Editor and Author Support



------------------------------------------------------------------------------
This SF email is sponsored by:
Try Windows Azure free for 90 days Click Here 
http://p.sf.net/sfu/sfd2d-msazure


_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
saxon-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/saxon-help 
