What you refer to as "online" processing is referred to in the XSLT 3.0 work and in Saxon as "streaming". There's a lot of support in Saxon-EE for streaming already, and more on the way, as it's a major feature of XSLT 3.0.

Your stylesheet is almost purely streamable, the exception being the instruction

<xsl:value-of select="concat(position[@dim = '0'], ',', position[@dim = '1'], ',', charge, '&#10;')"/>

where the processor can't automatically determine that the three downward selections are done in document order. If they are actually in document order, you could get around this by processing each element as it is encountered:

<xsl:template match="position[@dim = '0']">...</xsl:template>
<xsl:template match="position[@dim = '1']">...</xsl:template>
<xsl:template match="charge">...</xsl:template>
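
As a rough sketch of that first approach, the complete stylesheet might look something like the following. The template bodies are my guess at what would replace the "..." placeholders, not something taken from the original. It assumes every feature contains exactly one of each element, in the order position[@dim = '0'], position[@dim = '1'], charge, and that version="3.0" is how your Saxon release expects XSLT 3.0 features to be enabled.

```xml
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">

    <xsl:output method="text" media-type="text/plain"/>

    <!-- Ask for the default (unnamed) mode to be evaluated in streaming fashion -->
    <xsl:mode streamable="yes"/>

    <xsl:template match="/featureMap">
        <xsl:value-of select="'RT,MZ,Charge,Intensity,Quality&#10;'"/>
        <xsl:apply-templates/>
    </xsl:template>

    <!-- Assumed bodies: write each field as it is encountered.
         This only yields correct rows if the elements arrive in this order. -->
    <xsl:template match="position[@dim = '0']">
        <xsl:value-of select="."/>
        <xsl:text>,</xsl:text>
    </xsl:template>

    <xsl:template match="position[@dim = '1']">
        <xsl:value-of select="."/>
        <xsl:text>,</xsl:text>
    </xsl:template>

    <!-- Assumed to be the last of the three fields, so it ends the row -->
    <xsl:template match="charge">
        <xsl:value-of select="."/>
        <xsl:text>&#10;</xsl:text>
    </xsl:template>

    <!-- Unchanged from the original stylesheet: descend everywhere else, output nothing -->
    <xsl:template match="node()|@*">
        <xsl:apply-templates/>
    </xsl:template>

</xsl:transform>
```

Each template makes only a single downward selection from its own context node, so the question of relative order between selections never arises. The price is that the output follows whatever order the input uses.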

If they are not actually in document order, you could make a copy of the feature element and process that:

<xsl:template match="/featureMap/featureList/feature">
    <xsl:value-of select="copy-of(.)/concat(position[@dim = '0'], ',', position[@dim = '1'], ',', charge, '&#10;')"/>
</xsl:template>

The memory needed is then limited to the size of the feature element.

To make this work you need to add <xsl:mode streamable="yes"/> to the stylesheet, and you need to run it under Saxon-EE.
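
Putting those pieces together, the copy-of variant of your stylesheet would look roughly like this. Treat it as a sketch rather than a tested file: the version="3.0" attribute is my assumption about how XSLT 3.0 features get enabled, and the details may differ between Saxon releases.

```xml
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="3.0">

    <xsl:output method="text" media-type="text/plain"/>

    <!-- Declares the default (unnamed) mode as streamable -->
    <xsl:mode streamable="yes"/>

    <xsl:template match="/featureMap">
        <xsl:value-of select="'RT,MZ,Charge,Intensity,Quality&#10;'"/>
        <xsl:apply-templates/>
    </xsl:template>

    <!-- copy-of(.) builds an in-memory copy of just this feature element,
         so the three child selections can be made in any order -->
    <xsl:template match="/featureMap/featureList/feature">
        <xsl:value-of select="copy-of(.)/concat(position[@dim = '0'], ',', position[@dim = '1'], ',', charge, '&#10;')"/>
    </xsl:template>

    <!-- Unchanged from the original stylesheet -->
    <xsl:template match="node()|@*">
        <xsl:apply-templates/>
    </xsl:template>

</xsl:transform>
```

Only one feature element is held in memory at a time, which is where the memory bound mentioned above comes from.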

Michael Kay
Saxonica


On 09/01/2013 16:13, Archie Cobbs wrote:
I'm trying to do a simple transform on an XML file to convert it to CSV. Unfortunately, the input XML file is very large and leads to an OutOfMemoryError in Saxon.

Can anything be done?

I have heard vague reports of a future "online mode" XSL spec.

Short of that, is there anything that can be done to make Saxon run in "online" mode when the conditions are just right?

Obviously certain transforms cannot be done in online fashion (e.g., XPath queries spanning the entire document). But some can.

I guess the first question would be: is Saxon required to convert the input to an in-memory DOM due to the fact that it subcontracts out the parsing to a separate XML parser? Or can Saxon be hooked up to a streaming parser (e.g., StAX) and told to just throw an error if it ever needs to look backwards?

In any case, here's the stylesheet:

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <xsl:output method="text" media-type="text/plain"/>

    <xsl:template match="/featureMap">
        <xsl:value-of select="'RT,MZ,Charge,Intensity,Quality&#10;'"/>
        <xsl:apply-templates/>
    </xsl:template>

    <xsl:template match="/featureMap/featureList/feature">
        <xsl:value-of select="concat(position[@dim = '0'], ',', position[@dim = '1'], ',', charge, '&#10;')"/>
    </xsl:template>

    <xsl:template match="node()|@*">
        <xsl:apply-templates/>
    </xsl:template>

</xsl:transform>

Theoretically, there's no reason this stylesheet couldn't operate in an online fashion. One can imagine the naive XSL algorithm would simply recurse through the input document, matching each node to the stylesheet, so presumably it's not the XSL processor's fault that we run out of memory... is it?

It has always struck me that the fact that the XSL spec did not make an explicit accommodation for online processing is a design mistake (just my opinion).

Thanks,
-Archie

--
Archie L. Cobbs


_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
saxon-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/saxon-help