I'm trying to do a simple transform on an XML file to convert it to CSV. Unfortunately, the input XML file is very large, and this leads to an OutOfMemoryError in Saxon.

Can anything be done?

I have heard vague reports of a future "online mode" XSL spec.

Short of that, is there anything that can be done to make Saxon run in "online" mode when the conditions are just right?

Obviously certain transforms cannot be done in online fashion (e.g., XPath queries spanning the entire document). But some can.

I guess the first question would be: is Saxon required to convert the input to an in-memory DOM because it subcontracts the parsing out to a separate XML parser? Or can Saxon be hooked up to a streaming parser (e.g., StAX) and told to just throw an error if it ever needs to look backwards?

In any case, here's the stylesheet:

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

    <xsl:output method="text" media-type="text/plain"/>

    <xsl:template match="/featureMap">
        <xsl:value-of select="'RT,MZ,Charge&#10;'"/>
        <xsl:apply-templates/>
    </xsl:template>

    <xsl:template match="/featureMap/featureList/feature">
        <xsl:value-of select="concat(position[@dim = '0'], ',', position[@dim = '1'], ',', charge, '&#10;')"/>
    </xsl:template>

    <xsl:template match="node()|@*">
        <xsl:apply-templates/>
    </xsl:template>

</xsl:transform>

Theoretically, there's no reason this stylesheet couldn't operate in an online fashion. One can imagine that the naive XSL algorithm would simply recurse through the input document, matching each node against the stylesheet as it goes, so presumably it's not the XSL processor's fault that we run out of memory... is it?
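To make concrete what I mean by "online", here is a rough sketch of the same conversion written by hand against StAX, with no XSL processor involved. It is only an illustration under some assumptions of mine: the class name FeatureMapToCsv and the convert() method are made up for this example, I assume position and charge contain only text, and it writes just the three columns that the feature template above actually emits. Only the fields of the current feature are held in memory at any time.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Saxon or any library.
public class FeatureMapToCsv {

    // Same path as match="/featureMap/featureList/feature" in the stylesheet.
    private static final List<String> FEATURE_PATH =
            Arrays.asList("featureMap", "featureList", "feature");

    public static void convert(Reader in, Writer out)
            throws XMLStreamException, IOException {
        XMLStreamReader xml = XMLInputFactory.newInstance().createXMLStreamReader(in);
        List<String> path = new ArrayList<>();   // names of the currently open elements
        String rt = null, mz = null, charge = null;

        while (xml.hasNext()) {
            int event = xml.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = xml.getLocalName();
                path.add(name);
                if (path.size() == 1 && name.equals("featureMap")) {
                    out.write("RT,MZ,Charge\n");          // header, once per document
                } else if (path.equals(FEATURE_PATH)) {
                    rt = mz = charge = null;              // start of a new feature
                } else if (path.size() == 4
                        && path.subList(0, 3).equals(FEATURE_PATH)
                        && (name.equals("position") || name.equals("charge"))) {
                    // Read the attribute before getElementText() moves the cursor.
                    String dim = xml.getAttributeValue(null, "dim");
                    // getElementText() consumes up to and including this element's
                    // END_ELEMENT, so the path entry is removed here by hand.
                    String text = xml.getElementText();
                    path.remove(path.size() - 1);
                    // Keep only the first match, like XSLT 1.0 string conversion.
                    if (name.equals("charge")) {
                        if (charge == null) charge = text;
                    } else if ("0".equals(dim)) {
                        if (rt == null) rt = text;
                    } else if ("1".equals(dim)) {
                        if (mz == null) mz = text;
                    }
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                if (path.equals(FEATURE_PATH)) {
                    out.write(orEmpty(rt) + "," + orEmpty(mz) + "," + orEmpty(charge) + "\n");
                }
                path.remove(path.size() - 1);
            }
        }
        xml.close();
    }

    // Missing fields become empty strings, as they would in the XSLT concat().
    private static String orEmpty(String s) {
        return s == null ? "" : s;
    }
}
```

Something like this can be driven with any Reader and Writer, and the caller is responsible for flushing and closing them. Of course, it throws away everything that makes XSL attractive in the first place, which is why I'd much rather have the processor do this for me.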

It has always struck me as a design mistake that the XSL spec makes no explicit accommodation for online processing (just my opinion).

Thanks,
-Archie

--
Archie L. Cobbs