Sure, Michael, I'll get right on that.
Just a quick question: there isn't anything really simple and basic I'm missing about this, right? I don't need to put "Streamable = 'yes'" on anything or otherwise utilize "modes" on templates to use the saxon:stream() command, right?

My understanding is that streaming templates are a separate beast from the use of saxon:read-once and saxon:stream. I just want to make sure of that before wasting your time.

Thanks as always!
-David



On Mon, May 13, 2013 at 7:16 PM, Michael Kay <mike@saxonica.com> wrote:

It looks like you're not processing the input in streaming mode. If you get a "building tree" message for a file, that means it's not streaming.

I don't feel I have a complete picture of what you are doing here, so it's hard to see where it's going wrong. Could you try to provide a complete explanation starting at the beginning: what source files and stylesheets are there, and what does your Java application do? If necessary, bundle it all together and send it to us - we'd rather have too much information than not enough.

Michael Kay
Saxonica


On 13 May 2013, at 13:14, David Rudel wrote:

I decided to RTFM and noticed that [timing = "true"] is equivalent to -t.

Strangely, they both gave similar outputs. The only difference I could see was that the "saxon:stream" version sometimes took a long time toward the end, presumably because it was running out of RAM.

Here is an example of the output:

Tree built in 10 milliseconds
Tree size: 4120 nodes, 25 characters, 11376 attributes
Writing to file:/C:/Users/Drudel/Desktop/SVN_Testing/bryan_elementary_school_processed_05_10_13/SY_2012-2013/Grade_2/EFuentes_600239_1st_factHistory.xml
Building tree for file:/C:/Users/Drudel/Desktop/SVN_Testing/bryan_elementary_school_processed_05_10_13/SY_2012-2013/Grade_2/EGarcia_599999_1st_S.xml using class net.sf.saxon.tree.tiny.TinyBuilder



It looks like neither version is actually streaming the document because the "<stint>" element has only a few attributes, not 11376.  Here is a sample of the top of a typical file:

<student user.name="XXXXYYYYY">
<stint type="Original" SY="2012" grade="2" operation="A" fact.extractor.file="ZZZ-XXX.xml" start.date="2012-09-05" end.date="2013-05-08">
<session assignment="Adding and Subtracting  0 - 10">
....
</stint>
</student>

There is only one <stint> element in the entire file, and it is the first child of the <student> element. So I don't see why the processor is building the whole tree when all I'm asking for are the attributes.


-David





On Mon, May 13, 2013 at 1:39 PM, David Rudel <fwqhgads@gmail.com> wrote:
Michael,
Getting back to this because I ran into another example.

I have a script where I run out of memory [> 5 GB] when using this command:

<xsl:variable name="stint.info">
    <stint>
        <xsl:copy-of select="saxon:stream(doc(session.extractor/@file))/student/stint/@*"/>
    </stint>
</xsl:variable>

But I do not run out of memory [java heap stays below 500 MB] when using

<xsl:variable name="stint.info">
    <stint>
        <xsl:copy-of select="doc(session.extractor/@file)/student/stint/@*" saxon:read-once="yes"/>
    </stint>
</xsl:variable>

I was under the impression that these two expressions should provoke the same functionality, or at least be applicable to the same expressions.
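
For comparison, the April message quoted further down this thread puts the whole path inside the saxon:stream() call, whereas the first variable above applies the path to the result of saxon:stream(doc(...)). A sketch of that variable written in the same form as the April example follows. It is untested, the wrapper stylesheet and the match pattern are hypothetical, and whether an attribute selection can be streamed this way in 9.3 is an assumption, not something this thread confirms:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/">

  <!-- Hypothetical wrapper template; the match pattern is an assumption. -->
  <xsl:template match="/*">
    <xsl:variable name="stint.info">
      <stint>
        <!-- Whole path placed inside saxon:stream(), as in the systemTrace example. -->
        <xsl:copy-of select="saxon:stream(doc(session.extractor/@file)/student/stint/@*)"/>
      </stint>
    </xsl:variable>
    <xsl:copy-of select="$stint.info"/>
  </xsl:template>

</xsl:stylesheet>
```

Running both forms with timing enabled and checking for a "Building tree" message for the referenced file would show whether either one is actually streaming.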

I am using Saxon 9.3.0.5. I thought saxon:stream was introduced by that release. Unfortunately, using the "change log" functionality in the interactive documentation, I could not find any mention of when saxon:stream was officially introduced (going back to 7.0). I can only see a reference to it being obsolete owing to xsl:stream in the current release.

I am unable to run from the command line because I use Oxygen and do not have an independent license for EE. While Oxygen does not allow me to pass command-line parameters, it does allow me to use a config file. Is there an equivalent setting within a config file that accomplishes the same thing as "-t"? I notice there is a "timing" configuration option, so maybe [timing = "true"] would do it?
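
A config file along those lines might look like the minimal sketch below. It assumes the standard Saxon configuration-file namespace and that "timing" is set as an attribute of the global element; the edition value is an assumption based on the EE license mentioned above:

```xml
<!-- Minimal sketch of a Saxon configuration file; the edition value is an assumption. -->
<configuration xmlns="http://saxon.sf.net/ns/configuration" edition="EE">
  <!-- timing="true" is intended as the config-file counterpart of -t on the command line. -->
  <global timing="true"/>
</configuration>
```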

-David



On Fri, Apr 26, 2013 at 2:37 PM, David Rudel <fwqhgads@gmail.com> wrote:
I assumed that when a document was handled by the saxon:stream pseudo-function, the called document was available for garbage collection, since the expectation is that the document is only read once.

However, I was fiddling with streaming in one of my scripts and I saw behavior that does not comport with the above assumption.

In the stylesheet I use the following command:

<xsl:variable name="session.traces" select="saxon:stream(doc(session.extractor/@file)/student/session/systemTrace)"/>
 
(This command is used inside a loop that executes hundreds of times. In each call of the loop, a different file is referenced by "session.extractor/@file". These files are about 3M in size.)

This caused the application to run out of memory.

I added

<xsl:value-of select="saxon:discard-document(document(session.extractor/@file))/a"/>

and this fixed the memory issue.

Is this the expected behavior?

-David


--

"A false conclusion, once arrived at and widely accepted is not dislodged easily, and the less it is understood, the more tenaciously it is held." - Cantor's Law of Preservation of Ignorance.



saxon-help mailing list archived at http://saxon.markmail.org/
saxon-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/saxon-help


