Thanks for your efforts in getting me the large file.

The output is consistent with there not being an attribute id="UUID_643df4fe-4a26-4d86-b245-1b5fd0049191_Ur" in the file. So I grepped the file, and found

<gml:MultiSurface gml:id="UUID_643df4fe-4a26-4d86-b245-1b5fd0049191_Ur">

But that should be OK, because you are looking for *:id.

I experimented with changing the small file to match. In the course of the experiments (using 9.4) I hit a bug: an NPE in FleetingNode.getLocalName(), which occurs when trying to get the name of a document node. I've fixed that problem (which isn't in 9.3) and will log it and issue a patch later.

In the hope of getting it to terminate before reading the whole big file, I added the predicate [1] after the test of @*:id. Oddly, this causes the error

  SXST0060: Expression net.sf.saxon.expr.SimpleStepExpression has more than one
  subexpression that reads descendants

Investigating that, it's technically true: we have an expression of the form first(descendant::a)/descendant::b which the analyzer doesn't recognize as being streamable. So, move the predicate [1] to the end of the expression: saxon:stream((descendant::a/descendant::b)[1]).
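
To make the difference between the two shapes concrete, here is a minimal non-streaming sketch in plain XSLT 2.0 (no saxon:stream involved). The element names a and b, the sample data and the output element names are purely illustrative and not taken from the real files. Note that the two forms do not select the same nodes: the first returns every b below the first a, the second returns only the first b overall, which is what is wanted for early termination.

```xml
<xsl:transform version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical in-memory sample: two a elements, the first with two b children -->
  <xsl:variable name="sample">
    <doc>
      <a><b/><b/></a>
      <a><b/></a>
    </doc>
  </xsl:variable>

  <xsl:template name="main">
    <counts>
      <!-- Predicate on the first step: all b descendants of the first a -->
      <predicate-on-first-step>
        <xsl:value-of select="count($sample/descendant::a[1]/descendant::b)"/>
      </predicate-on-first-step>
      <!-- Predicate on the whole path: only the first b in document order -->
      <predicate-on-whole-path>
        <xsl:value-of select="count(($sample/descendant::a/descendant::b)[1])"/>
      </predicate-on-whole-path>
    </counts>
  </xsl:template>
</xsl:transform>
```

Run it with an initial template (for example -it:main on the Saxon command line). The two counts should come out as 2 and 1.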

Now, with the small file changed as above, I get:

<property xmlns:saxon="http://saxon.sf.net/"><test assert="id"/><test assert="head"/></property>

which is consistent with the attribute not being matched.

On further investigation, the attribute IS being matched; but there are no Polygon descendants. That's because Polygon is in the gml namespace! Change it to *:Polygon, and we're back on track.
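
The namespace effect is easy to reproduce on a tiny in-memory sample. The following is a sketch in plain XSLT 2.0, not the original stylesheet. The gml namespace URI used here is an assumption (the declaration in the big file isn't shown in this thread), and the id values and output element names are made up for illustration.

```xml
<xsl:transform version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:gml="http://www.opengis.net/gml">

  <!-- Hypothetical sample with the elements in the gml namespace
       (namespace URI assumed, ids invented) -->
  <xsl:variable name="sample">
    <gml:MultiSurface gml:id="MS1">
      <gml:surfaceMember>
        <gml:Polygon gml:id="Polygon1"/>
      </gml:surfaceMember>
    </gml:MultiSurface>
  </xsl:variable>

  <xsl:template name="main">
    <counts>
      <!-- Unprefixed name test: matches only Polygon in no namespace -->
      <unprefixed>
        <xsl:value-of select="count($sample//Polygon)"/>
      </unprefixed>
      <!-- Wildcard prefix: matches Polygon in any namespace -->
      <wildcard>
        <xsl:value-of select="count($sample//*:Polygon)"/>
      </wildcard>
      <!-- Explicit prefix: matches Polygon in the gml namespace only -->
      <explicit>
        <xsl:value-of select="count($sample//gml:Polygon)"/>
      </explicit>
    </counts>
  </xsl:template>
</xsl:transform>
```

Run with an initial template (for example -it:main on the Saxon command line). The counts should be 0, 1 and 1: the unprefixed test finds nothing, which is exactly the silent failure seen above.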

Back to the big file: success!

<?xml version="1.0" encoding="UTF-8"?><property xmlns:saxon="http://saxon.sf.net/"><test assert="id"/><test assert="head"><nohead>
                    A polygon must have a 'head' for the title.
                    <diagnostic>
                        No title (head element) for polygon UUID_a5557a09-a5e6-434f-b329-4356dbccd529_Ur</diagnostic></nohead></test></property>

Execution time: 14m 56.666s (896666ms)
Memory used: 8032312
NamePool contents: 15 entries in 15 chains. 7 URIs

As an experiment, I changed the code to use an explicit namespace (@gml:id, gml:Polygon etc). This brought the time down to 13m 33s.

So, a simple programming error. The trouble is, such errors can waste an awful lot of time when handling large data files. I think the lessons are (a) create manageable test data to use while developing your code, (b) think about making your code schema-aware, which will often spot simple errors in path expressions like this one.

The small change that made the code non-streamable is also interesting. We're discovering that with the rules in the draft W3C spec, it's very hard to predict which constructs will be streamable and which won't, and the Saxon rules are equally arbitrary. We're currently examining whether it would be better to have rules that are more restrictive but easier to explain.

Michael Kay
Saxonica



On 24/01/2012 10:13, Denis DEBARBIEUX wrote:
Hi all,

I am trying to evaluate an XSLT stylesheet with SAXON in streaming mode, and I have run into trouble.

My message is organised in 3 parts:
A. Create the XSLT sheet
B. Evaluate it on a small document: here, I have no problem.
C. Evaluate it on a huge document: here, I have trouble.

A. Create the XSLT sheet
  The goal of my sheet is to check two very simple properties and to output messages when they fail.

The properties are evaluated over nodes labelled Polygon. The two assertions are:
1. this polygon has an attribute id.
2. this polygon has a child head.

<xsl:transform
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='3.0' 
    xmlns:saxon='http://saxon.sf.net/'>
    <xsl:mode streamable="yes" /> 
    <xsl:template name="main" match="/">
        <property>
            <!-- get all the polygons of a MultiSurface specified by its id -->
            <xsl:variable name="context" select="saxon:stream(descendant::*:MultiSurface[attribute::*:id='UUID_643df4fe-4a26-4d86-b245-1b5fd0049191_Ur']//Polygon)"/>           
            <!-- this polygon has an attribute id. -->
            <test assert="id">               
                <xsl:if test="$context[not(attribute::*:id)]" >
                    <noid>
                    A polygon  must have an id value                                       
                    </noid>
                </xsl:if>
            </test>
            <!--this polygon  has a child head.-->
            <test assert="head">
                <xsl:if test="$context[not(child::*:head)]" >
                    <nohead>
                    A polygon must have a 'head' for the title.
                    <diagnostic>
                        <xsl:variable name="polygonId" select="$context/attribute::*:id"/>
                        No title (head element) for polygon <xsl:value-of select="$polygonId"/>
                    </diagnostic>                                                        
                    </nohead>

                </xsl:if>
            </test>   
        </property> 
    </xsl:template> 
</xsl:transform>

B. Evaluate it on a small document
Here is the input document
<doc>
    <MultiSurface id="UUID_643df4fe-4a26-4d86-b245-1b5fd0049191_Ur">
         <surfaceMember>
             <Polygon id="Polygon1">
             </Polygon>
         </surfaceMember>
     </MultiSurface>
</doc>

The polygon has an attribute id but no child head, so the output is:
<?xml version="1.0" encoding="UTF-8"?>
<property xmlns:saxon="http://saxon.sf.net/">
    <test assert="id"/>
      <test assert="head">
         
<nohead>
            A polygon must have a 'head' for the title.
            <diagnostic>
                No title (head element) for polygon Polygon1
            </diagnostic>
        </nohead>
    </test>

</property>


C. Evaluate it on a huge document (44Gbytes). It can be downloaded here: ftp://ftp.fpk.tu-berlin.de/pub/Kolbe/22x22xTestEttenheim_inkl_4xLod4Buildings.zip

Here is a part of the huge document:
<CityModel >
        ...
       <MultiSurface id="UUID_643df4fe-4a26-4d86-b245-1b5fd0049191_Ur">
          <surfaceMember>
            <Polygon id="UUID_a5557a09-a5e6-434f-b329-4356dbccd529_Ur">
             ...

As in the previous example, the polygon has an attribute id but no child head. Indeed, the label 'head' does not occur in the document:
ddebarbieux@ddebarbieux-PC ~
$ grep head  /cygdrive/c/huge_document/22x22xTestEttenheim_inkl_4xLod4Buildings.xml

ddebarbieux@ddebarbieux-PC ~

The output is the following. No message is displayed about the missing head.
<?xml version="1.0" encoding="UTF-8"?>
<property xmlns:saxon="http://saxon.sf.net/">
    <test assert="id"/>
    <test assert="head"/>
</property>

Denis
-- 
Denis Debarbieux
Engineer at INRIA




_______________________________________________
saxon-help mailing list archived at http://saxon.markmail.org/
saxon-help@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/saxon-help