Thanks for your efforts in getting me the large file.
The output is consistent with there not being an attribute
id="UUID_643df4fe-4a26-4d86-b245-1b5fd0049191_Ur" in the file. So I
grepped the file, and found
<gml:MultiSurface gml:id="UUID_643df4fe-4a26-4d86-b245-1b5fd0049191_Ur">
But that should be OK, because you are looking for *:id.
I experimented with changing the small file to match. In the course of the
experiments (using 9.4) I hit a bug: an NPE in
FleetingNode.getLocalName() which occurs when trying to get the name of
a document node. I've fixed that problem (which isn't in 9.3) and will
log it and issue a patch later.
In the hope of getting it to terminate before reading the whole big file,
I added the predicate [1] after the test of @*:id. Oddly, this causes
the error
SXST0060: Expression net.sf.saxon.expr.SimpleStepExpression has more than one subexpression that reads descendants
Investigating that, it's technically true: we have an expression of the
form first(descendant::a)/descendant::b which the analyzer doesn't
recognize as being streamable. So, move the predicate [1] to the end of
the expression: saxon:stream((descendant::a/descendant::b)[1]).
Now, with the small file changed as above, I get:
<property xmlns:saxon="http://saxon.sf.net/"><test assert="id"/><test
assert="head"/></property>
which is consistent with the attribute not being matched.
On further investigation, the attribute IS being matched; but there are
no Polygon descendants. That's because Polygon is in the gml namespace!
Change it to *:Polygon, and we're back on track.
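For illustration, here is roughly what the variable declaration looks like with both changes applied: the [1] predicate moved to the end of the whole path, and Polygon replaced by *:Polygon. This is a sketch assembled from the stylesheet quoted below, not the exact code that was run; in particular, whether the // step is kept or written as an explicit descendant:: step is an assumption.

```xml
<!-- Sketch only: the select from the quoted stylesheet, with
     (1) Polygon changed to *:Polygon so that it matches in any namespace,
     (2) [1] applied to the parenthesized path, not to the first step.
     Note that [1] means only the first matching Polygon is selected. -->
<xsl:variable name="context"
  select="saxon:stream((descendant::*:MultiSurface[attribute::*:id='UUID_643df4fe-4a26-4d86-b245-1b5fd0049191_Ur']//*:Polygon)[1])"/>
```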
Back to the big file: success!
<?xml version="1.0" encoding="UTF-8"?><property
xmlns:saxon="http://saxon.sf.net/"><test assert="id"/><test
assert="head"><nohead>
A polygon must have a 'head' for the title.
<diagnostic>
No title (head element) for polygon
UUID_a5557a09-a5e6-434f-b329-4356dbccd529_Ur</diagnostic></nohead></test></property>
Execution time: 14m 56.666s (896666ms)
Memory used: 8032312
NamePool contents: 15 entries in 15 chains. 7 URIs
As an experiment, I changed the code to use an explicit namespace
(@gml:id, gml:Polygon etc). This brought the time down to 13m 33s.
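A sketch of the explicit-namespace variant follows. The gml prefix has to be bound in the stylesheet; the URI shown is an assumption (the usual GML 3.1 namespace) and should be copied from the xmlns:gml declaration in the actual input file. The two xsl:if tests in the stylesheet would need the same treatment for the id attribute.

```xml
<!-- Sketch only. The namespace URI is an ASSUMPTION, not taken from the file:
     replace it with the xmlns:gml value declared in your input document. -->
<xsl:variable name="context"
  xmlns:gml="http://www.opengis.net/gml"
  select="saxon:stream((descendant::gml:MultiSurface[@gml:id='UUID_643df4fe-4a26-4d86-b245-1b5fd0049191_Ur']//gml:Polygon)[1])"/>
```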
So, a simple programming error. The trouble is, such errors can waste an
awful lot of time when handling large data files. I think the lessons
are (a) create manageable test data to use while developing your code,
(b) think about making your code schema-aware, which will often spot
simple errors in path expressions like this one.
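As an example of point (a), a miniature test document that mirrors the namespaces of the real data would have exposed the problem in seconds. The following is a hypothetical variant of the small file from the original post; the namespace URI, and the assumption that surfaceMember and the Polygon's id are also in the gml namespace, should be checked against the real file.

```xml
<!-- Hypothetical miniature test input. The namespace URI is an ASSUMPTION;
     use the one declared in the real file. -->
<doc xmlns:gml="http://www.opengis.net/gml">
  <gml:MultiSurface gml:id="UUID_643df4fe-4a26-4d86-b245-1b5fd0049191_Ur">
    <gml:surfaceMember>
      <gml:Polygon gml:id="Polygon1"/>
    </gml:surfaceMember>
  </gml:MultiSurface>
</doc>
```

With the unprefixed Polygon test from the original stylesheet, this input gives an empty "head" test, the same symptom as the big file.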
The small change that made the code non-streamable is also interesting.
We're discovering that with the rules in the draft W3C spec, it's very
hard to predict which constructs will be streamable and which won't, and
the Saxon rules are equally arbitrary. We're currently examining whether
it would be better to have rules that are more restrictive but easier to
explain.
Michael Kay
Saxonica
On 24/01/2012 10:13, Denis DEBARBIEUX wrote:
> Hi all,
>
> I am trying to evaluate an XSLT stylesheet with Saxon in streaming mode,
> and I am having trouble.
>
> My message is organised in 3 parts:
> A. Create the XSLT sheet
> B. Evaluate it on a small document: here, I have no problem.
> C. Evaluate it on a huge document: here, I have trouble.
>
> A. Create the XSLT sheet
> The goal of my sheet is to check two very simple properties and to
> output messages when they fail.
>
> The properties are evaluated over nodes labelled Polygon. The
> two assertions are:
> 1. this polygon has an attribute id.
> 2. this polygon has a child head.
>
> <xsl:transform
> xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='3.0'
> xmlns:saxon='http://saxon.sf.net/'>
> <xsl:mode streamable="yes" />
> <xsl:template name="main" match="/">
> <property>
> <!-- get all the polygons of a MultiSurface specified by its id -->
> <xsl:variable name="context"
> select="saxon:stream(descendant::*:MultiSurface[attribute::*:id='UUID_643df4fe-4a26-4d86-b245-1b5fd0049191_Ur']//Polygon)"/>
>
> <!-- this polygon has an attribute id. -->
> <test assert="id">
> <xsl:if test="$context[not(attribute::*:id)]" >
> <noid>
> A polygon must have an id value
> </noid>
> </xsl:if>
> </test>
> <!--this polygon has a child head.-->
> <test assert="head">
> <xsl:if test="$context[not(child::*:head)]" >
> <nohead>
> A polygon must have a 'head' for the title.
> <diagnostic>
> <xsl:variable name="polygonId" select="$context/attribute::*:id"/>
> No title (head element) for polygon
> <xsl:value-of select="$polygonId"/>
> </diagnostic>
> </nohead>
> </xsl:if>
> </test>
> </property>
> </xsl:template>
> </xsl:transform>
>
> B. Evaluate it on a small document
> Here is the input document
> <doc>
> <MultiSurface id="UUID_643df4fe-4a26-4d86-b245-1b5fd0049191_Ur">
> <surfaceMember>
> <Polygon id="Polygon1">
> </Polygon>
> </surfaceMember>
> </MultiSurface>
> </doc>
>
> The polygon has an attribute id but no child head, so the
> output is
> <?xml version="1.0" encoding="UTF-8"?>
> <property xmlns:saxon="http://saxon.sf.net/">
> <test assert="id"/>
> <test assert="head">
> <nohead>
> A polygon must have a 'head' for the title.
> <diagnostic>
> No title (head element) for polygon Polygon1
> </diagnostic>
> </nohead>
> </test>
> </property>
>
>
> C. Evaluate it on a huge document (44Gbytes). It can be downloaded
> here:
> ftp://ftp.fpk.tu-berlin.de/pub/Kolbe/22x22xTestEttenheim_inkl_4xLod4Buildings.zip
>
> Here is a part of the huge document:
> <CityModel >
> ...
> <MultiSurface id="UUID_643df4fe-4a26-4d86-b245-1b5fd0049191_Ur">
> <surfaceMember>
> <Polygon id="UUID_a5557a09-a5e6-434f-b329-4356dbccd529_Ur">
> ...
>
> As in the previous example, the polygon has an attribute id but has
> no child head. Indeed, the label 'head' does not occur in the document:
> ddebarbieux@... ~
> $ grep head
> /cygdrive/c/huge_document/22x22xTestEttenheim_inkl_4xLod4Buildings.xml
>
> ddebarbieux@... ~
>
> The output is the following. No message is displayed about the missing
> head.
> <?xml version="1.0" encoding="UTF-8"?>
> <property xmlns:saxon="http://saxon.sf.net/">
> <test assert="id"/>
> <test assert="head"/>
> </property>
>
> Denis
> --
> Denis Debarbieux
> Engineer at INRIA
>
>
> _______________________________________________
> saxon-help mailing list archived at http://saxon.markmail.org/
> saxon-help@...
> https://lists.sourceforge.net/lists/listinfo/saxon-help