When Saxon is evaluating //a, it doesn't know that there is only one
a-element and that the a-element doesn't have any further a-elements in its
subtree - it has to keep looking!
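
To illustrate, here is a rough sketch using the JDK's DOM rather than Saxon's real iterators. The class name, the walk() helper and the sample document are my own assumptions (the document is inferred from the node sequence you describe below), so treat it as a picture of the idea and not as what Saxon actually executes. The outer loop plays the role of the descendant axis, the inner loop the child axis, and the outer loop carries on after it has found the a-element:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DescendantThenChild {

    // Assumed sample document, inferred from the node sequence in the quoted message.
    static final String SAMPLE =
        "<a attribute=\"foobar\">foo1<b><c/></b>foo2<b/>foo3</a>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Outer step: visit every descendant of 'node' in document order.
    // Inner step: for each element called 'name', collect its child text nodes.
    static void walk(Node node, String name, List<String> visited, List<String> texts) {
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            boolean isText = child.getNodeType() == Node.TEXT_NODE;
            visited.add(isText ? child.getNodeValue() : child.getNodeName());

            if (child.getNodeType() == Node.ELEMENT_NODE && child.getNodeName().equals(name)) {
                // Child axis of the matched element, keeping text nodes only.
                for (Node c = child.getFirstChild(); c != null; c = c.getNextSibling()) {
                    if (c.getNodeType() == Node.TEXT_NODE) {
                        texts.add(c.getNodeValue());
                    }
                }
            }
            // The search cannot stop here: there may be further matching
            // elements inside this one, so keep descending.
            walk(child, name, visited, texts);
        }
    }

    public static void main(String[] args) {
        List<String> visited = new ArrayList<>();
        List<String> texts = new ArrayList<>();
        walk(parse(SAMPLE), "a", visited, texts);
        System.out.println("descendant axis visited: " + visited);
        System.out.println("text nodes returned:     " + texts);
    }
}
```

The nodes visited after the a-element are the same ones you saw on the "second" descendant pass: that pass is just the original descendant search continuing.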
I'm afraid I don't know why there aren't more calls on getStringValue() -
that will depend on the context in which the path expression appears.
Remember that an expression like //a/text() isn't naturally sorted, so in
general I would expect the text nodes to be retrieved first, and then sorted
into document order before being atomized. In your example the sort will be
a no-op, but that wouldn't be the case for example with
where the parent of rrr is in the result of //a ahead of the parent of qqq.
It might be that you have used the path expression in a context where order
doesn't matter, for example as an operand of "=", in which case Saxon will
be quite happy to retrieve the results in the "wrong" order.
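
A small sketch of the ordering point, again with the JDK's DOM and not Saxon's own classes. The sample document, class name and helper names are assumptions of mine, chosen so that the outer a-element has a text child that follows the inner a-element. A naive nested loop delivers the text nodes in the order their parents were found, and a separate sort is needed to get document order:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DocumentOrderSort {

    // Assumed sample: the parent of rrr (outer a) precedes the parent of qqq (inner a).
    static final String SAMPLE = "<a><a>qqq</a>rrr</a>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Nested iteration: for each a-element in document order, emit its child text nodes.
    static void collect(Node node, List<Node> out) {
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE && child.getNodeName().equals("a")) {
                for (Node c = child.getFirstChild(); c != null; c = c.getNextSibling()) {
                    if (c.getNodeType() == Node.TEXT_NODE) {
                        out.add(c);
                    }
                }
            }
            collect(child, out);
        }
    }

    // Comparator that places x before y when y follows x in the document.
    static int byDocumentOrder(Node x, Node y) {
        if (x.isSameNode(y)) {
            return 0;
        }
        boolean yFollowsX =
            (x.compareDocumentPosition(y) & Node.DOCUMENT_POSITION_FOLLOWING) != 0;
        return yFollowsX ? -1 : 1;
    }

    static List<String> values(List<Node> nodes) {
        List<String> result = new ArrayList<>();
        for (Node n : nodes) {
            result.add(n.getNodeValue());
        }
        return result;
    }

    static List<String> naiveOrder(String xml) {
        List<Node> nodes = new ArrayList<>();
        collect(parse(xml), nodes);
        return values(nodes);
    }

    static List<String> documentOrder(String xml) {
        List<Node> nodes = new ArrayList<>();
        collect(parse(xml), nodes);
        nodes.sort(DocumentOrderSort::byDocumentOrder);
        return values(nodes);
    }

    public static void main(String[] args) {
        System.out.println("as retrieved:   " + naiveOrder(SAMPLE));
        System.out.println("document order: " + documentOrder(SAMPLE));
    }
}
```

When the result only feeds something like "=", the first list is just as good as the second, which is why the sort can be skipped.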
I have to tell you that when I walk through the evaluation of such
expressions in the debugger, I often get surprises myself - sometimes it
seems the optimizer has a mind of its own.
> -----Original Message-----
> From: Johannes Lichtenberger
> Sent: 11 December 2007 19:44
> To: Michael Kay
> Cc: 'Mailing list for SAXON XSLT queries'
> Subject: Re: [saxon] NodeInfo implementation
> On Tuesday 11 December 2007 18:10:10 Michael Kay wrote:
> > > I'm writing a NodeInfo implementation for a native XML Storage
> > > system,
> > First point is that, as I'm sure you know, this requires a pretty
> > deep understanding of Saxon internals, and I can't really hold your
> > hand...
> I know it needs a deep understanding of Saxon internals, but
> I've found your implementations for DOM, DOM4J and JDOM, so I
> think it might be doable... and maybe it's the last thing
> which doesn't work. All of my JUnit testcases are working
> now, except this one :-/
> And I have debugged the whole thing, but it seems it doesn't help...
> > > but I have a problem with text nodes. When I have an XPath query
> > > like //element/text() it should return all text nodes in a
> > > sequence, but it only returns the last text-node and it seems it
> > > first takes the descendant-axis to find the element, then the
> > > child axis, but then it takes the descendant axis another time
> > > (first the child axis with the element it found as the root
> > > element, but then also the descendant-axis with the element it
> > > found).
> > If it's not doing the right thing then it's a bug in your code, and
> > it's difficult for me to help you find the bug. I sympathize with
> > you because tracing the execution of deeply nested iterators can be
> > very confusing.
> > Basically the PathExpression implementation will create a structure
> > that uses the descendant axis to iterate over the element nodes
> > named "element", with a mapping function so that each time it hits
> > such an element node, it returns an iterator using the child axis to
> > return the text nodes for that element. The combination of two
> > iterators returns all the text nodes of all the elements.
> In the case of //a/text() with a structure like:
> <a attribute="foobar">
> I can see that it uses the descendant axis until it hits the
> a-element, then uses the child-axis and gets foo1, b-Element,
> foo2, b-Element, foo3 but then it uses the descendant axis
> another time, so it gets foo1, b-element, c-element, foo2,
> b-element, foo3 and this time when it hits foo3 the
> getStringValue()-method is called, but only this time. So I
> don't know why the descendant-axis is called another time
> _after_ it has found the right element and why it only
> invokes getStringValue() when it's at foo3 at the end :-/