Thanks for reporting this. You're right about the problem.
 
Michael Kay


From: saxon-help-admin@lists.sourceforge.net [mailto:saxon-help-admin@lists.sourceforge.net] On Behalf Of Menzo Windhouwer
Sent: 12 August 2004 15:53
To: saxon-help@lists.sourceforge.net
Cc: Peter Rodgers
Subject: [saxon] xquery: unions fail when an external DOM document is used

Dear all,

Using Saxon from 1060 NetKernel I stumbled upon the following
problem. DOMs loaded by an external URI resolver don't get a
documentNumber, causing union expressions (and maybe others)
in the XQuery to fail. Attached you will find a small sample application
to illustrate this:

Three source documents contain different lists with some overlap;
the following XQuery shows this overlap:

<results>
{
        let $l1 := doc("one.xml")//item,
                $l2 := doc("two.xml")//item,
                $l3 := doc("three.xml")//item,
                $c1 := $l1/@nr,
                $c2 := $l2/@nr,
                $c3 := $l3/@nr
        for $c in distinct-values($c1 union $c2 union $c3)
        order by $c cast as xs:decimal
        return
                <result>
                        <nr> { $c } </nr>
                        { if ( exists(index-of($c1,$c)) ) then <one/> else () }
                        { if ( exists(index-of($c2,$c)) ) then <two/> else () }
                        { if ( exists(index-of($c3,$c)) ) then <three/> else () }
                </result>
}
</results>

This gives the following output using net.sf.saxon.Query (and style.xsl)
(I assume your email client renders HTML):

nr   one  two  three
1    X         X
2    X         X
3    X
4    X    X
5    X    X
6    X    X
7         X
8         X    X
9         X    X
10             X
11             X

However, when the source documents are loaded as DOMs by the URI resolver,
the result changes:

nr   one  two  three
1    X         X
2    X         X
8         X    X
9         X    X
10             X
11             X

This result changes with the order of the union arguments.

I delved a bit into the code:

expr.UnionEnumeration uses a sort.NodeOrderComparer to determine which node
from the two nodesets to return. When the comparer returns 0 the nodes are
considered equal and the node of the second nodeset is returned. This happens
all the time as the comparer always returns 0, hence the dependency of the
result on the last nodeset: only this set is returned in the end. The actual
sort.NodeOrderComparer used is sort.GlobalOrderComparer. This comparer
checks whether the nodes are from the same document. If not, which is the case
in our example, it compares the document numbers. However, the
dom.DocumentWrapper always returns 0 as its document number, as the function
setNamePool (which sets the documentNumber) is never called. It's unclear to me
where and when this method should be invoked ...
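
To make the failure mode described above concrete, here is a minimal Java sketch
of a merge-based union. It is a simplified model written for illustration, not
Saxon's actual expr.UnionEnumeration or sort.GlobalOrderComparer: the class
UnionMergeSketch, its Node type, and the methods compareOrder, union and nodes
are all hypothetical names. It assumes only what the description says: nodes from
different documents are ordered by document number, and a comparison result of 0
makes the merge keep the node from the second nodeset.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of a merge-based union; NOT Saxon's actual implementation.
final class UnionMergeSketch {

    /** Hypothetical stand-in for a node: owning document, its number, a position. */
    static final class Node {
        final String documentUri;
        final int documentNumber;
        final int position;

        Node(String documentUri, int documentNumber, int position) {
            this.documentUri = documentUri;
            this.documentNumber = documentNumber;
            this.position = position;
        }
    }

    /** Same document: compare positions. Otherwise: compare document numbers. */
    static int compareOrder(Node a, Node b) {
        if (a.documentUri.equals(b.documentUri)) {
            return Integer.compare(a.position, b.position);
        }
        return Integer.compare(a.documentNumber, b.documentNumber);
    }

    /** Merges two ordered lists; when the comparison is 0 the second list's node wins. */
    static List<Node> union(List<Node> first, List<Node> second) {
        List<Node> result = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < first.size() && j < second.size()) {
            int c = compareOrder(first.get(i), second.get(j));
            if (c < 0) {
                result.add(first.get(i++));
            } else if (c > 0) {
                result.add(second.get(j++));
            } else {
                // Treated as the same node: keep the second one, drop the first.
                result.add(second.get(j++));
                i++;
            }
        }
        while (i < first.size()) result.add(first.get(i++));
        while (j < second.size()) result.add(second.get(j++));
        return result;
    }

    /** Builds `count` nodes that all belong to one document. */
    static List<Node> nodes(String uri, int documentNumber, int count) {
        List<Node> list = new ArrayList<>();
        for (int p = 0; p < count; p++) {
            list.add(new Node(uri, documentNumber, p));
        }
        return list;
    }

    public static void main(String[] args) {
        // Every document number is 0, as with the wrapped DOMs in the report.
        List<Node> broken = union(nodes("one.xml", 0, 3), nodes("two.xml", 0, 3));
        System.out.println(broken.size() + " nodes, first from " + broken.get(0).documentUri);

        // Distinct document numbers: nothing is dropped.
        List<Node> distinct = union(nodes("one.xml", 1, 3), nodes("two.xml", 2, 3));
        System.out.println(distinct.size() + " nodes, first from " + distinct.get(0).documentUri);
    }
}
```

With every document number equal to 0, each cross-document comparison ties, so
the merge of two equally long lists yields only the second list; with distinct
document numbers all six nodes survive. In this model, chaining a third list of
the same length leaves only that last list, which matches the symptom that the
result depends on the order of the union arguments.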

xq.tgz contains a small application to reproduce this problem:

java -cp ~/local/java/saxon/saxon8.jar:. xq 1 | xsltproc style.xsl - > 1.html
    doesn't use the URI resolver and produces the correct result

java -cp ~/local/java/saxon/saxon8.jar:. xq 2 | xsltproc style.xsl - > 2.html
    uses the URI resolver and shows the problem

java -cp ~/local/java/saxon/saxon8.jar:. xq 3 | xsltproc style.xsl - > 3.html
    replaces the doc(...) calls by external node() parameters, and still shows
    the problem

In NetKernel the problem is currently circumvented by providing the
raw document streams instead of DOMs.
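
As a rough illustration of that kind of workaround, the sketch below shows a
resolver that hands back a stream-based source rather than a DOM. It assumes the
resolver is a standard javax.xml.transform.URIResolver; the class name
StreamingResolverSketch is hypothetical and this is not NetKernel's actual code.
The idea is that the query processor then builds its own tree from the stream
instead of wrapping an externally built DOM.

```java
import java.net.URI;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;

// Hypothetical resolver: returns a StreamSource instead of a DOMSource.
final class StreamingResolverSketch implements URIResolver {

    @Override
    public Source resolve(String href, String base) throws TransformerException {
        try {
            // Resolve href against the base URI when one is supplied.
            URI target = (base == null || base.isEmpty())
                    ? URI.create(href)
                    : URI.create(base).resolve(href);
            // A StreamSource carrying only a system ID lets the processor
            // open and parse the document itself.
            return new StreamSource(target.toString());
        } catch (IllegalArgumentException e) {
            throw new TransformerException(
                    "Cannot resolve " + href + " against " + base, e);
        }
    }
}
```

For example, resolving "one.xml" against the base "http://example.org/docs/"
returns a StreamSource whose system ID is "http://example.org/docs/one.xml".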

Greetings,

Menzo
-- 
Menzo Windhouwer, Theoretische Taalwetenschap (UvA)
kamer 306, Spuistraat 210 (Bungehuis) 1012 VT A'dam
telefoon:020 525 3104, e-mail:M.A.Windhouwer@uva.nl