## saxon-help

 RE: [saxon] Performance Question regarding Ancestor- and Descenant-Axis From: Andrew Welch - 2003-05-14 11:32:19 ```> Can anyone explain this behaviour? Is there a better solution (e.g. > recursion)? Well if you think about it, going from root to decendent x will happen = once (as there is only one root), and going from decendent x to root for = each element will happen once for each decendent (very expensive). I can't think off hand of a better way than 1. other than writing a sax = filter that adds an attribute to the wanted element, but it depends on = how complex your criteria() is I suppose. cheers andrew > -----Original Message----- > From: Stefan.Wachter@... [mailto:Stefan.Wachter@...] > Sent: 14 May 2003 11:39 > To: saxon-help@... > Subject: [saxon] Performance Question regarding Ancestor- and > Descenant-Axis >=20 >=20 > Hi all! >=20 > I am using Saxon 6.5.2 to process medium sized XML elements > (approx. 50.000 lines if elements are indented). During processing > the following situation occurs: >=20 > I have a root node \$r and some descendant \$d. I have to select the > first (or last) node in between them (\$d included) that satisfies a > certain criteria. >=20 > I see two options to select the node: >=20 > 1. \$r/descendant::*[criteria() and \$d =3D descendant-or-self::*][1] >=20 > or >=20 > 2. \$d/ancestor-or-self::*[criteria() and \$r =3D ancestor::*][last()] >=20 > Both selects are rather slow. Yet, what really surprises me=20 > is that the > second option is much slower than the first one. I would expect the > option using the ancestor axis to be much faster because the "search > space" is restricted much more by using the ancestor axis=20 > than by using the > descendant axis. >=20 > I investigated the situation both using the tiny tree model and the > standard tree model. >=20 > Can anyone explain this behaviour? Is there a better solution (e.g. > recursion)? >=20 > Thanks for your attention and best regards, > --Stefan >=20 >=20 >=20 >=20 > ------------------------------------------------------- > Enterprise Linux Forum Conference & Expo, June 4-6, 2003, Santa Clara > The only event dedicated to issues related to Linux=20 > enterprise solutions > http://www.enterpriselinuxforum.com >=20 > _______________________________________________ > saxon-help mailing list > saxon-help@... > https://lists.sourceforge.net/lists/listinfo/saxon-help >=20 >=20 >=20 >=20 > --- > Incoming mail is certified Virus Free. > Checked by AVG anti-virus system (http://www.grisoft.com). > Version: 6.0.478 / Virus Database: 275 - Release Date: 06/05/2003 > =20 >=20 --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.478 / Virus Database: 275 - Release Date: 06/05/2003 =20 ```
 RE: [saxon] Performance Question regarding Ancestor- and Descenant-Axis From: Andrew Welch - 2003-05-14 11:33:45 ```Here again the differences are pretty obvious. If you are going to = create a variable using the ancestor axis for each element, = and there a lot of elements, then performance is going to take = a hit. Your second stylesheet is much more efficient as it doesn't = build node-set variables and it doesnt walk the ancestor axis... Also, when you say: > saxon's > performance is much, much worse than several other common=20 > processors i tried please provide some results, and maybe to back it up the xml source and = stylesheet used - then we can all learn from your tests. cheers andrew > -----Original Message----- > From: WATKIN-JONES,ADAM (HP-UnitedKingdom,ex1) > [mailto:adam_watkin-jones@...] > Sent: 14 May 2003 12:01 > To: 'Stefan.Wachter@...'; saxon-help@... > Subject: RE: [saxon] Performance Question regarding Ancestor- and > Descenant-Axis >=20 >=20 > ahh! it's not just me. i've been meaning to package up an=20 > email to demo a > performance problem i noticed with saxon and the ancestor=20 > axis. saxon's > performance is much, much worse than several other common=20 > processors i tried > with the same stylesheet. it's remained on my list of things=20 > to investigate > for so long now that i might as well send in some details. >=20 > rgds, adam >=20 > essentially, i had to replace a loop over deep elements that=20 > used ancestor > to pick up data from higher elements with an approach that=20 > picked up the > data from the higher elements to use later as i descended the=20 > tree to the > lower elements.=20 >=20 >=20 > the stylesheet i had problems with looked like this: >=20 > > xmlns:xsl=3D"http://www.w3.org/1999/XSL/Transform">; >=20 > >=20 > > > > > >=20 > > > > > - > > > > > > > > >=20 > >=20 >=20 > by rewriting it like this: >=20 > > xmlns:xsl=3D"http://www.w3.org/1999/XSL/Transform">; >=20 > >=20 > > > > > >=20 > > > > > >=20 > > > > Code)"/> > > >=20 > > > > > > >=20 > > > > > > > > >=20 > >=20 > i got sane performance back again. >=20 >=20 >=20 > -----Original Message----- >=20 > Hi all! >=20 > I am using Saxon 6.5.2 to process medium sized XML elements > (approx. 50.000 lines if elements are indented). During processing > the following situation occurs: >=20 > I have a root node \$r and some descendant \$d. I have to select the > first (or last) node in between them (\$d included) that satisfies a > certain criteria. >=20 > I see two options to select the node: >=20 > 1. \$r/descendant::*[criteria() and \$d =3D descendant-or-self::*][1] >=20 > or >=20 > 2. \$d/ancestor-or-self::*[criteria() and \$r =3D ancestor::*][last()] >=20 > Both selects are rather slow. Yet, what really surprises me=20 > is that the > second option is much slower than the first one. I would expect the > option using the ancestor axis to be much faster because the "search > space" is restricted much more by using the ancestor axis=20 > than by using the > descendant axis. >=20 > I investigated the situation both using the tiny tree model and the > standard tree model. >=20 > Can anyone explain this behaviour? Is there a better solution (e.g. > recursion)? >=20 > Thanks for your attention and best regards, > --Stefan >=20 >=20 >=20 >=20 > ------------------------------------------------------- > Enterprise Linux Forum Conference & Expo, June 4-6, 2003, Santa Clara > The only event dedicated to issues related to Linux=20 > enterprise solutions > http://www.enterpriselinuxforum.com >=20 > _______________________________________________ > saxon-help mailing list > saxon-help@... > https://lists.sourceforge.net/lists/listinfo/saxon-help >=20 >=20 > ------------------------------------------------------- > Enterprise Linux Forum Conference & Expo, June 4-6, 2003, Santa Clara > The only event dedicated to issues related to Linux=20 > enterprise solutions > http://www.enterpriselinuxforum.com >=20 > _______________________________________________ > saxon-help mailing list > saxon-help@... > https://lists.sourceforge.net/lists/listinfo/saxon-help >=20 >=20 >=20 >=20 > --- > Incoming mail is certified Virus Free. > Checked by AVG anti-virus system (http://www.grisoft.com). > Version: 6.0.478 / Virus Database: 275 - Release Date: 06/05/2003 > =20 >=20 --- Outgoing mail is certified Virus Free. Checked by AVG anti-virus system (http://www.grisoft.com). Version: 6.0.478 / Virus Database: 275 - Release Date: 06/05/2003 =20 ```
 RE: [saxon] Performance Question regarding Ancestor- and Descenant-Axis From: Michael Kay - 2003-05-21 11:21:54 ```You are right that Saxon 6.5.2 tinytree performance is pathological on this stylesheet. The good news is that the problem was fixed a while ago in the 7.x branch, and that it's much better on the "standard" tree. The figures I get are: 6.5.2 -dt 190.3 secs 6.5.2 -ds 14.2 secs 7.5.1 -dt 29.7 secs 7.5.1 -ds 28.2 secs It could still be improved further; I don't know why the -ds figure for 6.5.2 is so much better than either of the 7.5.1 figures. For most stylesheets I am finding that the performance of 6.5.2 and 7.5.1 is fairly similar. I do know why the 6.5.2 tinytree performance is bad: to keep the tree small, it doesn't contain pointers from a node to its parent. Most access paths to a node remember what the parent node was, so this doesn't matter; but when the node is reached using the descendant axis, this information isn't available and a search is necessary. In 7.x I fixed the problem, without a space penalty, by setting the "next" pointer in the last sibling to the parent, instead of to -1. Michael Kay > -----Original Message----- > From: WATKIN-JONES,ADAM (HP-UnitedKingdom,ex1) > [mailto:adam_watkin-jones@...] > Sent: 14 May 2003 17:21 > To: 'Michael Kay'; 'Andrew Welch'; saxon-help@... > Subject: RE: [saxon] Performance Question regarding Ancestor- > and Descenant-Axis > > > attached is a zip file containing an xml source file and an > xsl transform. > > latest figures from my machine: > > .net 1.0 sp2 ~ 18s > jd.xslt 1.5.2 ~ 11s (had to make it use xerces parser - > java1.4 parser > throws up) > ms3 8.30.9926.0 ~ 27s > ms4 4.20.9818.0 ~ 4.5s > saxon 6.5.2 ~ 212s > > interestigly, xalan 2.4.1 : > (Location of error unknown)XSLT Error > (javax.xml.transform.TransformerException) > : org.apache.xml.dtm.DTMException: No more DTM IDs are available > > > hope it helps! > rgds, > adam > > > > -----Original Message----- > From: Michael Kay > > > Hi Andrew, > > > > Thanks for your reply. Indeed, the second stylesheet is > > considerably more efficient! Some rough (e.g., didn't take > > vm startup into account, used os analogue clock!) > > measurements I made at the time: > > > > kahuna1.xslt: > > > > xalan - 45s > > saxon - 4m 30s > > jd.xslt - 15s > > msxml 4 - 10s > > msxml 3 - 23s > > .net - 18s > > > > kahuna2.xslt: > > > > xalan - 16s > > saxon - 8s > > jd.xslt - 10s > > msxml 4 - 6s > > msxml 3 - 10s > > .net - 9s > > > > I'm always interested to receive stylesheets where Saxon's > performance seems to be unusually bad, and the first example > above is certainly in that category. The second set of > measurements are much more typical. Sometimes it turns out > that the bad performance is caused by very simple performance > bugs, and I can't find these until you tell me about them! > > Michael Kay > > ```
 RE: [saxon] Performance Question regarding Ancestor- and Descenant-Axis From: - 2003-05-14 11:36:19 ```Hi Andrew, using the descendant-axis spans a possibly huge "search tree": if there 5 levels between the root node \$r and the descendant \$d and there a 10 children at each level then the possible candidate nodes selected \$r/descedant are 10 + 100 + 1.000 + 10.000 + 100.000 nodes (assuming that \$d is at the leaf-level). On the other side using the ancstor axis there are only 5 candidate nodes (assuming the \$r is the document element). Therefore the ancestor approach should be much more efficient than the descendant approach. I guess that Saxon optimizes expressions. Yet, it seems that Saxon optimizes expressions using the ancestor axis not very well. Bye, --Stefan > > > Can anyone explain this behaviour? Is there a better solution (e.g. > > recursion)? > > Well if you think about it, going from root to decendent x will happen > once (as there is only one root), and going from decendent x to root for > each element will happen once for each decendent (very expensive). > > I can't think off hand of a better way than 1. other than writing a sax > filter that adds an attribute to the wanted element, but it depends on > how complex your criteria() is I suppose. > > cheers > andrew > > > > -----Original Message----- > > From: Stefan.Wachter@... [mailto:Stefan.Wachter@...] > > Sent: 14 May 2003 11:39 > > To: saxon-help@... > > Subject: [saxon] Performance Question regarding Ancestor- and > > Descenant-Axis > > > > > > Hi all! > > > > I am using Saxon 6.5.2 to process medium sized XML elements > > (approx. 50.000 lines if elements are indented). During processing > > the following situation occurs: > > > > I have a root node \$r and some descendant \$d. I have to select the > > first (or last) node in between them (\$d included) that satisfies a > > certain criteria. > > > > I see two options to select the node: > > > > 1. \$r/descendant::*[criteria() and \$d = descendant-or-self::*][1] > > > > or > > > > 2. \$d/ancestor-or-self::*[criteria() and \$r = ancestor::*][last()] > > > > Both selects are rather slow. Yet, what really surprises me > > is that the > > second option is much slower than the first one. I would expect the > > option using the ancestor axis to be much faster because the "search > > space" is restricted much more by using the ancestor axis > > than by using the > > descendant axis. > > > > I investigated the situation both using the tiny tree model and the > > standard tree model. > > > > Can anyone explain this behaviour? Is there a better solution (e.g. > > recursion)? > > > > Thanks for your attention and best regards, > > --Stefan > > > > > > > > > > ------------------------------------------------------- > > Enterprise Linux Forum Conference & Expo, June 4-6, 2003, Santa Clara > > The only event dedicated to issues related to Linux > > enterprise solutions > > http://www.enterpriselinuxforum.com > > > > _______________________________________________ > > saxon-help mailing list > > saxon-help@... > > https://lists.sourceforge.net/lists/listinfo/saxon-help > > > > > > > > > > --- > > Incoming mail is certified Virus Free. > > Checked by AVG anti-virus system (http://www.grisoft.com). > > Version: 6.0.478 / Virus Database: 275 - Release Date: 06/05/2003 > > > > > > --- > Outgoing mail is certified Virus Free. > Checked by AVG anti-virus system (http://www.grisoft.com). > Version: 6.0.478 / Virus Database: 275 - Release Date: 06/05/2003 > > ```