We are using XSLT2.0 with Saxon as processor. We are processing various
varieties of XML docs using multiple stylesheets. However there are few common
stylesheet which process common elements in these XML docs. To handle the
variety, we have many template matches and hence these common stylesheets are
now becoming substantial in size. My question is, does it really matter in
stylesheet to place frequently used templates at the bottom of stylesheet and
rarely used templates at the top of stylesheets? Does it impact in improving
performance in any way?
I also would like to know if I had used XSLT1.0 and Xalan, would it have
mattered?
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Interesting question. With most stylesheets, the majority of template rules
will only match elements having a particular name. The only time that template
matching is going to be expensive is when you have many templates that match
the same element name (or none), for example lots of templates saying
match="xx
" with different predicates, and where these templates all have the same
precedence and priority. In this situation Saxon will indeed work sequentially
through all the patterns in this set of template rules, starting with the one
that comes last in stylesheet declaration order. If the predicates are in fact
mutually exclusive (so there's no ambiguity and the same template will be
selected regardless of ordering), then it probably does make sense to put the
one that's most likely to match last - unless it is much more expensive to
evaluate than other match patterns, in which case the cost of evaluation will
also be a factor.
But if you're really convinced that performance depends on this factor, then
rather than rearranging the deck-chairs, a better approach might be to rethink
the algorithm. Perhaps there is some way of using keys to categorize the
elements more efficiently?
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Incidentally one case where I've seen template matching degrade to a linear
search is where all the template rules take the form match="*:local-name".
Unlike template rules that match a specific element name, these don't benefit
from any hash lookup.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
In this situation Saxon will indeed work sequentially through all the
patterns in this set of template rules, starting with the one that comes last
in stylesheet declaration order. If the predicates are in fact mutually
exclusive (so there's no ambiguity and the same template will be selected
regardless of ordering), then it probably does make sense to put the one
that's most likely to match last - unless it is much more expensive to
evaluate than other match patterns, in which case the cost of evaluation will
also be a factor.
So in my case, I do have some scenarios where I do predicate check and these
predicates are mutually exclusive. I am trying to understand this in detail if
declaration order is significant.
In this article under section "Pattern matching for template rules", you
mentioned:
Saxon's strategy is therefore to separate rules into two kinds: specific
rules, where the node type and name are explicitly specified in the pattern,
and general rules, where they aren't. The data structure for the decision tree
contains a hash table for the specific rules, keyed on the node type and node
name, in which each entry is a list of rules sorted by decreasing priority;
plus a single list for all the general rules, again in priority order. To find
the pattern for a particular node, the processor makes two searches: one for
the highest-priority specific rule for the relevant node type and name that
matches the node, and one for the highest-priority general rule that matches
the node. Whichever of these has highest priority is then chosen.
My question is if all the templates are stored in hash table with relevant
priorities, so for any match processor is going to do hash look. So even if
declaration order plays a role, it might not be significant because it's just
a hash look up. If the processor goes through XSL tree again (bottom up) then
may be it's improvement because as soon as a match is found perhaps processor
will stop looking further and entire tree traversal is not required.
Please let me know your thoughts?
Also would XSLT1.0 behave differently if used with Xalan for a similar
scenario? I am just trying to find out differences in implementations.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I can't tell you how Xalan is implemented - I simply don't know.
The key question for Saxon is, do you have lots of template rules that are the
same at the level of the (node kind and node name) that they match. For
example, lots of rules that say match="*", or lots of rules that say
match="XYZ for the same value of XYZ.
The other question is, have you done any measurements to lead you to believe
that you have a problem in this area? Because if you haven't, then we are both
wasting our time.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
We are using XSLT2.0 with Saxon as processor. We are processing various
varieties of XML docs using multiple stylesheets. However there are few common
stylesheet which process common elements in these XML docs. To handle the
variety, we have many template matches and hence these common stylesheets are
now becoming substantial in size. My question is, does it really matter in
stylesheet to place frequently used templates at the bottom of stylesheet and
rarely used templates at the top of stylesheets? Does it impact in improving
performance in any way?
I also would like to know if I had used XSLT1.0 and Xalan, would it have
mattered?
Interesting question. With most stylesheets, the majority of template rules
will only match elements having a particular name. The only time that template
matching is going to be expensive is when you have many templates that match
the same element name (or none), for example lots of templates saying
match="xx
" with different predicates, and where these templates all have the same
precedence and priority. In this situation Saxon will indeed work sequentially
through all the patterns in this set of template rules, starting with the one
that comes last in stylesheet declaration order. If the predicates are in fact
mutually exclusive (so there's no ambiguity and the same template will be
selected regardless of ordering), then it probably does make sense to put the
one that's most likely to match last - unless it is much more expensive to
evaluate than other match patterns, in which case the cost of evaluation will
also be a factor.
But if you're really convinced that performance depends on this factor, then
rather than rearranging the deck-chairs, a better approach might be to rethink
the algorithm. Perhaps there is some way of using keys to categorize the
elements more efficiently?
Incidentally one case where I've seen template matching degrade to a linear
search is where all the template rules take the form match="*:local-name".
Unlike template rules that match a specific element name, these don't benefit
from any hash lookup.
Thanks for a quick response Mike. You said:
So in my case, I do have some scenarios where I do predicate check and these
predicates are mutually exclusive. I am trying to understand this in detail if
declaration order is significant.
I was reading an article written by you about Saxon in IBM developerWorks - h
ttp://www.ibm.com/developerworks/library/x-xslt2/
In this article under section "Pattern matching for template rules", you
mentioned:
My question is if all the templates are stored in hash table with relevant
priorities, so for any match processor is going to do hash look. So even if
declaration order plays a role, it might not be significant because it's just
a hash look up. If the processor goes through XSL tree again (bottom up) then
may be it's improvement because as soon as a match is found perhaps processor
will stop looking further and entire tree traversal is not required.
Please let me know your thoughts?
Also would XSLT1.0 behave differently if used with Xalan for a similar
scenario? I am just trying to find out differences in implementations.
I can't tell you how Xalan is implemented - I simply don't know.
The key question for Saxon is, do you have lots of template rules that are the
same at the level of the (node kind and node name) that they match. For
example, lots of rules that say match="*", or lots of rules that say
match="XYZ for the same value of XYZ.
The other question is, have you done any measurements to lead you to believe
that you have a problem in this area? Because if you haven't, then we are both
wasting our time.