Menu

Template ordering in stylesheets

Help
Sahil
2012-05-11
2012-10-08
  • Sahil

    Sahil - 2012-05-11

    We are using XSLT2.0 with Saxon as processor. We are processing various
    varieties of XML docs using multiple stylesheets. However there are few common
    stylesheet which process common elements in these XML docs. To handle the
    variety, we have many template matches and hence these common stylesheets are
    now becoming substantial in size. My question is, does it really matter in
    stylesheet to place frequently used templates at the bottom of stylesheet and
    rarely used templates at the top of stylesheets? Does it impact in improving
    performance in any way?

    I also would like to know if I had used XSLT1.0 and Xalan, would it have
    mattered?

     
  • Michael Kay

    Michael Kay - 2012-05-11

    Interesting question. With most stylesheets, the majority of template rules
    will only match elements having a particular name. The only time that template
    matching is going to be expensive is when you have many templates that match
    the same element name (or none), for example lots of templates saying
    match="xx

    " with different predicates, and where these templates all have the same
    precedence and priority. In this situation Saxon will indeed work sequentially
    through all the patterns in this set of template rules, starting with the one
    that comes last in stylesheet declaration order. If the predicates are in fact
    mutually exclusive (so there's no ambiguity and the same template will be
    selected regardless of ordering), then it probably does make sense to put the
    one that's most likely to match last - unless it is much more expensive to
    evaluate than other match patterns, in which case the cost of evaluation will
    also be a factor.

    But if you're really convinced that performance depends on this factor, then
    rather than rearranging the deck-chairs, a better approach might be to rethink
    the algorithm. Perhaps there is some way of using keys to categorize the
    elements more efficiently?

     
  • Michael Kay

    Michael Kay - 2012-05-11

    Incidentally one case where I've seen template matching degrade to a linear
    search is where all the template rules take the form match="*:local-name".
    Unlike template rules that match a specific element name, these don't benefit
    from any hash lookup.

     
  • Sahil

    Sahil - 2012-05-11

    Thanks for a quick response Mike. You said:

    In this situation Saxon will indeed work sequentially through all the
    patterns in this set of template rules, starting with the one that comes last
    in stylesheet declaration order. If the predicates are in fact mutually
    exclusive (so there's no ambiguity and the same template will be selected
    regardless of ordering), then it probably does make sense to put the one
    that's most likely to match last - unless it is much more expensive to
    evaluate than other match patterns, in which case the cost of evaluation will
    also be a factor.

    So in my case, I do have some scenarios where I do predicate check and these
    predicates are mutually exclusive. I am trying to understand this in detail if
    declaration order is significant.

    I was reading an article written by you about Saxon in IBM developerWorks - h
    ttp://www.ibm.com/developerworks/library/x-xslt2/

    In this article under section "Pattern matching for template rules", you
    mentioned:

    Saxon's strategy is therefore to separate rules into two kinds: specific
    rules, where the node type and name are explicitly specified in the pattern,
    and general rules, where they aren't. The data structure for the decision tree
    contains a hash table for the specific rules, keyed on the node type and node
    name, in which each entry is a list of rules sorted by decreasing priority;
    plus a single list for all the general rules, again in priority order. To find
    the pattern for a particular node, the processor makes two searches: one for
    the highest-priority specific rule for the relevant node type and name that
    matches the node, and one for the highest-priority general rule that matches
    the node. Whichever of these has highest priority is then chosen.

    My question is if all the templates are stored in hash table with relevant
    priorities, so for any match processor is going to do hash look. So even if
    declaration order plays a role, it might not be significant because it's just
    a hash look up. If the processor goes through XSL tree again (bottom up) then
    may be it's improvement because as soon as a match is found perhaps processor
    will stop looking further and entire tree traversal is not required.

    Please let me know your thoughts?

    Also would XSLT1.0 behave differently if used with Xalan for a similar
    scenario? I am just trying to find out differences in implementations.

     
  • Michael Kay

    Michael Kay - 2012-05-11

    I can't tell you how Xalan is implemented - I simply don't know.

    The key question for Saxon is, do you have lots of template rules that are the
    same at the level of the (node kind and node name) that they match. For
    example, lots of rules that say match="*", or lots of rules that say
    match="XYZ for the same value of XYZ.

    The other question is, have you done any measurements to lead you to believe
    that you have a problem in this area? Because if you haven't, then we are both
    wasting our time.