Saxon 9.2 generate-id() bug

Help
2011-07-25
2012-10-08
  • Hello, Mr Kay!

    It seems I've found an optimizer bug in v9.2.
    The code is like this:

    <xsl:stylesheet version="2.0" 
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xmlns:t="http://www.nesterovsky-bros.com/xslt/public"
      exclude-result-prefixes="t xs">
    
      <xsl:template match="/" name="main">
        <xsl:variable name="content">
          <root>
            <xsl:for-each select="1 to 3">
              <item/>
            </xsl:for-each>
          </root>
        </xsl:variable>
    
        <xsl:variable name="result">
          <root>
            <xsl:for-each select="$content/root/item">
              <section-ref
                name-ref="{t:generate-id()}.s">
                <!--<xsl:attribute name="name-ref">
                  <xsl:value-of select="t:generate-id()"/>
                  <xsl:text>.s</xsl:text>
                </xsl:attribute>-->
              </section-ref>
            </xsl:for-each>
          </root>
        </xsl:variable>
    
        <xsl:message select="$result"/>
      </xsl:template>
    
      <xsl:function name="t:generate-id" as="xs:string">
        <xsl:variable name="element" as="element()">
          <element/>
        </xsl:variable>
    
        <xsl:sequence select="generate-id($element)"/>
      </xsl:function>
    
    </xsl:stylesheet>
    

    It should produce items with unique name-ref attributes, but for me it
    generates the same value for every item.
    --
    Thanks.
    Vladimir Nesterovsky
    http://www.nesterovsky-bros.com

     
  • Any update on this?
    The bug also appears in v9.3.

     
  • Michael Kay
    2011-07-31

    Sorry, I sent my response as a reply to the notification message, so it
    disappeared into thin air. Here it is again:

    If you had created name="element" as an xsl:param rather than an xsl:variable,
    then this note would apply, found in section 9.2:

    Note: This specification does not dictate whether and when the default value
    of a parameter is evaluated. For example, if the default is specified as
    <xsl:param name="p"><foo/></xsl:param>, then it is not specified whether a
    distinct foo element node will be created on each invocation of the template,
    or whether the same foo element node will be used for each invocation.
    However, it is permissible for the default value to depend on the values of
    other parameters, or on the evaluation context, in which case the default must
    effectively be evaluated on each invocation.

    There is no corresponding note for xsl:variable; however, notes are
    non-normative, and the purpose of this note is to point out that the
    specification does not guarantee that a new node is created each time. This
    applies equally to xsl:variable, which also offers no such guarantee.

    This is a tricky area, where the specification is not as precise as it perhaps
    should be. I think it's equally true that for XSLT, there's nothing in the
    spec that guarantees that each invocation of

    <xsl:function name="new-node">
    <node/>
    </xsl:function>

    will return a new node, though I think that as far as the equivalent XQuery
    function is concerned, the formal semantics does offer a guarantee.
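
    A minimal sketch of how this can be observed, assuming the function is
    placed in the t: namespace used earlier in this thread (XSLT 2.0 requires a
    prefixed function name). The "is" operator compares node identity, so the
    message shows whether two calls returned the same node. Which answer comes
    out is exactly what the spec leaves open, so no particular output is assumed.

    ```xml
    <xsl:stylesheet version="2.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:t="http://www.nesterovsky-bros.com/xslt/public"
      exclude-result-prefixes="t">

      <!-- Returns an element built by a literal result element. -->
      <xsl:function name="t:new-node" as="element()">
        <node/>
      </xsl:function>

      <xsl:template match="/" name="main">
        <!-- True only if both calls hand back the very same node. -->
        <xsl:message select="t:new-node() is t:new-node()"/>
      </xsl:template>

    </xsl:stylesheet>
    ```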

    But in the absence of a clear guarantee in the spec, I'm not prepared to treat
    this one as a bug. By all means raise a bug against the spec if you want. (We
    may have difficulty dealing with it because in the absence of a formal
    semantics for XSLT, it's very hard to define such rules rigorously.)

    Michael Kay
    Saxonica

     
  • At the very least, the implementation behaves inconsistently across these variants:

    1. <section-ref name-ref="{t:generate-id()}.s"/>

    2. <section-ref name-ref="{t:generate-id()}.s">
         <xsl:attribute name="name-ref">
           <xsl:value-of select="t:generate-id()"/>
           <xsl:text>.s</xsl:text>
         </xsl:attribute>
       </section-ref>

    3. <xsl:variable name="id" as="xs:string" select="t:generate-id()"/>
       <section-ref name-ref="{$id}.s"/>

     

  • Also, a variation of t:generate-id('a') does not work either:

      <xsl:function name="t:generate-id" as="xs:string">
        <xsl:param name="name" as="xs:string"/>
    
        <xsl:variable name="element" as="element()">
          <xsl:element name="{$name}"/>
        </xsl:variable>
    
        <xsl:sequence select="generate-id($element)"/>
      </xsl:function>
    

    Well, one can argue that a function may return the same node for each
    invocation with the same arguments, but that sounds too esoteric (yes, I
    know about memo functions).

     
  • I hope the following will convince you that we're talking about a Saxon bug
    rather than implementation-defined behaviour.
    Here, the element is constructed in line with the note in section 9.2 of the
    spec: "...it is permissible for the default value to depend on the values of
    other parameters, or on the evaluation context, in which case the default must
    effectively be evaluated on each invocation."

    <?xml version="1.0" encoding="utf-8"?>
    <xsl:stylesheet version="2.0" 
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xmlns:t="http://www.nesterovsky-bros.com/xslt/public"
      exclude-result-prefixes="t xs">
    
      <xsl:template match="/" name="main">
        <xsl:variable name="result">
          <root>
            <xsl:for-each select="1 to 3">
              <section-ref name-ref="{t:generate-id()}.s"/>
              <!--
              <xsl:variable name="id" as="xs:string" select="t:generate-id()"/>
              <section-ref name-ref="{$id}.s"/>
              -->
            </xsl:for-each>
          </root>
        </xsl:variable>
    
        <xsl:message select="$result"/>
      </xsl:template>
    
      <xsl:function name="t:generate-id" as="xs:string">
        <xsl:call-template name="t:generate-id"/>    
      </xsl:function>
    
      <xsl:template name="t:generate-id" as="xs:string">
        <xsl:param name="name" as="xs:string" select="'a'"/>
        <xsl:param name="element" as="element()">
          <xsl:element name="{$name}"/>
        </xsl:param>
    
        <xsl:sequence select="generate-id($element)"/>
      </xsl:template>
    
    </xsl:stylesheet>
    

    This code returns the same name-ref value each time, while the commented-out
    variant works as expected.

    Thanks.
    Vladimir Nesterovsky
    http://www.nesterovsky-bros.com

     
  • Michael Kay
    2011-08-01

    Saxon will move an expression such as t:gid() out of a loop if (a) the
    expression has no dependencies on things that change within the loop, and (b)
    the expression is analyzed as being "non-creative" - that is, it doesn't
    create new nodes and return a value that depends on the node identity. Your
    call here would ideally be classified as "creative" to prevent this happening.
    However, I don't believe it's possible to analyze all possible cases and I
    believe that the spec gives considerable license to implementations in this
    regard. Showing me code that doesn't do what you want isn't by itself evidence
    that it violates the spec.
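
    To make the effect concrete, here is a rough hand-written sketch of what
    moving the call out of the loop amounts to. The variable name $lifted is
    hypothetical, and this only illustrates the observable effect; it is not
    Saxon's actual internal rewrite.

    ```xml
    <xsl:stylesheet version="2.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xmlns:t="http://www.nesterovsky-bros.com/xslt/public"
      exclude-result-prefixes="t xs">

      <xsl:template match="/" name="main">
        <!-- Hypothetical manual equivalent of the lifted call: evaluated once, before the loop. -->
        <xsl:variable name="lifted" as="xs:string" select="t:generate-id()"/>

        <xsl:variable name="result">
          <root>
            <xsl:for-each select="1 to 3">
              <!-- Every iteration reuses the one value, so all three name-ref attributes match. -->
              <section-ref name-ref="{$lifted}.s"/>
            </xsl:for-each>
          </root>
        </xsl:variable>

        <xsl:message select="$result"/>
      </xsl:template>

      <!-- Same function as in the original report. -->
      <xsl:function name="t:generate-id" as="xs:string">
        <xsl:variable name="element" as="element()">
          <element/>
        </xsl:variable>

        <xsl:sequence select="generate-id($element)"/>
      </xsl:function>

    </xsl:stylesheet>
    ```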

    I wonder if you could tell us what you are trying to achieve? Is it simply a
    function that returns something different each time it is called? If you want
    that, I think there are better ways of doing it.
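
    One possible alternative, offered only as an assumption since no specific
    technique is named here: when the loop iterates over nodes, as in the first
    stylesheet of this thread, the id can be taken from the node being
    processed. generate-id() returns a different string for each distinct node,
    and the expression depends on the context item, so it changes on every
    iteration.

    ```xml
    <xsl:stylesheet version="2.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <xsl:template match="/" name="main">
        <xsl:variable name="content">
          <root>
            <xsl:for-each select="1 to 3">
              <item/>
            </xsl:for-each>
          </root>
        </xsl:variable>

        <xsl:variable name="result">
          <root>
            <xsl:for-each select="$content/root/item">
              <!-- The id is derived from the current item, not from a freshly built node. -->
              <section-ref name-ref="{generate-id(.)}.s"/>
            </xsl:for-each>
          </root>
        </xsl:variable>

        <xsl:message select="$result"/>
      </xsl:template>

    </xsl:stylesheet>
    ```

    This does not help when the function has no node to work from, which is
    the situation the modular design described in this thread runs into.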

     
    1. As you suggested, I've logged this in the spec's Bugzilla:
      http://www.w3.org/Bugs/Public/show_bug.cgi?id=13494

    I want it to be resolved at that level.

    1. I agree with your analysis. I also considered that function "creative" and assumed that it could not be moved out of the loop. That's why I concluded it's a bug in the optimizer.

    2. Even accepting that the spec is not clear, I think a developer would expect the following two cases to behave the same:

    a)
    <section-ref name-ref="{t:generate-id()}.s"/>

    b)
    <xsl:variable name="id" as="xs:string" select="t:generate-id()"/>
    <section-ref name-ref="{$id}.s"/>

    1. Why do you think that t:generate-id() is not "creative"?

    According to my reading:

    11.1 Literal Result Elements
    "... A literal result element is evaluated to construct a new element node..."

    I wonder if you could tell us what you are trying to achieve?
    Is it simply a function that returns something different each
    time it is called? If you want that, I think there are better ways of doing
    it.

    Yes, I need to generate a unique id. The code I have shown is part of a
    function that generates a subtree which will be integrated into a bigger tree.
    Modularity restrictions are the main reason for working this way.

     
    1. I think that this behaviour is somehow related to the AVT, as

    this produces the same values:
    <section-ref name-ref="{t:generate-id()}.s"/>

    while this produces different ones:
    <section-ref name-ref="{t:generate-id()}"/>

     
  • I'm sorry to be so boring, but...

    Can you please point to the rules defining "creative", and "not creative"
    expressions (functions?),
    as I don't understand why the t:generate-id() in:

    <section-ref name-ref="{t:generate-id()}.s"/>

    is promoted out of the loop, while in the second case it's not:

    <xsl:variable name="id" as="xs:string" select="t:generate-id()"/>
    <section-ref name-ref="{$id}.s"/>

    --
    Thanks.
    Vladimir Nesterovsky
    http://www.nesterovsky-bros.com

     
  • Michael Kay
    2011-08-08

    I think the reason is that system functions that return an atomic value are
    assumed to be non-creative, with the exception of generate-id(). The AVT
    name-ref="{t:generate-id()}.s" involves an implicit call on concat(), and is
    therefore classified as non-creative.

    Clearly the analysis could be smarter, but one has to avoid the risk of
    non-termination when recursive functions are analyzed, and there's a law of
    diminishing returns that comes into play here. If the spec were stricter about
    defining exactly when you can rely on identity-dependent operations then I
    would be obliged to follow it, but I think it's probably deliberate that it
    currently allows implementations some latitude, effectively signalling to
    users that they should avoid depending on this aspect of the behaviour.

     
  • I don't agree with your words about the value of code like t:generate-id(),
    and I think that the WG should clarify the issue.
    I think that generator functions (unique ids, ordered and random sequences,
    and so on) are very important, and fit very well in functional programming.
    SQL gives a good example here.

    However, this discussion has helped me to define a generator function that
    will work even with the current spec:

      <xsl:template match="/" name="main">
        <xsl:variable name="result">
          <root>
            <xsl:for-each select="1 to 3">
              <section-ref name-ref="{t:generate-id(.)}.s"/>
            </xsl:for-each>
          </root>
        </xsl:variable>
    
        <xsl:message select="$result"/>
      </xsl:template>
    
      <xsl:function name="t:generate-id" as="xs:string">
        <xsl:param name="context" as="item()*"/>
    
        <xsl:variable name="node">
          <xsl:sequence select="boolean($context)"/> 
        </xsl:variable>
    
        <xsl:sequence select="generate-id($node)"/>
      </xsl:function>
    

    Also, I think that the term "non-creative" is not informative. I would use the
    terms "deterministic" or "stable".

    Thanks.

    Vladimir Nesterovsky
    http://www.nesterovsky-bros.com