Suggest of optimization

2008-08-25
2012-10-08
  • Vladimir Nesterovsky

    Hello Mr Kay,

    I'd like to suggest an optimization for a specific case.
    I have no statistics on how often this is used.
    My humble experience shows that I used it several times.
    The optimization aims to "compensate" a lack of maps in data model.

    A formal case

    There are two stages:
    1. Untrivial logic collecting nodes/values satisfying some criteria.
    2. Process data, and take a special action whenever a node/value is collected on the previous stage.

    A specific example

    I have defined jxom (xml schema describing java syntax).
    Among other api I have defined functions to remove unreachable code. This is done in two stages, as it cannot be easily implemented in one pass.

    1. Collect unreachable statements.
    2. If there are such then generate tree without these statements.

    Just in case java statement reachability is defined at
    http://java.sun.com/docs/books/jls/second_edition/html/statements.doc.html#236365

    To make such logic more efficient an optimization of

    exists($nodes intersect $node)

    and (or)

    exists(index-of($values, $value))

    could be done (through an index lookup).

    I want to believe my description is clear enough.
    If it's not then never mind...

    Thanks

     
    • Michael Kay

      Michael Kay - 2008-08-25

      Sorry, I wonder if you could give a more specific example of some XSLT or XQuery code that could be optimized better than is currently the case?

       
    • Vladimir Nesterovsky

      As the following code is oversimplified it looks stupid and it seems that it's inefficient. But please remember that original code is more complex.

      xml:
      <unit xmlns="http://www.bphx.com/java-1.5/2008-02-07" package="com.test">
      <class access="public" name="MyClass">
      <class-method access="protected" name="run">
      <block>
      <switch>
      <test>
      <var name="x">
      <meta>
      <type name="int"/>
      </meta>
      </var>
      </test>
      <case>
      <value>
      <int>0</int>
      </value>
      <block>
      <expression>
      <assign>
      <var name="y">
      <meta>
      <type name="int"/>
      </meta>
      </var>
      <int>1</int>
      </assign>
      </expression>
      <return/>

              &lt;/block&gt;
            &lt;/case&gt;
            &lt;case&gt;
              &lt;block&gt;
                &lt;return/&gt;
      
                &lt;scope&gt;
                  &lt;comment&gt;
                    &lt;para&gt;NOTE: this is unreachable code.&lt;/para&gt;
                    &lt;scope&gt;
                      &lt;meta&gt;
                        &lt;bookmark&gt;
                          &lt;warning message=&quot;Unreachable code.&quot;/&gt;
                        &lt;/bookmark&gt;
                      &lt;/meta&gt;
                      &lt;expression remove-me=&quot;true&quot;&gt;
                        &lt;inc&gt;
                          &lt;var name=&quot;y&quot;&gt;
                            &lt;meta&gt;
                              &lt;type name=&quot;int&quot;/&gt;
                            &lt;/meta&gt;
                          &lt;/var&gt;
                        &lt;/inc&gt;
                      &lt;/expression&gt;
                    &lt;/scope&gt;
                  &lt;/comment&gt;
                &lt;/scope&gt;
      
              &lt;/block&gt;
            &lt;/case&gt;
          &lt;/switch&gt;
        &lt;/block&gt;
      &lt;/class-method&gt;
      

      </class>
      </unit>

      xslt:
      <!--
      This stylesheet optimizes (removes) dead code.

      $Id: java-optimize-unreacable-code.xslt 3996 2008-07-20 16:18:29Z vladimirn $
      -->
      <xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:xs="http://www.w3.org/2001/XMLSchema"
      xmlns:t="http://www.bphx.com/jxom"
      xmlns:p="http://www.bphx.com/jxom/private/optimizer"
      xmlns:j="http://www.bphx.com/java-1.5/2008-02-07"
      xmlns="http://www.bphx.com/java-1.5/2008-02-07"
      xpath-default-namespace="http://www.bphx.com/java-1.5/2008-02-07"
      exclude-result-prefixes="xs t j p">

      <xsl:output indent="yes"/>

      <!--
      The reachability rules, repeated here, are defined at:

      http://java.sun.com/docs/books/jls/second_edition/html/statements.doc.html#236365

      All smartness is removed for brievity.
      In short: 
        statement should be reachable;
        next statement is reachable if previous statement is reachable and completes normally.
      

      -->

      <!-- Entry point. -->
      <xsl:template match="/unit">
      <xsl:variable name="unit" as="element()" select="
      t:optimize-unreachable-code
      (
      .,
      false(),
      'NOTE: this is unreachable code.'
      )"/>

      &lt;!--
        Print warnings:
      --&gt;
      &lt;xsl:variable name=&quot;bookmarks&quot; as=&quot;element()*&quot;
        select=&quot;$unit//meta/bookmark&quot;/&gt;
      
      &lt;xsl:if test=&quot;exists($bookmarks)&quot;&gt;
        &lt;xsl:message&gt;
          &lt;xsl:for-each select=&quot;$bookmarks&quot;&gt;
            &lt;message path=&quot;...&quot;&gt;
              &lt;xsl:sequence select=&quot;@* | node()&quot;/&gt;
            &lt;/message&gt;
          &lt;/xsl:for-each&gt;
        &lt;/xsl:message&gt;
      &lt;/xsl:if&gt;
      
      &lt;xsl:sequence select=&quot;$unit&quot;/&gt;
      

      </xsl:template>

      <!--
      Collects unreachable statements for a statement.
      $statement - a scope statement.
      $statements-not-completing-normally - a statements that do
      not complete normally.
      Returns an optional sequence of unreachable statements.
      -->
      <xsl:template name="t:collect-unreachable-statements" as="element()*">
      <xsl:param name="statement" as="element()"/>

      &lt;xsl:param name=&quot;statements-not-completing-normally&quot; tunnel=&quot;yes&quot;
        as=&quot;element()*&quot;&gt;
        &lt;xsl:for-each select=&quot;$statement//j:*&quot;&gt;
          &lt;xsl:if test=&quot;xs:boolean(@remove-me)&quot;&gt;
            &lt;xsl:sequence select=&quot;.&quot;/&gt;
          &lt;/xsl:if&gt;
        &lt;/xsl:for-each&gt;
      &lt;/xsl:param&gt;
      
      &lt;xsl:variable name=&quot;statements&quot; as=&quot;element()*&quot; select=&quot;$statement/j:*&quot;/&gt;
      
      &lt;xsl:variable name=&quot;completes-normally&quot; as=&quot;xs:boolean*&quot; select=&quot;
        for $statement in $statements return
          empty($statement intersect $statements-not-completing-normally)&quot;/&gt;
      
      &lt;xsl:variable name=&quot;index&quot; as=&quot;xs:integer?&quot;
        select=&quot;index-of($completes-normally, false())[1]&quot;/&gt;
      
      &lt;xsl:for-each select=&quot;
        (
          if (exists($index)) then
            subsequence($statements, 1, $index)
          else
            $statements
        )/
          (
            self::block, 
            self::if/then, self::if/else,
            self::for/block,
            self::for-each/block,
            self::while/block,
            self::do-while/block,
            self::try/block, self::try/catch/block, self::try/finally,
            self::switch/case/block,
            self::synchronized/block
          )&quot;&gt;
        &lt;xsl:call-template name=&quot;t:collect-unreachable-statements&quot;&gt;
          &lt;xsl:with-param name=&quot;statement&quot; select=&quot;.&quot;/&gt;        
        &lt;/xsl:call-template&gt;
      &lt;/xsl:for-each&gt;
      
      &lt;xsl:if test=&quot;exists($index)&quot;&gt;
        &lt;xsl:sequence select=&quot;subsequence($statements, $index)&quot;/&gt;
      &lt;/xsl:if&gt;
      

      </xsl:template>

      <!--
      Optimizes unreachable code.
      $scope - an scope element to check.
      $remove - true to remove unreachable code, and false to comment it out.
      $comment - an optional comment text to put near unreachable code.
      Returns optimized scope element.
      -->
      <xsl:function name="t:optimize-unreachable-code" as="element()">
      <xsl:param name="scope" as="element()"/>
      <xsl:param name="remove" as="xs:boolean"/>
      <xsl:param name="comment" as="xs:string?"/>

      &lt;xsl:variable name=&quot;unreachable-statements&quot; as=&quot;element()*&quot;&gt;
        &lt;xsl:for-each select=&quot;
          $scope//(static/block, (method, class-method, enum-method)/block)&quot;&gt;
      
          &lt;xsl:call-template name=&quot;t:collect-unreachable-statements&quot;&gt;
            &lt;xsl:with-param name=&quot;statement&quot; select=&quot;.&quot;/&gt;
          &lt;/xsl:call-template&gt;
        &lt;/xsl:for-each&gt;
      &lt;/xsl:variable&gt;
      
      &lt;xsl:choose&gt;
        &lt;xsl:when test=&quot;empty($unreachable-statements)&quot;&gt;
          &lt;xsl:sequence select=&quot;$scope&quot;/&gt;
        &lt;/xsl:when&gt;
        &lt;xsl:otherwise&gt;
          &lt;xsl:apply-templates mode=&quot;p:optimize-unreachable-code&quot; select=&quot;$scope&quot;&gt;
            &lt;xsl:with-param name=&quot;unreachable-statements&quot; tunnel=&quot;yes&quot;
              select=&quot;$unreachable-statements&quot;/&gt;
            &lt;xsl:with-param name=&quot;remove&quot; tunnel=&quot;yes&quot; select=&quot;$remove&quot;/&gt;
            &lt;xsl:with-param name=&quot;comment&quot; tunnel=&quot;yes&quot; select=&quot;$comment&quot;/&gt;
          &lt;/xsl:apply-templates&gt;
        &lt;/xsl:otherwise&gt;
      &lt;/xsl:choose&gt;
      

      </xsl:function>

      <!--
      Mode "p:optimize-unreachable-code".
      Optimizes unreachable code.
      -->
      <xsl:template mode="p:optimize-unreachable-code"
      match="@* | text() | statement">
      <xsl:sequence select="."/>
      </xsl:template>

      <!--
      Mode "p:optimize-unreachable-code".
      Optimizes unreachable code.
      $unreachable-statements - a list of unreachable statements.
      $remove - true to remove unreachable code, and false to comment it out.
      $comment - an optional comment text to put near unreachable code.
      Returns optimized element.
      -->
      <xsl:template mode="p:optimize-unreachable-code" match="*">
      <xsl:param name="unreachable-statements" tunnel="yes" as="element()+"/>
      <xsl:param name="remove" tunnel="yes" as="xs:boolean"/>
      <xsl:param name="comment" tunnel="yes" as="xs:string?"/>

      &lt;xsl:variable name=&quot;statement&quot; as=&quot;element()&quot; select=&quot;.&quot;/&gt;
      
      &lt;xsl:choose&gt;
        &lt;xsl:when test=&quot;exists(. except $unreachable-statements)&quot;&gt;
          &lt;xsl:copy&gt;
            &lt;xsl:sequence select=&quot;@*&quot;/&gt;
            &lt;xsl:apply-templates mode=&quot;p:optimize-unreachable-code&quot;
              select=&quot;node()&quot;/&gt;
          &lt;/xsl:copy&gt;
        &lt;/xsl:when&gt;
        &lt;xsl:when test=&quot;
          not($remove) and empty((self::break, self::continue, self::return))&quot;&gt;
          &lt;scope&gt;
            &lt;comment&gt;
              &lt;xsl:if test=&quot;$comment&quot;&gt;
                &lt;para&gt;
                  &lt;xsl:sequence select=&quot;$comment&quot;/&gt;
                &lt;/para&gt;
              &lt;/xsl:if&gt;
              &lt;scope&gt;
                &lt;meta&gt;
                  &lt;bookmark&gt;
                    &lt;warning message=&quot;Unreachable code.&quot;/&gt;
                  &lt;/bookmark&gt;
                &lt;/meta&gt;
                &lt;xsl:sequence select=&quot;.&quot;/&gt;
              &lt;/scope&gt;
            &lt;/comment&gt;
          &lt;/scope&gt;
        &lt;/xsl:when&gt;
      &lt;/xsl:choose&gt;
      

      </xsl:template>

      </xsl:stylesheet>

       
    • Vladimir Nesterovsky

      Correction:

      xml:
      <unit xmlns="http://www.bphx.com/java-1.5/2008-02-07" package="com.test">
      <class access="public" name="MyClass">
      <class-method access="protected" name="run">
      <block>
      <switch>
      <test>
      <var name="x">
      <meta>
      <type name="int"/>
      </meta>
      </var>
      </test>
      <case>
      <value>
      <int>0</int>
      </value>
      <block>
      <expression>
      <assign>
      <var name="y">
      <meta>
      <type name="int"/>
      </meta>
      </var>
      <int>1</int>
      </assign>
      </expression>
      <return/>
      <break remove-me="true"/>
      </block>
      </case>
      <case>
      <block>
      <return/>

                &lt;expression  remove-me=&quot;true&quot;&gt;
                  &lt;inc&gt;
                    &lt;var name=&quot;y&quot;&gt;
                      &lt;meta&gt;
                        &lt;type name=&quot;int&quot;/&gt;
                      &lt;/meta&gt;
                    &lt;/var&gt;
                  &lt;/inc&gt;
                &lt;/expression&gt;
      
                &lt;break remove-me=&quot;true&quot;/&gt;
              &lt;/block&gt;
            &lt;/case&gt;
          &lt;/switch&gt;
        &lt;/block&gt;
      &lt;/class-method&gt;
      

      </class>
      </unit>

       
    • Michael Kay

      Michael Kay - 2008-08-25

      I think that Saxon's lazy evaluation strategy will handle this case quite efficiently. If I'm wrong, could you please try to explain how it could be improved?

      Michael Kay
      Saxonica

       
    • Vladimir Nesterovsky

      Another thing.

      If in the preceding xslt I define $statements-not-completing-normally in template t:collect-unreachable-statements like:

      &lt;xsl:param name=&quot;statements-not-completing-normally&quot; tunnel=&quot;yes&quot;
        as=&quot;element()*&quot;&gt;
        &lt;xsl:variable name=&quot;v&quot; as=&quot;element()*&quot;&gt;
          &lt;xsl:for-each select=&quot;$statement//j:*&quot;&gt;
            &lt;xsl:if test=&quot;xs:boolean(@remove-me)&quot;&gt;
              &lt;xsl:sequence select=&quot;.&quot;/&gt;
            &lt;/xsl:if&gt;
          &lt;/xsl:for-each&gt;
        &lt;/xsl:variable&gt;
      
        &lt;xsl:message select=&quot;$v&quot;/&gt;
        &lt;xsl:sequence select=&quot;$v&quot;/&gt;
      &lt;/xsl:param&gt;
      

      I was expecting this parameter to be evaluated once and be passed further (tunnel="yes"). This way, I was expecting recursive call to t:collect-unreachable-statements to recieve that parameter without evaluation.

      My expectation was that xsl:message to be reported once.
      That is not what I see.

      Am I wrong?

       
    • Vladimir Nesterovsky

      > I think that Saxon's lazy evaluation strategy will
      > handle this case quite efficiently.
      > If I'm wrong, could you please try to explain
      > how it could be improved?

      I can see that
      ". except $unreachable-statements"
      is compiled as VennExpression, which uses IntersectionEnumeration.

      In my case, due to size of sequence
      $unreachable-statements (10-30 on avarage),
      I guess that index lookup is more efficient
      than implementation, which I see in IntersectionEnumeration:

          while (nextNode1 != null &amp;&amp; nextNode2 != null) {
              int c = comparer.compare(nextNode1, nextNode2);
              if (c&lt;0) {
                  nextNode1 = next(e1);
              } else if (c&gt;0) {
                  nextNode2 = next(e2);
              } else {            // keys are equal
                  current = nextNode2;    // which is the same as nextNode1
                  nextNode2 = next(e2);
                  nextNode1 = next(e1);
                  position++;
                  return current;
              }
          }
      

      Thanks.

       
      • Vladimir Nesterovsky

        Correction. I was talking about expression:

        $statement intersect $statements-not-completing-normally

         
        • Michael Kay

          Michael Kay - 2008-08-26

          So do you think the following optimization would address your use case:

          Where the expression

          $a intersect $b

          is used in a boolean context, and the expression is evaluated in a loop relative to the initialization of one of $a or $b (call this the outer operand), then store a HashSet index for the outer operand and use this for a fast lookup when evaluating the expression?

          Currently if $b is the outer operand and $b has size N, while $a has size 1, this operation will typically be O(N). This optimization would change it to be O(log(N)) - plus a setup cost for creating the index.

          It's a fairly complex optimization to do because it involves new run-time data structures - unless I can contrive to use the existing code for value-based indexing, that is, rewrite it to

          exists($b[generate-id(.) = generate-id($a)])

          and use a value-based index on the generate-id() string. Which would be not quite as efficient, but possibly less special code.

          Incidentally, the answer to your other question is that if the default value for an xsl:param can be evaluated statically, then this is done, but if it has any dependency on the context then it is evaluated afresh on each call. (If you want to discuss this further, I would suggest using a separate thread).

           
          • Vladimir Nesterovsky

            My reasoning is like the following:

            for expressions
            $item except $collection
            and
            $item intersect $collection
            where expressions are used repeatedly for the same $collection and different $item, a "HASH" technique, storing $collection in some index, might be more efficient than, a "MERGE" technique, deriving results from two ordered sequences.

            If for the implementation of such an index a hash set is used, then the test time approaches to O(1), provided that hash code is "good" enough (plus setup time, which might also exist in "MERGE" techique to order input).

            The similar case is with expression
            index-of($collection, $item)

            > Incidentally, the answer to your other question is ...

            I'm satisfied with the answer received at xsl-list,
            however the decision as for default tunnel parameters
            seems nonintuitive to me.

            Thanks

             

Get latest updates about Open Source Projects, Conferences and News.

Sign up for the SourceForge newsletter:





No, thanks