Menu

#361 New peephole optimiser constraint function: isImmdBitMask

None
closed
5
2021-05-31
2021-01-20
No

Here is a patch with a new peephole optimiser constraint function: isImmdBitMask.

This function determines whether an immediate value represents a bitmask with a single bit either set (e.g. 00010000) or clear (e.g. 11111011), and whose value is no wider than a given bit width (up to 32 bits). The function sets an output variable to the zero-based index of the single set/clear bit.

Some examples of the logic:

%1 = 0x4 => isImmdBitMask(%1 8 %2) => true, %2 = 2
%1 = 0xBF => isImmdBitMask(%1 8 %5) => true, %5 = 6
%2 = 0x200 => isImmdBitMask(%2 8 %3) => false
%2 = 0xD7 => isImmdBitMask(%2 8 %3) => false
%1 = 0x200 => isImmdBitMask(%1 16 %4) => true, %4 = 9
%1 = 0xEFFFFFFF => isImmdBitMask(%1 32 %2) => true, %2 = 28

The primary motivation behind this function is to streamline the unwieldy and repetitive sets of peephole optimiser rules for the STM8 platform that deal with manipulating or testing individual bits within registers or memory bytes. There may be future applications for other platforms too.

For example, the following rule (from stm8/peeph.def) is repeated 8 times with different bit mask values on the or instruction and different bit position literal on the bset:

replace restart {
    ld    a, %1
    or    a, #0x80
    ld    %1, a
} by {
    bset    %2, #7
    ; peephole 18-7 replaced or by bset.
} if notUsed('a'), operandsLiteral(%1), immdInRange(0 65535 '+' 0 %1 %2)

Having this new constraint function could cut these 8 down to a single rule:

replace restart {
    ld  a, %1
    or  a, #%2
    ld  %1, a
} by {
    bset    %1, #%3
    ; peephole 18 replaced ld-or-ld by bset.
} if notUsed('a' 'n' 'z'), operandsLiteral(%1), isImmdBitMask(%2 8 %3)

Also attached is a separate patch to stm8/peeph.def with modifications to use this new constraint function, which consolidates 32 rules down to only 4. A review of peephole rules for all other platforms did not reveal any other suitable candidates for modification.

2 Attachments

Related

Patches: #362

Discussion

  • Basil Hussain

    Basil Hussain - 2021-01-20

    By the way, I nearly forgot to mention, I was also going to create a patch to the manual with description of the new function, but I did not find a section with docs on the peephole constraint functions to be present.

    I could have sworn someone (I do not remember who) on the dev mailing list mentioned a few months ago that they had added such information to the manual. Was it not merged?

     
    • Philipp Klaus Krause

      There is a bit, but not much at the end of section 8.1.16.

       
      • Basil Hussain

        Basil Hussain - 2021-01-21

        I found the mailing list post - it was Sergey Belyashov who had updated the docs. But it is in a separate 'doc-peeph' branch, so that is why I did not find it in the main branch.

         
  • Philipp Klaus Krause

    Thanks. Some remarks:
    1) Would you mind making it work up to 64 bits (probably just using unsigned long long instead of unsigned long would do). It is not important now, but the restriction to 32 bits seems arbitrary.
    2) The name is a bit non-descriptive. How about something like ImmdSingleBitSet / ImmdSingleBitUnset?
    3) The description looks like a single set bit is treated the same as a single bit unset. Wouldn't Your example rule then incorrectly transform code that does |= 0xfe into code that does |= 0x01?

     
    • Basil Hussain

      Basil Hussain - 2021-01-21

      For (1), yes, possibly. I only made it 32 bits because, IIRC, your previous comments on the dev mailing list indicated it would be highly unlikely more than 32 bits would be needed. I suppose it's just the val < (1 << width) test that needs changed - off the top of my head probably (unsigned long long)val < (1ULL << width) might do the trick.

      Regarding (3), you're right, that is a problem. I hadn't thought of that scenario. Perhaps instead I can split it into two functions: isImmdSingleBitSet and isImmdSingleBitClear. Probably both simply wrappers to a common implementation function.

       
      • Sebastian Riedel

        isImmdSingleBitSet and isImmdSingleBitClear sounds good.
        Is anything else holding this patch back?

         
        • Philipp Klaus Krause

          Yes.

          We had two competing patches for this feature submitted at the same time. I think I prefer the approach from patch [#362].

           

          Related

          Patches: #362


          Last edit: Maarten Brock 2021-05-25
    • Basil Hussain

      Basil Hussain - 2021-01-22

      I've been attempting to make changes to handle 64 bits, but I've run into problems. I just don't seem to be able to convince the compiler to produce code that properly left-shifts an unsigned long long by 64 bits (for the val < (1 << width) test, which checks if the immediate value is no greater than the given bit width).

      I've tried:

      • (unsigned long long)val < (1ULL << width)
      • (unsigned long long)val < ((unsigned long long)1 << width)
      • (unsigned long long)val < (1ULL << (unsigned long long)width)

      It seems that left-shifting by anything >= 64 gets turned into 1 << (width % 32).

      Anyone got any ideas?

      If I can't solve this problem, I'll just have to leave it limited to 32 bits.

       
      • Visenri

        Visenri - 2021-01-22

        I really don't get it, what is the point in checking for 64 bits if values gotten by immdGet are longs? and as such, are only guaranteed to be 32 bits?

         
        • Basil Hussain

          Basil Hussain - 2021-01-22

          I don't think that assertion holds. Values parsed by immdGet() can be expressed in hexadecimal and so could be 0xFFFFFFFFFFFFFFFF. Even though that function returns a signed long, if you cast the value to unsigned long, it will represent the original intended value, no?

           
          • Visenri

            Visenri - 2021-01-22

            You can cast the returned value to whatever you want, but if the storage of an unsigned long holds only 32 bits, you'll never get anything bigger than that:
            For example:

                unsigned long lvalue = 0;
                unsigned long long llvalue= 0;
            
                lvalue = 0x22FFFFFFFF;
                printf("L:  0x%016llX \n" , lvalue) ;
            
                llvalue = 0x22FFFFFFFF;
                printf("LL: 0x%016llX \n" , llvalue) ;
            
                lvalue = 0;
                llvalue = 0x11111111FFFFFFFF;
            
                if (immdGet("0x33FFFFFFFF", &lvalue))
                  printf("L:  0x%016llX \n" , lvalue) ;
            
                if (immdGet("0x33FFFFFFFF", &llvalue))
                  printf("LL: 0x%016llX \n" , llvalue) ;
            

            Shows the following output in my computer:

            L:  0x00000000FFFFFFFF 
            LL: 0x00000022FFFFFFFF 
            L:  0x00000000FFFFFFFF 
            LL: 0x11111111FFFFFFFF 
            

            Assigning a value > 32bits is only stored in a variable of long long type, otherwise content is truncated.
            Even worse, because immdGet uses a pointer to set the value, it only sets the lower part, the higher 32bits remain at whatever they value was before, in this example that's why we get 0x11111111FFFFFFFF instead of 0x00000000FFFFFFFF.

             
  • Visenri

    Visenri - 2021-01-22

    What a coincidence! I was just preparing the same thing. Because I just got tired of creating new rules using scripts for every bit.
    But I implemented it as an extra operator for immdInRange to give it more versatility:

    The given example would be

    replace restart {
        ld  a, %1
        or  a, #%2
        ld  %1, a
    } by {
        bset    %1, #%3
        ; peephole 18 replaced ld-or-ld by bset.
    } if notUsed('a' 'n' 'z'), operandsLiteral(%1), immdInRange(0 7 'bp' %2 8 %3)
    

    In my case the syntax is:

    immdInRange(MIN MAX 'bp' VALUE BITS RESULT)
    MIN, MAX: operate as usual.
    'bp': is the new operator "bit position"
    VALUE: is the value to be evaluated
    BITS: number of bits for the mask (>0 for positive bits, < 0 for reversed bits).
    returns false if VALUE is not a valid bit position (no matter what MIN and MAX are), if it's a valid bit position, then it checks MIN and MAX as usual.

    It handles case 3 correctly, because the function checks normal or reversed bits as requested by BITS param.

    The implementation looks like this:

    /*-----------------------------------------------------------------*/
    /* isPowerOfTwo - true if n is a power of 2                        */
    /*-----------------------------------------------------------------*/
    static bool
    isPowerOfTwo(unsigned long n)
    {  
      return (n != 0) && ((n & (n - 1)) == 0);
    }  
    
    /*-----------------------------------------------------------------*/
    /* findBitPosition - Returns the bit position set or cleared in n  */
    /* Parameters:                                                     */
    /*  n: value to be tested.                                         */
    /*  bits: number of positions to test.                             */
    /*  complement: when true, the number must be bitwise complemented */
    /* Returns:                                                        */
    /*  -2 if bits is out of valid range (1..32)                       */
    /*  -1 if n has more than one bit set or cleared                   */
    /*  -1 if n has bits in positions over the bits param              */
    /*  bit position (starting at 0) when only 1 bit set or clear      */
    /* Examples:                                                       */
    /*  n=0x02, bits=8, complemented=false -> 1                        */
    /*  n=0x7F, bits=8, complemented=true  -> 7                        */
    /*-----------------------------------------------------------------*/
    static int
    findBitPosition(unsigned long n, unsigned long bits, bool complement)
    {
      unsigned long mask;
      int bitPos;
    
      if ((bits < 1) || (bits > 32)) //bits out of range?
        return -2;
      mask = (1ULL << bits) -1;
      if (n != (n & mask)) // bits outside mask?
        return -1;
    
      if (complement)
        n = (~n) & mask;
      if (!isPowerOfTwo (n)) // Not valid if more than one bit is set
        return -1;
    
      bitPos = -1;
      // One by one move the only set bit to right till it reaches end  
      while (n)
        {  
          n >>= 1;
          bitPos++; //count number of shifts
        }
    
      return bitPos;
    }
    

    So far, regression test pass successfully. And all my extra rules translate correcly into bset, bres, bcpl, btjt and btjf

     
    • Basil Hussain

      Basil Hussain - 2021-01-22

      Haha, interesting. :)

      I did not go the route of doing something like adding to immdInRange, as I find that function fairly imperceptible/opaque already, and adding further complication wouldn't be a good thing in my mind. A separate function (or functions) is much clearer in my opinion.

      What extra rules have you formulated? In particular, I have a whole bunch of rules for btjt and btjf (see feature request #690) for some) that I intend to modify to use this new function, which will be submitted once I've tested everything thoroughly.

       
  • Visenri

    Visenri - 2021-01-22

    And is called like this from immdInRange:

      if (!j)
        for (k = 0; k < sizeof (pbitpos) / sizeof (pbitpos[0]); k++) // bitpos
          if (strcmp (operator, pbitpos[k]) == 0)
            {
              i = findBitPosition(left_l, abs(right_l), right_l < 0);
              if(i < -1 )
                return immdError ("bad right operand", operator, cmdLine);
              if(i < 0)
                return false;
              j = 1;
              break;
    

    I am doing my final testing for this and other patches I've prepared, I'll upload them soon.

     
  • Philipp Klaus Krause

    In [r12407], I applied peeph_stm8_def.patch. I modified it to use the singleBitSet and SingleBitReset conditions from patch [#362].

    There is however, a small regression: the new rule 20 apparently doesn't apply in some cases where the old 20-? rules applied. This is probably due to the additional notUsed conditions. Since those conditions look more correct than the ones on the old 20-? rules, I applied that part anyway. I guess notUsed gives some false negatives here, and inted to look into it later.

     

    Related

    Patches: #362

    • Basil Hussain

      Basil Hussain - 2021-05-28

      Hmm, interesting. I added the extra notUsed conditions because the original final LD would modify the N and Z flags, and I felt the previous rules were perhaps insufficiently strict enough not checking the state of the those flags weren't being later relied upon.

      I guess you're right that any such regression will be harmless and only result in rules not being applied when they otherwise could have been.

       
    • Visenri

      Visenri - 2021-05-28

      I think rule 21 is not correct, it is using "isImmdBitMask" instead of "immdInRange" (with "singleSetBit"):

      } if notUsed('a' 'n' 'z'), operandsLiteral(%1), isImmdBitMask(%2 8 %3)
      

      Should be:

      } if notUsed('a' 'n' 'z'), operandsLiteral(%1), immdInRange(0 7 'singleResetBit' %2 8 %3)
      

      That may also mean that the rule is never matched, because otherwise you should have seen an error with the regression tests.

       

      Last edit: Visenri 2021-05-28
      • Philipp Klaus Krause

        Thanks. I fixed that now in [r12412]. Though indeed I don't get matches. For the following, only the |= gets optimized with current rules.

        void f(void)
        {
          *((unsigned char *)0x23) ^= 8;
          *((unsigned char *)0x24) &= 8;
          *((unsigned char *)0x25) |= 8;
        }
        

        Anyway, when I find time, I'll look through your rules in this patch to add them;, and sometime before 4.2.0, probably already this summer, there should be a rule cleanup, where rules that don't apply in the regression tests get removed.

         

        Last edit: Maarten Brock 2021-06-05
        • Visenri

          Visenri - 2021-05-31

          This code shows one of the (multiple) problems I fixed long ago with my patch: [#290]

          Without that patch:

          ;   ../src/test2.c: 46: *((unsigned char *)0x23) ^= 8;
              ld  a, 0x0023
              clrw    x
              xor a, #0x08
              ld  0x0023, a
          ;   ../src/test2.c: 47: *((unsigned char *)0x24) &= 8;
              ld  a, 0x0024
              and a, #0x08
              ld  0x0024, a
          ;   ../src/test2.c: 51: *((unsigned char *)0x25) |= 8;
              bset 0x0025, #3
          ; peephole 202x replaced 'or' by 'bset' ('0x0025').
          

          This is due to bad parsing of instructions like "ld a, 0x0024"
          The 'x' in the address is interpreted as reading X, so the unneeded "clrw x" is not removed and prevents the optimization of the first bcpl as follows:

          ;   ../src/test2.c: 46: *((unsigned char *)0x23) ^= 8;
          ; peephole 3 removed dead clrw of x.
              bcpl 0x0023, #3
          ; peephole 200x replaced 'xor' by 'bcpl' ('0x0023').
          ;   ../src/test2.c: 47: *((unsigned char *)0x24) &= 8;
              ld  a, 0x0024
              and a, #0x08
              ld  0x0024, a
          ;   ../src/test2.c: 51: *((unsigned char *)0x25) |= 8;
              bset 0x0025, #3
          ; peephole 202x replaced 'or' by 'bset' ('0x0025').
          

          Another proof:
          If you instead test this code (changed 'and' to a 'bres' like operation):

            *((unsigned char *)0x23) ^= 8;
            *((unsigned char *)0x24) &= ~8;
            *((unsigned char *)0x25) |= 8;
          

          This is the result:

          ; peephole 3 removed dead clrw of x.
              bcpl 0x0023, #3
          ; peephole 200x replaced 'xor' by 'bcpl' ('0x0023').
          ;   ../src/test2.c: 47: *((unsigned char *)0x24) &= ~8;
              bres 0x0024, #3
          ; peephole 204x replaced 'and' by 'bres' ('0x0024').
          ;   ../src/test2.c: 51: *((unsigned char *)0x25) |= 8;
              bset 0x0025, #3
          ; peephole 202x replaced 'or' by 'bset' ('0x0025').
          

          Now the first operation is optimized, because when the bres removes the " ld a, 0x0024", the bad parsing no longer happens and the "bcpl" optimization becomes possible.

           

          Related

          Patches: #290


          Last edit: Maarten Brock 2021-06-05
          • Visenri

            Visenri - 2021-05-31

            This also happens with compiler symbols that contain 'x' or 'y':

            This gets full optimization:

            volatile unsigned char symbolVar;
            void f(void)
            {
              *((unsigned char *)0x23) ^= 8;
              symbolVar &= 8;
              *((unsigned char *)0x25) |= 8;
            }
            
            ; peephole 3 removed dead clrw of x.
                bcpl 0x0023, #3
            ; peephole 200x replaced 'xor' by 'bcpl' ('0x0023').
            ;   ../src/test2.c: 51: symbolVar &= 8;
                ld  a, _symbolVar+0
                and a, #0x08
                ld  _symbolVar+0, a
            ;   ../src/test2.c: 53: *((unsigned char *)0x25) |= 8;
                bset 0x0025, #3
            ; peephole 202x replaced 'or' by 'bset' ('0x0025').
            

            However, the following code results in missing optimizations (only the variable name has been changed):

            volatile unsigned char symbolxVar;
            void f(void)
            {
              *((unsigned char *)0x23) ^= 8;
              symbolxVar &= 8;
              *((unsigned char *)0x25) |= 8;
            }
            
                ld  a, 0x0023
                clrw    x
                xor a, #0x08
                ld  0x0023, a
            ;   ../src/test2.c: 51: symbolxVar &= 8;
                ld  a, _symbolxVar+0
                and a, #0x08
                ld  _symbolxVar+0, a
            ;   ../src/test2.c: 53: *((unsigned char *)0x25) |= 8;
                bset 0x0025, #3
            ; peephole 202x replaced 'or' by 'bset' ('0x0025').
            
             
  • Philipp Klaus Krause

    • status: open --> closed
    • assigned_to: Philipp Klaus Krause
    • Group: -->
     

Log in to post a comment.