Small Device C Compiler (SDCC) / Patches / #361 New peephole optimiser constraint function: isImmdBitMask

Basil Hussain - 2021-01-20

By the way, I nearly forgot to mention, I was also going to create a patch to the manual with description of the new function, but I did not find a section with docs on the peephole constraint functions to be present.

I could have sworn someone (I do not remember who) on the dev mailing list mentioned a few months ago that they had added such information to the manual. Was it not merged?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Philipp Klaus Krause - 2021-01-21
  
  There is a bit, but not much at the end of section 8.1.16.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Basil Hussain - 2021-01-21
    
    I found the mailing list post - it was Sergey Belyashov who had updated the docs. But it is in a separate 'doc-peeph' branch, so that is why I did not find it in the main branch.
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Philipp Klaus Krause - 2021-01-21

Thanks. Some remarks:
1) Would you mind making it work up to 64 bits (probably just using unsigned long long instead of unsigned long would do). It is not important now, but the restriction to 32 bits seems arbitrary.
2) The name is a bit non-descriptive. How about something like ImmdSingleBitSet / ImmdSingleBitUnset?
3) The description looks like a single set bit is treated the same as a single bit unset. Wouldn't Your example rule then incorrectly transform code that does |= 0xfe into code that does |= 0x01?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Basil Hussain - 2021-01-21
  
  For (1), yes, possibly. I only made it 32 bits because, IIRC, your previous comments on the dev mailing list indicated it would be highly unlikely more than 32 bits would be needed. I suppose it's just the val < (1 << width) test that needs changed - off the top of my head probably (unsigned long long)val < (1ULL << width) might do the trick.
  
  Regarding (3), you're right, that is a problem. I hadn't thought of that scenario. Perhaps instead I can split it into two functions: isImmdSingleBitSet and isImmdSingleBitClear. Probably both simply wrappers to a common implementation function.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Sebastian Riedel - 2021-03-30
    
    isImmdSingleBitSet and isImmdSingleBitClear sounds good.
    Is anything else holding this patch back?
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
    - Philipp Klaus Krause - 2021-03-30
      
      Yes.
      
      We had two competing patches for this feature submitted at the same time. I think I prefer the approach from patch [#362].
      
      Related
      
      Patches: ~~#362~~
      
      Last edit: Maarten Brock 2021-05-25
      
      If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Basil Hussain - 2021-01-22
  
  I've been attempting to make changes to handle 64 bits, but I've run into problems. I just don't seem to be able to convince the compiler to produce code that properly left-shifts an unsigned long long by 64 bits (for the val < (1 << width) test, which checks if the immediate value is no greater than the given bit width).
  
  I've tried:
  
  (unsigned long long)val < (1ULL << width)
  
  (unsigned long long)val < ((unsigned long long)1 << width)
  
  (unsigned long long)val < (1ULL << (unsigned long long)width)
  
  It seems that left-shifting by anything >= 64 gets turned into 1 << (width % 32).
  
  Anyone got any ideas?
  
  If I can't solve this problem, I'll just have to leave it limited to 32 bits.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Visenri - 2021-01-22
    
    I really don't get it, what is the point in checking for 64 bits if values gotten by immdGet are longs? and as such, are only guaranteed to be 32 bits?
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
    - Basil Hussain - 2021-01-22
      
      I don't think that assertion holds. Values parsed by immdGet() can be expressed in hexadecimal and so could be 0xFFFFFFFFFFFFFFFF. Even though that function returns a signed long, if you cast the value to unsigned long, it will represent the original intended value, no?
      
      If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
      - Visenri - 2021-01-22
        
        You can cast the returned value to whatever you want, but if the storage of an unsigned long holds only 32 bits, you'll never get anything bigger than that:
        For example:
        
        unsigned long lvalue = 0; unsigned long long llvalue= 0; lvalue = 0x22FFFFFFFF; printf("L: 0x%016llX \n" , lvalue) ; llvalue = 0x22FFFFFFFF; printf("LL: 0x%016llX \n" , llvalue) ; lvalue = 0; llvalue = 0x11111111FFFFFFFF; if (immdGet("0x33FFFFFFFF", &lvalue)) printf("L: 0x%016llX \n" , lvalue) ; if (immdGet("0x33FFFFFFFF", &llvalue)) printf("LL: 0x%016llX \n" , llvalue) ;
        
        Shows the following output in my computer:
        
        L: 0x00000000FFFFFFFF LL: 0x00000022FFFFFFFF L: 0x00000000FFFFFFFF LL: 0x11111111FFFFFFFF
        
        Assigning a value > 32bits is only stored in a variable of long long type, otherwise content is truncated.
        Even worse, because immdGet uses a pointer to set the value, it only sets the lower part, the higher 32bits remain at whatever they value was before, in this example that's why we get 0x11111111FFFFFFFF instead of 0x00000000FFFFFFFF.
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

What a coincidence! I was just preparing the same thing. Because I just got tired of creating new rules using scripts for every bit.
But I implemented it as an extra operator for immdInRange to give it more versatility:

The given example would be

replace restart {
    ld  a, %1
    or  a, #%2
    ld  %1, a
} by {
    bset    %1, #%3
    ; peephole 18 replaced ld-or-ld by bset.
} if notUsed('a' 'n' 'z'), operandsLiteral(%1), immdInRange(0 7 'bp' %2 8 %3)

In my case the syntax is:

immdInRange(MIN MAX 'bp' VALUE BITS RESULT)
MIN, MAX: operate as usual.
'bp': is the new operator "bit position"
VALUE: is the value to be evaluated
BITS: number of bits for the mask (>0 for positive bits, < 0 for reversed bits).
returns false if VALUE is not a valid bit position (no matter what MIN and MAX are), if it's a valid bit position, then it checks MIN and MAX as usual.

It handles case 3 correctly, because the function checks normal or reversed bits as requested by BITS param.

The implementation looks like this:

/*-----------------------------------------------------------------*/
/* isPowerOfTwo - true if n is a power of 2                        */
/*-----------------------------------------------------------------*/
static bool
isPowerOfTwo(unsigned long n)
{  
  return (n != 0) && ((n & (n - 1)) == 0);
}  

/*-----------------------------------------------------------------*/
/* findBitPosition - Returns the bit position set or cleared in n  */
/* Parameters:                                                     */
/*  n: value to be tested.                                         */
/*  bits: number of positions to test.                             */
/*  complement: when true, the number must be bitwise complemented */
/* Returns:                                                        */
/*  -2 if bits is out of valid range (1..32)                       */
/*  -1 if n has more than one bit set or cleared                   */
/*  -1 if n has bits in positions over the bits param              */
/*  bit position (starting at 0) when only 1 bit set or clear      */
/* Examples:                                                       */
/*  n=0x02, bits=8, complemented=false -> 1                        */
/*  n=0x7F, bits=8, complemented=true  -> 7                        */
/*-----------------------------------------------------------------*/
static int
findBitPosition(unsigned long n, unsigned long bits, bool complement)
{
  unsigned long mask;
  int bitPos;

  if ((bits < 1) || (bits > 32)) //bits out of range?
    return -2;
  mask = (1ULL << bits) -1;
  if (n != (n & mask)) // bits outside mask?
    return -1;

  if (complement)
    n = (~n) & mask;
  if (!isPowerOfTwo (n)) // Not valid if more than one bit is set
    return -1;

  bitPos = -1;
  // One by one move the only set bit to right till it reaches end  
  while (n)
    {  
      n >>= 1;
      bitPos++; //count number of shifts
    }

  return bitPos;
}

So far, regression test pass successfully. And all my extra rules translate correcly into bset, bres, bcpl, btjt and btjf

Basil Hussain - 2021-01-22

Haha, interesting. :)

I did not go the route of doing something like adding to immdInRange, as I find that function fairly imperceptible/opaque already, and adding further complication wouldn't be a good thing in my mind. A separate function (or functions) is much clearer in my opinion.

What extra rules have you formulated? In particular, I have a whole bunch of rules for btjt and btjf (see feature request #690) for some) that I intend to modify to use this new function, which will be submitted once I've tested everything thoroughly.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

And is called like this from immdInRange:

  if (!j)
    for (k = 0; k < sizeof (pbitpos) / sizeof (pbitpos[0]); k++) // bitpos
      if (strcmp (operator, pbitpos[k]) == 0)
        {
          i = findBitPosition(left_l, abs(right_l), right_l < 0);
          if(i < -1 )
            return immdError ("bad right operand", operator, cmdLine);
          if(i < 0)
            return false;
          j = 1;
          break;

I am doing my final testing for this and other patches I've prepared, I'll upload them soon.

Philipp Klaus Krause - 2021-05-28

In [r12407], I applied peeph_stm8_def.patch. I modified it to use the singleBitSet and SingleBitReset conditions from patch [#362].

There is however, a small regression: the new rule 20 apparently doesn't apply in some cases where the old 20-? rules applied. This is probably due to the additional notUsed conditions. Since those conditions look more correct than the ones on the old 20-? rules, I applied that part anyway. I guess notUsed gives some false negatives here, and inted to look into it later.

Related

Patches: ~~#362~~

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Basil Hussain - 2021-05-28
  
  Hmm, interesting. I added the extra notUsed conditions because the original final LD would modify the N and Z flags, and I felt the previous rules were perhaps insufficiently strict enough not checking the state of the those flags weren't being later relied upon.
  
  I guess you're right that any such regression will be harmless and only result in rules not being applied when they otherwise could have been.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Visenri - 2021-05-28
  
  I think rule 21 is not correct, it is using "isImmdBitMask" instead of "immdInRange" (with "singleSetBit"):
  
  } if notUsed('a' 'n' 'z'), operandsLiteral(%1), isImmdBitMask(%2 8 %3)
  
  Should be:
  
  } if notUsed('a' 'n' 'z'), operandsLiteral(%1), immdInRange(0 7 'singleResetBit' %2 8 %3)
  
  That may also mean that the rule is never matched, because otherwise you should have seen an error with the regression tests.
  
  Last edit: Visenri 2021-05-28
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Philipp Klaus Krause - 2021-05-31
    
    Thanks. I fixed that now in [r12412]. Though indeed I don't get matches. For the following, only the |= gets optimized with current rules.
    
    void f(void) { *((unsigned char *)0x23) ^= 8; *((unsigned char *)0x24) &= 8; *((unsigned char *)0x25) |= 8; }
    
    Anyway, when I find time, I'll look through your rules in this patch to add them;, and sometime before 4.2.0, probably already this summer, there should be a rule cleanup, where rules that don't apply in the regression tests get removed.
    
    Last edit: Maarten Brock 2021-06-05
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
    - Visenri - 2021-05-31
      
      This code shows one of the (multiple) problems I fixed long ago with my patch: [#290]
      
      Without that patch:
      
      ; ../src/test2.c: 46: *((unsigned char *)0x23) ^= 8; ld a, 0x0023 clrw x xor a, #0x08 ld 0x0023, a ; ../src/test2.c: 47: *((unsigned char *)0x24) &= 8; ld a, 0x0024 and a, #0x08 ld 0x0024, a ; ../src/test2.c: 51: *((unsigned char *)0x25) |= 8; bset 0x0025, #3 ; peephole 202x replaced 'or' by 'bset' ('0x0025').
      
      This is due to bad parsing of instructions like "ld a, 0x0024"
      The 'x' in the address is interpreted as reading X, so the unneeded "clrw x" is not removed and prevents the optimization of the first bcpl as follows:
      
      ; ../src/test2.c: 46: *((unsigned char *)0x23) ^= 8; ; peephole 3 removed dead clrw of x. bcpl 0x0023, #3 ; peephole 200x replaced 'xor' by 'bcpl' ('0x0023'). ; ../src/test2.c: 47: *((unsigned char *)0x24) &= 8; ld a, 0x0024 and a, #0x08 ld 0x0024, a ; ../src/test2.c: 51: *((unsigned char *)0x25) |= 8; bset 0x0025, #3 ; peephole 202x replaced 'or' by 'bset' ('0x0025').
      
      Another proof:
      If you instead test this code (changed 'and' to a 'bres' like operation):
      
      *((unsigned char *)0x23) ^= 8; *((unsigned char *)0x24) &= ~8; *((unsigned char *)0x25) |= 8;
      
      This is the result:
      
      ; peephole 3 removed dead clrw of x. bcpl 0x0023, #3 ; peephole 200x replaced 'xor' by 'bcpl' ('0x0023'). ; ../src/test2.c: 47: *((unsigned char *)0x24) &= ~8; bres 0x0024, #3 ; peephole 204x replaced 'and' by 'bres' ('0x0024'). ; ../src/test2.c: 51: *((unsigned char *)0x25) |= 8; bset 0x0025, #3 ; peephole 202x replaced 'or' by 'bset' ('0x0025').
      
      Now the first operation is optimized, because when the bres removes the " ld a, 0x0024", the bad parsing no longer happens and the "bcpl" optimization becomes possible.
      
      Related
      
      Patches: #290
      
      Last edit: Maarten Brock 2021-06-05
      
      If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
      - Visenri - 2021-05-31
        
        This also happens with compiler symbols that contain 'x' or 'y':
        
        This gets full optimization:
        
        volatile unsigned char symbolVar; void f(void) { *((unsigned char *)0x23) ^= 8; symbolVar &= 8; *((unsigned char *)0x25) |= 8; }
        
        ; peephole 3 removed dead clrw of x. bcpl 0x0023, #3 ; peephole 200x replaced 'xor' by 'bcpl' ('0x0023'). ; ../src/test2.c: 51: symbolVar &= 8; ld a, _symbolVar+0 and a, #0x08 ld _symbolVar+0, a ; ../src/test2.c: 53: *((unsigned char *)0x25) |= 8; bset 0x0025, #3 ; peephole 202x replaced 'or' by 'bset' ('0x0025').
        
        However, the following code results in missing optimizations (only the variable name has been changed):
        
        volatile unsigned char symbolxVar; void f(void) { *((unsigned char *)0x23) ^= 8; symbolxVar &= 8; *((unsigned char *)0x25) |= 8; }
        
        ld a, 0x0023 clrw x xor a, #0x08 ld 0x0023, a ; ../src/test2.c: 51: symbolxVar &= 8; ld a, _symbolxVar+0 and a, #0x08 ld _symbolxVar+0, a ; ../src/test2.c: 53: *((unsigned char *)0x25) |= 8; bset 0x0025, #3 ; peephole 202x replaced 'or' by 'bset' ('0x0025').
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Philipp Klaus Krause - 2021-05-28

status: open --> closed

assigned_to: Philipp Klaus Krause

Group: -->
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

New peephole optimiser constraint function: isImmdBitMask

The Small Device C Compiler (SDCC), targeting 8-bit architectures

Group

Searches

Help

#361 New peephole optimiser constraint function: isImmdBitMask

Related

Discussion

Related

Related

Related