Small Device C Compiler (SDCC) / Patches / #363 Implementation of SWAP, RRC and RLC for STM8

Visenri - 2021-02-04

Cases like these make much more sense:

Result in STACK
Left operand in XL:STACK
or
Result in STACK
Left operand in XL:YL

Do these cases happen? I think they should work with the code as it is.

But this will fail:
Result in STACK:XH
Left operand in XL:YL

Because XL will be considered not dead, X will be pushed.
X will be loaded with the left operand (but x_dead_global param will be true, possibly failing).
After that, the rotation will take place in X register.
And then X will be poped, discarting the result in X register!.

Last edit: Visenri 2021-02-04

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Hello again.
I cleaned up the code, but my questions still remain unanswered:

1- Is the variable allocator free to store variables bigger than 2 bytes splitted in any possible combination between X, Y and stack?
For example using combinations like:
XH:YL
XL:YL
XL:STACK
If this is the case, more checks should to be done (at least to trigger an assertion).

2- Is there any easy way to convince the allocator to store the left operand in a specific register?
(so I can write test cases to test all code paths used in this function).

And now I have two more:

3-Is it redundant to do checks like these?:
!aopInReg (result_aop, 0, reg_idx) && !regDead (reg_idx, ic)
I mean, when result is the same operand as reg_idx, will regDead return always "true"?

4-Is my code so ugly that no one thinks it deserves any answers :D?

I have working code for SWAP, RRC and RLC for 1,2 and 4 bytes (passes regression tests, but those do not cover all code flow of cases implemented).
This is just a snippet of the code used for Rotation of 2 bytes.

genRotate (const iCode *ic, asmop *result_aop, asmop *left_aop, int size, bool rotate_left)
{
  bool pushed_a = false, pushed_x = false, pushed_y  = false;

  D (emit2 (rotate_left ? "; genRLC" : "; genRRC", ""));

  switch (size)
    {
    case 1:
      ... Size 1 code, I think it is ok, because variable can only be in one place.
      break;
    case 2:
      {
        bool use_reg_y = false;
        asmop * reg_aop = ASMOP_X;
        short reg_idx = X_IDX;

        D (emit2 ("; genRotate2 default", ""));

        if (aopInReg (result_aop, 0, Y_IDX))
            use_reg_y = true;

        if (use_reg_y)
          {
            reg_aop = ASMOP_Y;
            reg_idx = Y_IDX;
          }

        if (!aopInReg (result_aop, 0, reg_idx) && !regDead (reg_idx, ic))
          {
            push (reg_aop, 0, 2);
            if (reg_aop == ASMOP_X)
              pushed_x = true;
            else
              pushed_y = true;
          }

        genMove (reg_aop, left_aop, regDead (A_IDX, ic), regDead (X_IDX, ic) || !use_reg_y, regDead (Y_IDX, ic) || use_reg_y);

        // Rotate
        emit3w (rotate_left ? A_SLLW : A_SRLW, reg_aop, 0);

        // Copy carry to the other side
        symbol *tlbl = (regalloc_dry_run ? 0 : newiTempLabel (NULL));
        if (!regalloc_dry_run)
          emit2 ("jrnc", "!tlabel", labelKey2num (tlbl->key));
        cost (2, 1);

        if (rotate_left)
          emit3w (A_INCW, reg_aop, 0);
        else
          {
            emit2 ("addw", "%s , #0x8000", aopGet2 (reg_aop, 0));
            cost ((reg_idx == X_IDX) ? 3 : 4, 2);
          }

        emitLabel (tlbl);

        genMove (result_aop, reg_aop, regDead (A_IDX, ic), regDead (X_IDX, ic) && use_reg_y, regDead (Y_IDX, ic) && !use_reg_y);
        break;
      }
      ...  case 4 code

  if (pushed_y)
    pop (ASMOP_Y, 0, 2);
  if (pushed_x)
    pop (ASMOP_X, 0, 2);
  if (pushed_a)
    pop (ASMOP_A, 0, 1);
}

Philipp Klaus Krause - 2021-02-12

The stm8 register allocator is free to do nearly anything. xh:yl, xl:stack, etc are surely possible.
However, you could disallow it from codegen for your cases. See lines 2204-2205 in gen.c for an example.

Not really. Doing some operations that are particularly fast via a specific register on the variable often works though.

A register that is overwritten by an iCode is considered dead at that iCode. The other reason for a register being dead is that nothing might be stored there.

I don't know, I haven't read the code yet. I was very busy and it didn't seem urgent (the urgent thing right now is the 4.1.0 release, which requires testing, finding and fixing bugs).

Last edit: Philipp Klaus Krause 2021-02-12
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Visenri - 2021-02-12

Thank you very much for the answers.

1- Pretty much confirms what I suspected. Just fake a super high cost in these cases hoping this will make the allocator think it's a bad idea, then add an assertion just in case it happens.
2- See below.
3- That was exactly my idea: So, I can remove some unneeded checks.
4- Excuse-me, It's just me being too impatient, sometimes I think I am talking alone here :D.

2- I can get easily the result in a register, but I haven't seen any way to get source operand in a register, at least for the operations tested (SWAP, RRC and RLC).
For example:
"source in stack" - "result in register"

unsigned char rotate_left_ref_char(unsigned char value) { return ((value << 1) | (value >> 7)); }

gives:

_rotate_left_ref_char: ; ../src/test4.c: 832: return ((value << 1) | (value >> 7)); ; genRLC ; genRotate1 default ld a, (0x03, sp) sll a adc a, #0x00 ; genReturn ; genLabel 00101$: ; ../src/test4.c: 833: } ; genEndFunction ret

I was hoping this one to be a "source in reg" to "result in reg", but nope, intermediate result is saved first in the stack, so we get exactly the same case here (from the genRLC point of view):

unsigned char rotate_left_ref_char(unsigned char value) { value+=2; return ((value << 1) | (value >> 7)); }

gives:

_rotate_left_ref_char: ; ../src/test4.c: 831: value+=2; ; genCast ; genAssign ld a, (0x03, sp) ; genPlus add a, #0x02 ld (0x03, sp), a ; ../src/test4.c: 832: return ((value << 1) | (value >> 7)); ; genRLC ; genRotate1 default ld a, (0x03, sp) sll a adc a, #0x00 ; genReturn ; genLabel 00101$: ; ../src/test4.c: 833: } ; genEndFunction ret

Last edit: Visenri 2021-02-13
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

I think it is the way SDCC generates iCode that prevents it from allocating intermediate results in a register (at least sometimes):

For example:

Note: peephole optimizer disabled to better see the code output.

unsigned char test_combined_op(unsigned char value)
{
    value+=2;
    return value*2;
}

_test_combined_op:
;   ../src/test4.c: 837: value+=2;
; genCast
; genAssign
    ld  a, (0x03, sp)
; genPlus
    add a, #0x02
    ld  (0x03, sp), a
;   ../src/test4.c: 838: return value*2;
; genAssign
    ld  a, (0x03, sp)
; genLeftShiftLiteral
    sll a
; genReturn
; genLabel
00101$:
;   ../src/test4.c: 840: }
; genEndFunction
    ret

Two unneeded ld's to and from stack.

If we combine operations in the same statement, we get optimal result:

unsigned char test_combined_op(unsigned char value)
{
    return (value+2)*2;
}

_test_combined_op:
;   ../src/test4.c: 839: return (value+2)*2;
; genCast
; genAssign
    ld  a, (0x03, sp)
; genPlus
    add a, #0x02
; genCast
; genAssign
; genLeftShiftLiteral
    sll a
; genReturn
; genLabel
00101$:
;   ../src/test4.c: 840: }
; genEndFunction
    ret

Cosmic compiler has more or less the same behavior.

But IAR is able to combine operations in different statements:

unsigned char test_combined_op(unsigned char value)
{
    PC_ODR = value + 5; // Added to force value out of 'a' register (IAR calling convention has 'a' as byte argument)
    value+=4;
    return value*2;
}

     44          unsigned char test_combined_op(unsigned char value)
     45          {
   \                     test_combined_op:
   \   000000 B7 ..        LD        S:?b0, A
     46              PC_ODR = value + 5;
   \   000002 AB 05        ADD       A, #0x5
   \   000004 C7 500A      LD        L:0x500a, A
     47              value+=4;
     48              return value*2;
   \   000007 B6 ..        LD        A, S:?b0
   \   000009 AB 04        ADD       A, #0x4
   \   00000B 48           SLL       A
   \   00000C 81           RET
   \   00000D              REQUIRE _A_PC_ODR
     49          }

Using the same code for SDCC (even with peephole-opt enabled) results in:

_test_combined_op2:
;   ../src/test4.c: 844: PC_ODR = value + 5;
; genCast
; genAssign
    ld  a, (0x03, sp)
; genPlus
    ld  xl, a
; peephole 4 removed redundant load from xl into a.
    add a, #0x05
    ld  _PC_ODR+0, a
;   ../src/test4.c: 845: value+=4;
; genPlus
    ld  a, xl
    add a, #0x04
;   ../src/test4.c: 846: return value*2;
; genAssign
    ld  (0x03, sp), a
; peephole 4a removed redundant load from (0x03, sp) into a.
; genLeftShiftLiteral
    sll a
; genReturn
; genLabel
; peephole j30 removed unused label 00101$.
;   ../src/test4.c: 847: }
; genEndFunction
    ret

Unneeded ld's are still generated. One is removed by the peephole optimizer.

Philipp Klaus Krause - 2021-02-13

For your first test_combined_op, the register allocator should still allocate the intermediate result into a rather than on the stack. I'll try to find out why that doesn't happen sometime next week.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Philipp Klaus Krause - 2021-02-13
  
  Looks like there is an issue with arguments. The register allocator never sees value, so it always stays on the stack where it was passed.
  However, if you first copy value to some other local variable, everything looks fine:
  
  unsigned char test_combined_op(unsigned char value) { unsigned char value2 = value; value2+=2; return value2*2; }
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

The case for SWAP and rotate the problem is a little worse, because the operation is not recognized once you add the previous operation in the same statement:

As shown before, this is fully optimized:

unsigned char rotate_left_ref_char(unsigned char value)
{
    return ((value << 1) | (value >> 7));
}

_rotate_left_ref_char:
;   ../src/test4.c: 832: return ((value << 1) | (value >> 7));
; genRLC
; genRotate1 default
    ld  a, (0x03, sp)
    sll a
    adc a, #0x00
; genReturn
; genLabel
00101$:
;   ../src/test4.c: 833: }
; genEndFunction
    ret

This is almost optimal:

unsigned char rotate_left_ref_char(unsigned char value)
{
    value+=2;
    return ((value << 1) | (value >> 7));
}

_rotate_left_ref_char:
;   ../src/test4.c: 831: value+=2;
; genCast
; genAssign
    ld  a, (0x03, sp)
; genPlus
    add a, #0x02
;   ../src/test4.c: 832: return ((value << 1) | (value >> 7));
; genRLC
; genRotate1 default
    ld  (0x03, sp), a
; peephole 4 removed redundant load from (0x03, sp) into a.
    sll a
    adc a, #0x00
; genReturn
; genLabel
; peephole j30 removed unused label 00101$.
;   ../src/test4.c: 833: }
; genEndFunction
    ret

but this is not recognized as rotation:

unsigned char rotate_left_ref_char(unsigned char value)
{
    return (((value+1) << 1) | ((value+1) >> 7));
}

rotate_left_ref_char:
;   ../src/test4.c: 832: return (((value+1) << 1) | ((value+1) >> 7));
; genCast
; genAssign
    ld  a, (0x03, sp)
; genPlus
    inc a
; genLeftShiftLiteral
    sll a
; genCast
; genAssign
    clrw    x
    exg a, xl
    ld  a, (0x03, sp)
    exg a, xl
; genPlus
    incw    x
; genRightShiftLiteral
    push    a
    clr a
    tnzw    x
    jrpl    00103$
    dec a
00103$:
    rrwa    x
    sll a
    rlcw    x
    pop a
; genCast
; genAssign
; genCast
; genAssign
; genOr
    pushw   x
    or  a, (2, sp)
    popw    x

I am not saying IAR is better here, because with my code for rotation and swap, SDCC can beat IAR, because IAR recognizes neither "rotation" nor "swap".

I am just talking about storage of intermediate results and its relation with combination of statements.

Philipp Klaus Krause - 2021-02-13

The interesting aspect in your last example is that one value + 1 is handled as an 8-bit type, while the other is handled as a 16-bit type.
This difference already happens in the AST, i.e. early in the frontend. Possibly some leftover optimization (IMO it would make more sense to do such optimizations later, on the iCode).

If hte return type is changed to unsigned int, both are considered to be the same, and value + 1 is calculated only once.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Visenri - 2021-02-13
  
  Casting the result of the +1 operation to unsigned char forces the +1 to be done only once.
  But swap is still not recognized (no matter how many extra casts I add):
  
  unsigned char swap_ref_char(unsigned char value) { unsigned char value2 = value; return (((unsigned char)(value2+1) << 4) | ((unsigned char)(value2+1) >> 4)); }
  
  swap_ref_char: push a ; ../src/test4.c: 806: unsigned char value2 = value; ; genAssign ld a, (0x04, sp) ; ../src/test4.c: 808: return (((unsigned char)(value2+1) << 4) | ((unsigned char)(value2+1) >> 4)); ; genCast ; genAssign ; genPlus inc a ; genCast ; genAssign ld xl, a ; genCast ; genAssign ld a, xl ; genLeftShiftLiteral swap a and a, #0xf0 ld (0x01, sp), a ; genRightShiftLiteral ld a, xl swap a and a, #0x0f ; genOr or a, (0x01, sp) ; genReturn ; genLabel 00101$: ; ../src/test4.c: 817: } ; genEndFunction addw sp, #1 ret
  
  But with "unsigned int" version all works as expected, one increment and swap operation (and it also works with or without intermediate variable):
  
  unsigned int swap_ref_int(unsigned int value) { unsigned int value2 = value; return (((value2+1) << 8) | ((value2+1) >> 8)); }
  
  _swap_ref_int: ; ../src/test4.c: 876: unsigned int value2 = value; ; genAssign ldw x, (0x03, sp) ; ../src/test4.c: 877: return (((value2+1) << 8) | ((value2+1) >> 8)); ; genPlus incw x ; genSwap ; genSwap2 v1 swapw x ; genReturn ; genLabel 00101$: ; ../src/test4.c: 889: } ; genEndFunction ret
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Philipp Klaus Krause - 2021-02-13
    
    Could you post the current version of your patch here sometime?
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

I want to remove all the unneeded checks and add the code to disallow the values in split registers.
I'll retest all and then I'll post it here if you want, but I have also to update my current version to the newest one, so my diffs will but up to date.
I hope today I'll be able to upload something.

If you want to have a look, this is what I have right now:

/*-----------------------------------------------------------------*/
/* genSwap - generates code to swap nibbles, bytes or words        */
/*-----------------------------------------------------------------*/
static void
genSwap (const iCode *ic, asmop *result_aop, asmop *left_aop, int size)
{
  bool done = false, pushed_a = false, pushed_x = false, pushed_y  = false;

  D (emit2 ("; genSwap", ""));

  switch (size)
    {
    case 1:
      if (aopSame (result_aop, 0, left_aop, 0, 1) &&
          (aopInReg (result_aop, 0, A_IDX) || aopOnStack (result_aop, 0, 1) || !(aopRS (result_aop))))
        {
          // SWAP with 'a' or any other non reg operand can be done directly with the operand
          D (emit2 ("; genSwap1 v1", ""));
          emit3 (A_SWAP, result_aop, 0);
        }
      else
        {
          D (emit2 ("; genSwap1 default", ""));
          if (!aopInReg (result_aop, 0, A_IDX) && !regDead (A_IDX, ic))
            {
              push (ASMOP_A, 0, 1);
              pushed_a = true;
            }

          cheapMove (ASMOP_A, 0, left_aop, 0, false);
          emit3 (A_SWAP, ASMOP_A, 0);
          cheapMove (result_aop, 0, ASMOP_A, 0, false);
        }
      break;
    case 2:     
      if (aopSame (result_aop, 0, left_aop, 0, 2))
        {
          if (aopInReg (result_aop, 0, X_IDX) || aopInReg (result_aop, 0, Y_IDX))
            {
              D (emit2 ("; genSwap2 v1", ""));
              emit3w (A_SWAPW, result_aop, 0);
              done = true;
            }
          else if (regDead (A_IDX, ic) && result_aop->type == AOP_DIR )
            {
                // if 'a' is available and the operand is direct, do the swap using exg
                D (emit2 ("; genSwap2 v2", ""));
                cheapMove (ASMOP_A, 0, result_aop, 0, false);
                emit3_o (A_EXG, ASMOP_A, 0, result_aop, 1);
                cheapMove (result_aop, 0, ASMOP_A, 0, false);
                done = true;
            }
        }

      if(!done)
        {
          D (emit2 ("; genSwap2 default", ""));
          if (!aopInReg (result_aop, 0, X_IDX) && !regDead (X_IDX, ic))
            {
              push (ASMOP_X, 0, 2);
              pushed_x = true;
            }

          genMove (ASMOP_X, left_aop, regDead (A_IDX, ic), true, regDead (Y_IDX, ic));
          emit3w (A_SWAPW, ASMOP_X, 0);
          genMove (result_aop, ASMOP_X, regDead (A_IDX, ic), false, regDead (Y_IDX, ic));
        }
      break;
    case 4:
      if (aopSame (result_aop, 0, left_aop, 0, 4))
        {
          if (aopInReg (result_aop, 0, X_IDX) && aopInReg (result_aop, 2, Y_IDX) ||
              aopInReg (result_aop, 0, Y_IDX) && aopInReg (result_aop, 2, X_IDX))
            {
              D (emit2 ("; genSwap4 v1", ""));
              emit3w (A_EXGW, ASMOP_X, ASMOP_Y);
              done = true;
            }
        }
      else
        {
          if ((aopInReg (result_aop, 0, X_IDX) && aopInReg (result_aop, 2, Y_IDX) &&
               aopInReg (left_aop,   0, Y_IDX) && aopInReg (left_aop,   2, X_IDX))
              ||
              (aopInReg (result_aop, 0, Y_IDX) && aopInReg (result_aop, 2, X_IDX) &&
               aopInReg (left_aop,   0, X_IDX) && aopInReg (left_aop,   2, Y_IDX)))
            {
              D (emit2 ("; genSwap4 v2", ""));
              // No need to do anything, the result is already a swapped version of the input
              done = true;
            }
        }

      if (!done)
        {
          asmop * reg_aop0 = ASMOP_X;
          asmop * reg_aop1 = ASMOP_Y;
          short reg_idx0 = X_IDX;
          short reg_idx1 = Y_IDX;
          short reg_idx_left_0, reg_idx_left_2, reg_idx_result_0, reg_idx_result_2;
          bool use_exg = false;


          if (aopInReg (result_aop, 2, X_IDX) || aopInReg (result_aop, 0, Y_IDX))
            {
                D (emit2 ("; genSwap4 reversed", ""));
                reg_aop0 = ASMOP_Y;
                reg_aop1 = ASMOP_X;
                reg_idx0 = Y_IDX;
                reg_idx1 = X_IDX;
            }
          else
            D (emit2 ("; genSwap4 default", ""));

          #define AOP_GET_REG_IDX(aop, offset) aopInReg (aop, offset, X_IDX) ? X_IDX : (aopInReg (aop, offset, Y_IDX) ?  Y_IDX : -1)

          reg_idx_left_0   = AOP_GET_REG_IDX(left_aop,   0);
          reg_idx_left_2   = AOP_GET_REG_IDX(left_aop,   2);
          reg_idx_result_0 = AOP_GET_REG_IDX(result_aop, 0);
          reg_idx_result_2 = AOP_GET_REG_IDX(result_aop, 2);

          #undef AOP_GET_REG_IDX

          if ((reg_idx_left_0 != -1 && reg_idx_left_0 == reg_idx_result_0) ||
              (reg_idx_left_2 != -1 && reg_idx_left_2 == reg_idx_result_2))
            use_exg = true; // if any register from left are the same for the result use copy to reg and swap


          if(!use_exg)
            {
              bool copy_offset_0 = !aopSame (result_aop, 2, left_aop, 2, 2); // copy offset 0 first if it will not ovewrite left operand

              D (emit2 ("; genSwap4 copy", ""));

              for (int i = 0; i < 2; i++)
                {
                  short reg_idx = copy_offset_0 ? reg_idx_result_0 : reg_idx_result_2;
                  bool reg_dead_x = regDead (X_IDX, ic);
                  bool reg_dead_y = regDead (Y_IDX, ic);

                  reg_dead_x = reg_dead_x && reg_idx_left_0 != X_IDX && reg_idx_left_2 != X_IDX && (reg_idx_result_0 != X_IDX && reg_idx_result_2 != X_IDX || i == 0);
                  reg_dead_y = reg_dead_y && reg_idx_left_0 != Y_IDX && reg_idx_left_2 != Y_IDX && (reg_idx_result_0 != Y_IDX && reg_idx_result_2 != Y_IDX || i == 0);

                  reg_dead_x = reg_dead_x || (reg_idx == X_IDX);
                  reg_dead_y = reg_dead_y || (reg_idx == Y_IDX);

                  if(copy_offset_0)
                    genMove_o (result_aop, 2, left_aop, 0, 2, regDead (A_IDX, ic), reg_dead_x, reg_dead_y);
                  else
                    genMove_o (result_aop, 0, left_aop, 2, 2, regDead (A_IDX, ic), reg_dead_x, reg_dead_y);

                  copy_offset_0 = !copy_offset_0; // next time copy the other offset
                }
            }
          else
            {
              bool copy_offset_0 = !aopInReg (left_aop, 2, reg_idx0); // copy offset 0 first if it will not ovewrite left operand with offset 2

              D (emit2 ("; genSwap4 swap", ""));

              if (reg_idx_result_0 != X_IDX && reg_idx_result_2 != X_IDX && !regDead (X_IDX, ic))
                {
                  push (ASMOP_X, 0, 2);
                  pushed_x = true;
                }
              if (reg_idx_result_0 != Y_IDX && reg_idx_result_2 != Y_IDX && !regDead (Y_IDX, ic))
                {
                  push (ASMOP_Y, 0, 2);
                  pushed_y = true;
                }

              for (int i = 0; i < 2; i++)
                {
                  bool reg_dead_x = i == 0 && reg_idx_left_0 != X_IDX && reg_idx_left_2 != X_IDX;
                  bool reg_dead_y = i == 0 && reg_idx_left_0 != Y_IDX && reg_idx_left_2 != Y_IDX;
                  short reg_idx = copy_offset_0 ? reg_idx0 : reg_idx1;

                  reg_dead_x = reg_dead_x || (reg_idx == X_IDX);
                  reg_dead_y = reg_dead_y || (reg_idx == Y_IDX);

                  if(copy_offset_0)
                    genMove_o (reg_aop0, 0, left_aop, 0, 2, regDead (A_IDX, ic), reg_dead_x, reg_dead_y);
                  else
                    genMove_o (reg_aop1, 0, left_aop, 2, 2, regDead (A_IDX, ic), reg_dead_x, reg_dead_y);    

                  copy_offset_0 = !copy_offset_0; // next time copy the other offset
                }
              // Swap
              emit3w (A_EXGW, ASMOP_X, ASMOP_Y);
            }

          genMove_o (result_aop, 0, reg_aop0, 0, 2, regDead (A_IDX, ic), false, false);
          genMove_o (result_aop, 2, reg_aop1, 0, 2, regDead (A_IDX, ic), false, false);
        }
      break;
    default:
      wassertl (false, "unsupported SWAP operand size");
    }

  if (pushed_y)
    pop (ASMOP_Y, 0, 2);
  if (pushed_x)
    pop (ASMOP_X, 0, 2);
  if (pushed_a)
    pop (ASMOP_A, 0, 1);
}

This is for rotation:

/*-----------------------------------------------------------------*/
/* genRotate - generates code for rotations (RLC & RRC)            */
/*-----------------------------------------------------------------*/
static void
genRotate (const iCode *ic, asmop *result_aop, asmop *left_aop, int size, bool rotate_left)
{
  bool pushed_a = false, pushed_x = false, pushed_y  = false;

  D (emit2 (rotate_left ? "; genRLC" : "; genRRC", ""));

  switch (size)
    {
    case 1:
      {
        bool use_result_directly = false;
        asmop *rotate_aop = ASMOP_A;

        if (aopSame (result_aop, 0, left_aop, 0, 1) &&
          (aopInReg (result_aop, 0, A_IDX) || rotate_left && (aopOnStack (result_aop, 0, 1) || !aopRS (result_aop))))
          {
            use_result_directly = true;
            // RLC with 'a' or any other non reg operand can be done directly with the operand
            // RLC & RRC with "a" can be done directly
            D (emit2 ("; genRotate1 v1", ""));
          }
        else
          D (emit2 ("; genRotate1 default", ""));

        symbol *tlbl = (regalloc_dry_run ? 0 : newiTempLabel (NULL));

        if(use_result_directly)
          rotate_aop = result_aop;

        if (!use_result_directly && !aopInReg (result_aop, 0, A_IDX) && !regDead (A_IDX, ic))
          {
            push (ASMOP_A, 0, 1);
            pushed_a = true;
          }

        if (!use_result_directly)
          cheapMove (ASMOP_A, 0, left_aop, 0, false);

        // Rotate
        emit3 (rotate_left ? A_SLL : A_SRL, rotate_aop, 0);

        // Copy carry to the other side
        if (rotate_left && aopSame (rotate_aop, 0, ASMOP_A, 0, 1))
          emit3 (A_ADC, ASMOP_A, ASMOP_ZERO);
        else
          {
            if (!regalloc_dry_run)
              emit2 ("jrnc", "!tlabel", labelKey2num (tlbl->key));
            cost (2, 1);

            if (rotate_left)
              emit3 (A_INC, rotate_aop, 0);
            else
              {
                if(!aopInReg (rotate_aop, 0, A_IDX))
                  wassertl (FALSE, "Tried to use 'or' with non 'a' operand");
                emit2 ("or", "a, #0x80");
                cost (2, 1);
              }
            emitLabel (tlbl);
          }

        if (!use_result_directly)
          cheapMove (result_aop, 0, ASMOP_A, 0, false);
      break;
      }
    case 2:
      {
        bool use_reg_y = false;
        asmop * reg_aop = ASMOP_X;
        short reg_idx = X_IDX;

        D (emit2 ("; genRotate2 default", ""));

        if (aopInReg (result_aop, 0, Y_IDX))
            use_reg_y = true;

        if (use_reg_y)
          {
            reg_aop = ASMOP_Y;
            reg_idx = Y_IDX;
          }

        if (!aopInReg (result_aop, 0, reg_idx) && !regDead (reg_idx, ic))
          {
            push (reg_aop, 0, 2);
            if (reg_aop == ASMOP_X)
              pushed_x = true;
            else
              pushed_y = true;
          }

        genMove (reg_aop, left_aop, regDead (A_IDX, ic), regDead (X_IDX, ic) || !use_reg_y, regDead (Y_IDX, ic) || use_reg_y);

        // Rotate
        emit3w (rotate_left ? A_SLLW : A_SRLW, reg_aop, 0);

        // Copy carry to the other side
        symbol *tlbl = (regalloc_dry_run ? 0 : newiTempLabel (NULL));
        if (!regalloc_dry_run)
          emit2 ("jrnc", "!tlabel", labelKey2num (tlbl->key));
        cost (2, 1);

        if (rotate_left)
          emit3w (A_INCW, reg_aop, 0);
        else
          {
            emit2 ("addw", "%s , #0x8000", aopGet2 (reg_aop, 0));
            cost ((reg_idx == X_IDX) ? 3 : 4, 2);
          }

        emitLabel (tlbl);

        genMove (result_aop, reg_aop, regDead (A_IDX, ic), regDead (X_IDX, ic) && use_reg_y, regDead (Y_IDX, ic) && !use_reg_y);
        break;
      }
    case 4:
      {
        asmop * reg_aop0 = ASMOP_X;
        asmop * reg_aop1 = ASMOP_Y;
        short reg_idx0 = X_IDX;
        short reg_idx1 = Y_IDX;

        if (aopInReg (result_aop, 2, X_IDX) || aopInReg (result_aop, 0, Y_IDX))
          {
              D (emit2 ("; genRotate4 reversed", ""));
              reg_aop0 = ASMOP_Y;
              reg_aop1 = ASMOP_X;
              reg_idx0 = Y_IDX;
              reg_idx1 = X_IDX;
          }
        else
          D (emit2 ("; genRotate4 default", ""));

        if (!aopInReg (result_aop, 0, X_IDX) && !aopInReg (result_aop, 2, X_IDX) && !regDead (X_IDX, ic))
          {
            push (ASMOP_X, 0, 2);
            pushed_x = true;
          }
        if (!aopInReg (result_aop, 0, Y_IDX) && !aopInReg (result_aop, 2, Y_IDX) && !regDead (Y_IDX, ic))
          {
            push (ASMOP_Y, 0, 2);
            pushed_y = true;
          }

        // Copy data into X:Y registers, use EXGW if data is already in X:Y in reverse order
        if(aopInReg (left_aop, 2, reg_idx0) && aopInReg (left_aop, 0, reg_idx1))
          emit3w (A_EXGW, ASMOP_X, ASMOP_Y);
        else
          {
            bool copy_offset_0 = !aopInReg (left_aop, 2, reg_idx0); // copy offset 0 first if it will not ovewrite left operand with offset 2
            for (int i = 0; i < 2; i++)
              {
                bool reg_dead_x = (i == 0) && regDead (X_IDX, ic); // regDead() only checked the first time
                bool reg_dead_y = (i == 0) && regDead (Y_IDX, ic);
                short reg_idx = copy_offset_0 ? reg_idx0 : reg_idx1;

                reg_dead_x = reg_dead_x || (reg_idx == X_IDX);
                reg_dead_y = reg_dead_y || (reg_idx == Y_IDX);

                if(copy_offset_0)
                  genMove_o (reg_aop0, 0, left_aop, 0, 2, regDead (A_IDX, ic), reg_dead_x, reg_dead_y);
                else
                  genMove_o (reg_aop1, 0, left_aop, 2, 2, regDead (A_IDX, ic), reg_dead_x, reg_dead_y);    

                copy_offset_0 = !copy_offset_0; // next time copy the other offset
              }
          }

        // Rotate
        if (rotate_left)
          {
            emit3w (A_SLLW, reg_aop0, 0);
            emit3w (A_RLCW, reg_aop1, 0);
          }
        else
          {
            emit3w (A_SRLW, reg_aop1, 0);
            emit3w (A_RRCW, reg_aop0, 0);
          }

        // Copy carry to the other side
        symbol *tlbl = (regalloc_dry_run ? 0 : newiTempLabel (NULL));
        if (!regalloc_dry_run)
          emit2 ("jrnc", "!tlabel", labelKey2num (tlbl->key));
        cost (2, 1);

        if (rotate_left)
          emit3w (A_INCW, reg_aop0, 0);
        else
          {
            emit2 ("addw", "%s , #0x8000", aopGet2 (reg_aop1, 0));
            cost ((reg_idx1 == X_IDX) ? 3 : 4, 2);
          }

        emitLabel (tlbl);

        genMove_o (result_aop, 0, reg_aop0, 0, 2, regDead (A_IDX, ic), false, false);
        genMove_o (result_aop, 2, reg_aop1, 0, 2, regDead (A_IDX, ic), false, false);
      }
      break;
    default:
      wassertl (false, "unsupported RLC/RRC operand size");
  }

  if (pushed_y)
    pop (ASMOP_Y, 0, 2);
  if (pushed_x)
    pop (ASMOP_X, 0, 2);
  if (pushed_a)
    pop (ASMOP_A, 0, 1);
}

And this is the calling function for both:

static void
genRotateOperation (const iCode *ic)
{
  int size = getSize (operandType (IC_RESULT(ic)));

  operand *result = IC_RESULT (ic);
  operand *left = IC_LEFT (ic);

  aopOp (left, ic);
  aopOp (result, ic);

  asmop *result_aop = result->aop;
  asmop *left_aop = left->aop;

  switch (ic->op)
    {
    case RRC:
    case RLC:
      genRotate (ic, result_aop, left_aop, size, ic->op == RLC);
      break;

    case SWAP:
      genSwap (ic, result_aop, left_aop, size);
      break;
    default:
      wassertl (0, "Unimplemented iCode");
    }

  freeAsmop (left);
  freeAsmop (result);
}

Called in genSTM8iCode

    case RRC:
    case RLC:
    case SWAP:
      genRotateOperation (ic);
      break;

And modification to stm8/main.c

/* Indicate which extended bit operations this port supports */
static bool
hasExtBitOp (int op, int size)
{
  return (op == GETABIT)
          || (op == RRC  && size <= 4 && size != 3)
          || (op == RLC  && size <= 4 && size != 3)
          || (op == SWAP && size <= 4 && size != 3);
}

Visenri - 2021-02-13

I also added more instructions, not implemented in
emit3cost:
A_EXG
emit3wcost:
A_EXGW
A_SWAPW

And changed places where this was hardcoded.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

After writing some test cases I am getting failures because of operands splitted in registers even when I am using the trick suggested ( cost (1000, 1000) in forbidden cases).

Rotation with 4 bytes is the one triggering the failures ( assertions have been commented to see the end result):

    case 4:
      {
        asmop * reg_aop0 = ASMOP_X;
        asmop * reg_aop1 = ASMOP_Y;
        short reg_idx0 = X_IDX;
        short reg_idx1 = Y_IDX;

        if (regBytesUsedByAop (result_aop, 2, 2, X_IDX) || regBytesUsedByAop (result_aop, 0, 2, Y_IDX))
          {
              D (emit2 ("; genRotate4 reversed", ""));
              reg_aop0 = ASMOP_Y;
              reg_aop1 = ASMOP_X;
              reg_idx0 = Y_IDX;
              reg_idx1 = X_IDX;
          }
        else
          D (emit2 ("; genRotate4 default", ""));

        if (regUsedPartiallyByAop (left_aop, 0, 4, X_IDX) &&  regUsedPartiallyByAop (result_aop, 0, 4, X_IDX) ||
            regUsedPartiallyByAop (left_aop, 0, 4, Y_IDX) &&  regUsedPartiallyByAop (result_aop, 0, 4, Y_IDX))
          { // This cases are not implemented, so they are forbidden
            cost (1000, 1000);
            //wassert_bt (regalloc_dry_run);
            //return;
            D (emit2 ("; genRotate4 problem", ""));
          }

        if (!regDead (X_IDX, ic))
          {
            push (ASMOP_X, 0, 2);
            pushed_x = true;
          }
        if (!regDead (Y_IDX, ic))
          {
            push (ASMOP_Y, 0, 2);
            pushed_y = true;
          }

        // Copy data into X:Y registers, use EXGW if data is already in X:Y in reverse order
        if(aopInReg (left_aop, 2, reg_idx0) && aopInReg (left_aop, 0, reg_idx1))
          emit3w (A_EXGW, ASMOP_X, ASMOP_Y);
        else
          {
            bool copy_offset_0 =  !regBytesUsedByAop (left_aop, 2, 2, reg_idx0); //copy offset 0 first if it will not ovewrite left operand with offset 2

            if (!copy_offset_0 && !regBytesUsedByAop (left_aop, 0, 2, reg_idx1)) //if reversed order also overwrites we have a problem
              { // This case is not implemented, so it's forbidden
                cost (1000, 1000);
                //wassert_bt (regalloc_dry_run);
                //return;
                D (emit2 ("; genRotate4 problem2", ""));
                D (emit2 (";", "Left : %s, %s, %s, %s", aopGet (left_aop, 3), aopGet (left_aop, 2), aopGet (left_aop, 1), aopGet (left_aop, 0)));
                D (emit2 (";", "Result : %s, %s, %s, %s", aopGet (result_aop, 3), aopGet (result_aop, 2), aopGet (result_aop, 1), aopGet (result_aop, 0)));
                D (emit2 (";", "Regs : %s, %s", aopGet2 (reg_aop1, 0), aopGet2 (reg_aop0, 0)));
              }

            for (int i = 0; i < 2; i++)
              {
                bool reg_dead_x = (i == 0) && regDead (X_IDX, ic); // regDead() only checked the first time
                bool reg_dead_y = (i == 0) && regDead (Y_IDX, ic);
                short reg_idx = copy_offset_0 ? reg_idx0 : reg_idx1;

                reg_dead_x = reg_dead_x || (reg_idx == X_IDX);
                reg_dead_y = reg_dead_y || (reg_idx == Y_IDX);

                if(copy_offset_0)
                  genMove_o (reg_aop0, 0, left_aop, 0, 2, regDead (A_IDX, ic), reg_dead_x, reg_dead_y);
                else
                  genMove_o (reg_aop1, 0, left_aop, 2, 2, regDead (A_IDX, ic), reg_dead_x, reg_dead_y);    

                copy_offset_0 = !copy_offset_0; // next time copy the other offset
              }
          }

        // Rotate
        if (rotate_left)
          {
            emit3w (A_SLLW, reg_aop0, 0);
            emit3w (A_RLCW, reg_aop1, 0);
          }
        else
          {
            emit3w (A_SRLW, reg_aop1, 0);
            emit3w (A_RRCW, reg_aop0, 0);
          }

        // Copy carry to the other side
        symbol *tlbl = (regalloc_dry_run ? 0 : newiTempLabel (NULL));
        if (!regalloc_dry_run)
          emit2 ("jrnc", "!tlabel", labelKey2num (tlbl->key));
        cost (2, 1);

        if (rotate_left)
          emit3w (A_INCW, reg_aop0, 0);
        else
          {
            emit2 ("addw", "%s , #0x8000", aopGet2 (reg_aop1, 0));
            cost ((reg_idx1 == X_IDX) ? 3 : 4, 2);
          }

        emitLabel (tlbl);

        genMove_o (result_aop, 0, reg_aop0, 0, 2, regDead (A_IDX, ic), false, false);
        genMove_o (result_aop, 2, reg_aop1, 0, 2, regDead (A_IDX, ic), false, false);
      }
      break;

I keep getting results like this:
; Left : a, xl, xh, (0x04, sp)
; Result : yh, a, xh, xl

_rotate_test_1_xor1:
    sub sp, #4
;   gen/stm8/rotate2/rotate2_size_32_andCase_1_xorLiteral_0_rotateLeft_-1.c: 96: value = value ^ rotate_test_value_xor;
; genXor
    ld  a, (0x0a, sp)
    xor a, _rotate_test_value_xor+3
    ld  (0x04, sp), a
    ld  a, (0x09, sp)
    xor a, _rotate_test_value_xor+2
    ld  xh, a
    ld  a, (0x08, sp)
    xor a, _rotate_test_value_xor+1
    ld  xl, a
    ld  a, (0x07, sp)
    xor a, _rotate_test_value_xor+0
;   gen/stm8/rotate2/rotate2_size_32_andCase_1_xorLiteral_0_rotateLeft_-1.c: 97: return ((value << SHIFT_L) | (value >> SHIFT_R)) AND_OPERATION;
; genRRC
; genRotate4 default
; genRotate4 problem2
;   Left : a, xl, xh, (0x04, sp)
;   Result : yh, a, xh, xl
;   Regs : y, x
    exg a, yl
    ld  a, xl
    exg a, yl
    ld  yh, a
    ld  a, (0x04, sp)
    ld  xl, a
    srlw    y
    rrcw    x
    jrnc    00103$
    addw    y , #0x8000
00103$:
    ld  a, yl
; genAnd
    and a, #0xef
    ld  yl, a
    ld  a, xl
    and a, #0xef
    ld  xl, a
; genReturn
; genLabel
; peephole j30 removed unused label 00101$.
;   gen/stm8/rotate2/rotate2_size_32_andCase_1_xorLiteral_0_rotateLeft_-1.c: 98: }
; genEndFunction
    addw    sp, #4
    ret

Attached my "stm8/gen.c" file, the test cases code in "rotate2.c" and the results in "results-test-stm8.log".

Last edit: Visenri 2021-02-14

gen.c

results-test-stm8.log

rotate2.c

I tried changing the code to do the whole copy using only one call to genMove_o, but it results in an assertion failure in genMove_o ("Unimplemented."):

genMove_o (ASMOP_XY, 0, left_aop, 0, 4, regDead (A_IDX, ic), regDead (X_IDX, ic), regDead (Y_IDX, ic));

            if (!copy_offset_0 && !regBytesUsedByAop (left_aop, 0, 2, reg_idx1)) //if reversed order also overwrites we have a problem
              { // This case is not implemented, so it's forbidden
                cost (1000, 1000);
                //wassert_bt (regalloc_dry_run);
                //return;
                D (emit2 ("; genRotate4 problem2", ""));
                D (emit2 (";", "Left : %s, %s, %s, %s", aopGet (left_aop, 3), aopGet (left_aop, 2), aopGet (left_aop, 1), aopGet (left_aop, 0)));
                D (emit2 (";", "Result : %s, %s, %s, %s", aopGet (result_aop, 3), aopGet (result_aop, 2), aopGet (result_aop, 1), aopGet (result_aop, 0)));
                D (emit2 (";", "Regs : %s, %s", aopGet2 (reg_aop1, 0), aopGet2 (reg_aop0, 0)));
              }
            if (reg_idx0 == X_IDX && reg_idx1 == Y_IDX)
              {
                  genMove_o (ASMOP_XY, 0, left_aop, 0, 4, regDead (A_IDX, ic), regDead (X_IDX, ic), regDead (Y_IDX, ic));
              }
            else
              {
                for (int i = 0; i < 2; i++)
                  {
                    bool reg_dead_x = (i == 0) && regDead (X_IDX, ic); // regDead() only checked the first time
                    bool reg_dead_y = (i == 0) && regDead (Y_IDX, ic);
                    short reg_idx = copy_offset_0 ? reg_idx0 : reg_idx1;

                    reg_dead_x = reg_dead_x || (reg_idx == X_IDX);
                    reg_dead_y = reg_dead_y || (reg_idx == Y_IDX);

                    if(copy_offset_0)
                      genMove_o (reg_aop0, 0, left_aop, 0, 2, regDead (A_IDX, ic), reg_dead_x, reg_dead_y);
                    else
                      genMove_o (reg_aop1, 0, left_aop, 2, 2, regDead (A_IDX, ic), reg_dead_x, reg_dead_y);

                    copy_offset_0 = !copy_offset_0; // next time copy the other offset
                  }
              }

gen/stm8/rotate2/rotate2_size_32_andCase_1_xorLiteral_0_rotateLeft_-1.c:114: error 9: FATAL Compiler Internal Error in file 'gen.c' line number '2138' : Unimplemented.

Corresponding to these lines:

      // No byte can be assigned safely (i.e. the assignment is a permutation).
      if (!regalloc_dry_run)
        wassertl_bt (0, "Unimplemented.");

Visenri - 2021-02-15

After more testing, this is the problematic case that triggers the assertion in genMove_o ("Unimplemented."):

The rotate function input asked for these operands:

; Left : a, xl, xh, yl ; Result : yh, a, xh, xl

The rotate function calls genMove_o with the closest combination of X and Y (to do the rotation), in this case Y:X:

; Left : a, xl, xh, yl ; Result : yh, yl, xh, xl

The copy fails because it contains a permutation:
XH is already in place
A is copied into YH
the remaining YL-XL form a permutation, so none can be copied without overwriting the source data for the other.

As a proof of concept I've modified the code of genMove_o to check if any other reg can be used to mirror the value of YL, so the copy won't end in a dead lock.
It uses YH and the test compiles and passes the regression test.

The copy suceeds because it's no longer a permutation:
XH is already in place
YL is copied into YH (temp copy of YL, banned from being written until used)
XL is copied into YL
YH is copied into XL (YH write is enabled after being used)
A is copied into YH

The general concept would be:
If the copy is not doable, try it again using a free register as temporary storage (one free or used in the result and not involved in the permutation).

Last edit: Visenri 2021-02-16
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Visenri - 2021-02-16
  
  For simple cases where A is not involved in the permutation (like this one) even an easier approach would be to use A to perform it.
  
  I have tested something like this:
  
  push (ASMOP_A, 0, 1); cheapMove (ASMOP_A, 0, source, soffset0, false); cheapMove (source, soffset0, source, soffset1, true); cheapMove (source, soffset1, ASMOP_A, 0, false); pop (ASMOP_A, 0, 1);
  
  where soffset0 and soffset1 represent the offsets of the registers to be exchanged.
  
  gives:
  
  ld yh, a push a ld a, yl exg a, yl ld a, xl exg a, yl ld xl, a pop a
  
  vs the mirror method explained previously:
  
  rlwa y ld a, yh rrwa y exg a, yl ld a, xl exg a, yl exg a, xl ld a, yh exg a, xl ld yh, a
  
  Both examples pass the regression test.
  
  Last edit: Visenri 2021-02-16
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Visenri - 2021-02-21
    
    Right now I have a fully working version of this operands working with build 12014, I am compiling now build 12070, I hope it works without any issues.
    
    My code for SWAP, RRC and RLC has been a hardtest for "genCopy" function.
    After adding more cases to my test suite, I found other permutations making "genCopy" fail.
    
    I have implemented a "simple" permutation algorithm processed before doing reg to reg copy, the order is as follows:
    
    Original code copying reg to stack (modified to check for permutations to not overwrite source operands).
    
    Original Register shuffling.
    
    Original clear registers.
    
    New Check and process permutations.
    
    Original reg to reg copy (with minimal changes)
    
    Rest of copy code untouched.
    .
    
    The algorithm is something like this:
    
    Find permutations.
    
    Choose one of the permutations locations to be backed up by another register not present in any permutation reg.
    
    Check if the backup register needs to be pushed to be used.
    
    Push backup register if needed.
    
    Copy chosen location reg into backup reg.
    
    Execute copy of remaining positions, should work now because the copied position will no longer lock the copy.
    
    Copy backup reg into destination of backup position.
    
    Pop backup register
    
    The new check for permutations is only done if not in a dry run, to save processing time and to normally avoid use of permutation (99% of cases).
    
    For now it only handles one permutation, I mean, just one backup must unlock the copy operation.
    But it could be expanded easily wrapping it all in a loop trying multiple times until no more copy operations take place.
    It only handles the backup using A, X or Y register.
    For now this covers all cases seen so far.
    
    Examples found in my tests and handled correctly by the algorithm:
    
    ; Src : stack2, stack1, yh, yl ; Dst : yh, yl, stack2, stack1 backup done with X (after pushed), stored positions 2,3 (stack2:stack1 -> X). ; Src : xh, xl, yh, a ; Dst : yh, yl, xh, xl backup done with A (after pushed), stored position 1 (yh->a)
    
    If all regression test pass with 12070 I'll upload it here later.
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
    - Visenri - 2021-02-22
      
      Here you have the patches, including updated files and diffs.
      I added also a new test for rotations/swap: rotate2.c.
      
      gen.c
      
      Summary of changes:
      Implemented instruction cost
      A_EXG
      A_EXGW
      A_SWAPW
      
      Added asmop ASMOP_YX to be used for swap.
      Added regBytesUsedByAop (may be simplified after seeing other code).
      Added functions isAopAssignableRegPos and AopAssignRegToStackOverWritePos to be used in genCopy.
      Added support for permutations in genCopy.
      Added genSwap, genRotate and genRotateOperation functions (SWAP, RRC and RLC).
      
      There is a lot of debug output, feel free to remove what is too much for you.
      
      Comparison of results for rotate2 test
      build 12070
      
      rotate2 (f: 0, t:1620, c: 108, b: 303577, T: 638263)
      
      after patches:
      
      rotate2 (f: 0, t:1620, c: 108, b: 270660, T: 601200)
      
      Last edit: Visenri 2021-03-03
      
      gen.c
      
      gen.c.diff
      
      main.c
      
      main.c.diff
      
      results-test-stm8-large.log
      
      results-test-stm8.log
      
      rotate2.c
      
      If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
      - Philipp Klaus Krause - 2021-03-04
        
        I'll look at the actual functionality when I find time later.
        For now, I've picked the exg / exg handling improvement and the rotate2.c test into the sdcc-next branch (to be merged to trunk after the 4.1.0 release).
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Visenri - 2021-03-03

If you have looked at the log files, you may have noticed that what I wrote about the results was really the opposite of what you get.

I have corrected the previous post.

Of course with the patches it uses less code and cycles, not more.

rotate2 test (stm8):

revision 12070
rotate2 (f: 0, t:1620, c: 108, b: 303577, T: 638263)

after patches:
rotate2 (f: 0, t:1620, c: 108, b: 270660, T: 601200)

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Visenri - 2021-03-08

I've done minor changes to the patch (only to genSwap function), here you have the updated gen.c compared against the next branch 12085.

gen.c

gen.c.diff

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Implementation of SWAP, RRC and RLC for STM8

The Small Device C Compiler (SDCC), targeting 8-bit architectures

Group

Searches

Help

#363 Implementation of SWAP, RRC and RLC for STM8

Related

Discussion