Menu

#363 Implementation of SWAP, RRC and RLC for STM8

None
open
5
2021-06-07
2021-02-03
Visenri
No

A couple of weeks ago I was trying to implement some operators not implemented for STM8.

I came to that after seeing that optimization for SWAP was not handled as stated in the documentation:

Nibble and Byte Swapping
SDCC recognizes the following expressions:
unsigned char i;
unsigned int j;
...
i = ((i << 4) | (i >> 4));

But after seeing the actual SDCC source code I realized that what the help doesn't say is that the compiler recognized those expressions, but it's up to each port to implement the optimized instructions for each special operator.

So I found that neither SWAP, RRC nor RLC were implemented for STM8.

It took some time to dig into the generator code to get an rough idea of how it works.

But I have some doubts.

I have working code for SWAP, RRC and RLC for 1,2 and 4 bytes (passes regression tests, but those do not cover all code flow of cases implemented).

Questions:
Is the variable allocator free to store variables bigger than 2 bytes splitted between X, Y and stack?, for example using combinations like:
XH:YL
XL:YL
XL:STACK

Will it ever call this function with mixes like this?:
Result in Y, left operand in YL:XL
Where the result ovewrites the left operand?

if so, code like this (ROTATE code for 2 bytes size) can fail because genMove may ovewrite the left operand?:

genRotate (const iCode *ic, asmop *result_aop, asmop *left_aop, int size, bool rotateLeft)
  if (size == 1)
    {
    ..... Size 1 code, I can put it here if you want, but I think it is ok, because variable can only be in one place.
    }
  else if (size == 2)
    {
      enum asminst rotateIns = (rotateLeft ? A_SLLW : A_SRLW);
      bool pushed_xy = false;
      bool use_reg_y = false;
      asmop * reg_aop = ASMOP_X;
      short reg_idx = X_IDX;

      if (aopInReg (result_aop, 0, Y_IDX))
          use_reg_y = true;
      else if (!regDead (reg_idx, ic))
        {
          push (reg_aop, 0, 2);
          pushed_xy = true;
        }
      if (use_reg_y)
        {
          reg_aop = ASMOP_Y;
          reg_idx = Y_IDX;
        }

      genMove_o (reg_aop, 0, left_aop, 0, 2, regDead (A_IDX, ic), !use_reg_y, use_reg_y);

      // Rotate
      emit3w (rotateIns, reg_aop, 0);
      enum asminst rotateIns = (rotateLeft ? A_SLLW : A_SRLW);
      bool pushed_xy = false;
      bool use_reg_y = false;
      asmop * reg_aop = ASMOP_X;
      short reg_idx = X_IDX;

      D (emit2 ("; genRotate2 default", ""));

      if (aopInReg (result_aop, 0, Y_IDX))
          use_reg_y = true;
      else if (!regDead (reg_idx, ic))
        {
          push (reg_aop, 0, 2);
          pushed_xy = true;
        }
      if (use_reg_y)
        {
          reg_aop = ASMOP_Y;
          reg_idx = Y_IDX;
        }

      genMove_o (reg_aop, 0, left_aop, 0, 2, regDead (A_IDX, ic), !use_reg_y, use_reg_y);

      // Rotate
      emit3w (rotateIns, reg_aop, 0);

      // Copy carry to the other side
      symbol *tlbl = (regalloc_dry_run ? 0 : newiTempLabel (NULL));
      if (!regalloc_dry_run)
        emit2 ("jrnc", "!tlabel", labelKey2num (tlbl->key));
      cost (2, 1);

      if (rotateLeft)
        emit3w (A_INCW, reg_aop, 0);
      else
        {
          emit2 ("addw", !use_reg_y ? "x, #0x8000" : "y, #0x8000");
          cost ((reg_idx == X_IDX) ? 3 : 4, 2);
        }

      emitLabel (tlbl);

      genMove_o (result_aop, 0, reg_aop, 0, 2, regDead (A_IDX, ic), !use_reg_y, use_reg_y);

      if (pushed_xy)
        {
          pop (reg_aop, 0, 2);
        }
    }
    ......

The code by default will use X register, but if result will be in Y, it uses it.
If result is not in Y, it checks if X is dead, if not, it pushes it.
Next, the left operand is moved into the register used (no code is generated if both are the same).
After that the processing is done with the register, shift left and copy carry bit to the other side.
Finally a result is moved to the result operand (no code is generated if both are the same).

Does it look correct to you?
Can it fail if result is Y and left operand is in YL:XL (I don't think this condition will ever exist, because it doesn't make any sense).

Another question for testing: Is there any easy way to convince the allocator to store the left operand in a specific register (so I can write test cases to test all code paths used in this function).

Related

Patches: #362

Discussion

1 2 > >> (Page 1 of 2)
  • Visenri

    Visenri - 2021-02-04

    Cases like these make much more sense:

    Result in STACK
    Left operand in XL:STACK
    or
    Result in STACK
    Left operand in XL:YL

    Do these cases happen? I think they should work with the code as it is.

    But this will fail:
    Result in STACK:XH
    Left operand in XL:YL

    Because XL will be considered not dead, X will be pushed.
    X will be loaded with the left operand (but x_dead_global param will be true, possibly failing).
    After that, the rotation will take place in X register.
    And then X will be poped, discarting the result in X register!.

     

    Last edit: Visenri 2021-02-04
  • Visenri

    Visenri - 2021-02-11

    Hello again.
    I cleaned up the code, but my questions still remain unanswered:

    1- Is the variable allocator free to store variables bigger than 2 bytes splitted in any possible combination between X, Y and stack?
    For example using combinations like:
    XH:YL
    XL:YL
    XL:STACK
    If this is the case, more checks should to be done (at least to trigger an assertion).

    2- Is there any easy way to convince the allocator to store the left operand in a specific register?
    (so I can write test cases to test all code paths used in this function).

    And now I have two more:

    3-Is it redundant to do checks like these?:
    !aopInReg (result_aop, 0, reg_idx) && !regDead (reg_idx, ic)
    I mean, when result is the same operand as reg_idx, will regDead return always "true"?

    4-Is my code so ugly that no one thinks it deserves any answers :D?

    I have working code for SWAP, RRC and RLC for 1,2 and 4 bytes (passes regression tests, but those do not cover all code flow of cases implemented).
    This is just a snippet of the code used for Rotation of 2 bytes.

    genRotate (const iCode *ic, asmop *result_aop, asmop *left_aop, int size, bool rotate_left)
    {
      bool pushed_a = false, pushed_x = false, pushed_y  = false;
    
      D (emit2 (rotate_left ? "; genRLC" : "; genRRC", ""));
    
      switch (size)
        {
        case 1:
          ... Size 1 code, I think it is ok, because variable can only be in one place.
          break;
        case 2:
          {
            bool use_reg_y = false;
            asmop * reg_aop = ASMOP_X;
            short reg_idx = X_IDX;
    
            D (emit2 ("; genRotate2 default", ""));
    
            if (aopInReg (result_aop, 0, Y_IDX))
                use_reg_y = true;
    
            if (use_reg_y)
              {
                reg_aop = ASMOP_Y;
                reg_idx = Y_IDX;
              }
    
            if (!aopInReg (result_aop, 0, reg_idx) && !regDead (reg_idx, ic))
              {
                push (reg_aop, 0, 2);
                if (reg_aop == ASMOP_X)
                  pushed_x = true;
                else
                  pushed_y = true;
              }
    
            genMove (reg_aop, left_aop, regDead (A_IDX, ic), regDead (X_IDX, ic) || !use_reg_y, regDead (Y_IDX, ic) || use_reg_y);
    
            // Rotate
            emit3w (rotate_left ? A_SLLW : A_SRLW, reg_aop, 0);
    
            // Copy carry to the other side
            symbol *tlbl = (regalloc_dry_run ? 0 : newiTempLabel (NULL));
            if (!regalloc_dry_run)
              emit2 ("jrnc", "!tlabel", labelKey2num (tlbl->key));
            cost (2, 1);
    
            if (rotate_left)
              emit3w (A_INCW, reg_aop, 0);
            else
              {
                emit2 ("addw", "%s , #0x8000", aopGet2 (reg_aop, 0));
                cost ((reg_idx == X_IDX) ? 3 : 4, 2);
              }
    
            emitLabel (tlbl);
    
            genMove (result_aop, reg_aop, regDead (A_IDX, ic), regDead (X_IDX, ic) && use_reg_y, regDead (Y_IDX, ic) && !use_reg_y);
            break;
          }
          ...  case 4 code
    
      if (pushed_y)
        pop (ASMOP_Y, 0, 2);
      if (pushed_x)
        pop (ASMOP_X, 0, 2);
      if (pushed_a)
        pop (ASMOP_A, 0, 1);
    }
    
     
    • Philipp Klaus Krause

      1. The stm8 register allocator is free to do nearly anything. xh:yl, xl:stack, etc are surely possible.
        However, you could disallow it from codegen for your cases. See lines 2204-2205 in gen.c for an example.

      2. Not really. Doing some operations that are particularly fast via a specific register on the variable often works though.

      3. A register that is overwritten by an iCode is considered dead at that iCode. The other reason for a register being dead is that nothing might be stored there.

      4. I don't know, I haven't read the code yet. I was very busy and it didn't seem urgent (the urgent thing right now is the 4.1.0 release, which requires testing, finding and fixing bugs).

       

      Last edit: Philipp Klaus Krause 2021-02-12
  • Visenri

    Visenri - 2021-02-12

    Thank you very much for the answers.

    1- Pretty much confirms what I suspected. Just fake a super high cost in these cases hoping this will make the allocator think it's a bad idea, then add an assertion just in case it happens.
    2- See below.
    3- That was exactly my idea: So, I can remove some unneeded checks.
    4- Excuse-me, It's just me being too impatient, sometimes I think I am talking alone here :D.

    2- I can get easily the result in a register, but I haven't seen any way to get source operand in a register, at least for the operations tested (SWAP, RRC and RLC).
    For example:
    "source in stack" - "result in register"

    unsigned char rotate_left_ref_char(unsigned char value)
    {
        return ((value << 1) | (value >> 7));
    }
    

    gives:

    _rotate_left_ref_char:
    ;   ../src/test4.c: 832: return ((value << 1) | (value >> 7));
    ; genRLC
    ; genRotate1 default
        ld  a, (0x03, sp)
        sll a
        adc a, #0x00
    ; genReturn
    ; genLabel
    00101$:
    ;   ../src/test4.c: 833: }
    ; genEndFunction
        ret
    

    I was hoping this one to be a "source in reg" to "result in reg", but nope, intermediate result is saved first in the stack, so we get exactly the same case here (from the genRLC point of view):

    unsigned char rotate_left_ref_char(unsigned char value)
    {
        value+=2;
        return ((value << 1) | (value >> 7));
    }
    

    gives:

    _rotate_left_ref_char:
    ;   ../src/test4.c: 831: value+=2;
    ; genCast
    ; genAssign
        ld  a, (0x03, sp)
    ; genPlus
        add a, #0x02
        ld  (0x03, sp), a
    ;   ../src/test4.c: 832: return ((value << 1) | (value >> 7));
    ; genRLC
    ; genRotate1 default
        ld  a, (0x03, sp)
        sll a
        adc a, #0x00
    ; genReturn
    ; genLabel
    00101$:
    ;   ../src/test4.c: 833: }
    ; genEndFunction
        ret
    
     

    Last edit: Visenri 2021-02-13
  • Visenri

    Visenri - 2021-02-13

    I think it is the way SDCC generates iCode that prevents it from allocating intermediate results in a register (at least sometimes):

    For example:

    Note: peephole optimizer disabled to better see the code output.

    unsigned char test_combined_op(unsigned char value)
    {
        value+=2;
        return value*2;
    }
    
    _test_combined_op:
    ;   ../src/test4.c: 837: value+=2;
    ; genCast
    ; genAssign
        ld  a, (0x03, sp)
    ; genPlus
        add a, #0x02
        ld  (0x03, sp), a
    ;   ../src/test4.c: 838: return value*2;
    ; genAssign
        ld  a, (0x03, sp)
    ; genLeftShiftLiteral
        sll a
    ; genReturn
    ; genLabel
    00101$:
    ;   ../src/test4.c: 840: }
    ; genEndFunction
        ret
    

    Two unneeded ld's to and from stack.

    If we combine operations in the same statement, we get optimal result:

    unsigned char test_combined_op(unsigned char value)
    {
        return (value+2)*2;
    }
    
    _test_combined_op:
    ;   ../src/test4.c: 839: return (value+2)*2;
    ; genCast
    ; genAssign
        ld  a, (0x03, sp)
    ; genPlus
        add a, #0x02
    ; genCast
    ; genAssign
    ; genLeftShiftLiteral
        sll a
    ; genReturn
    ; genLabel
    00101$:
    ;   ../src/test4.c: 840: }
    ; genEndFunction
        ret
    

    Cosmic compiler has more or less the same behavior.

    But IAR is able to combine operations in different statements:

    unsigned char test_combined_op(unsigned char value)
    {
        PC_ODR = value + 5; // Added to force value out of 'a' register (IAR calling convention has 'a' as byte argument)
        value+=4;
        return value*2;
    }
    
         44          unsigned char test_combined_op(unsigned char value)
         45          {
       \                     test_combined_op:
       \   000000 B7 ..        LD        S:?b0, A
         46              PC_ODR = value + 5;
       \   000002 AB 05        ADD       A, #0x5
       \   000004 C7 500A      LD        L:0x500a, A
         47              value+=4;
         48              return value*2;
       \   000007 B6 ..        LD        A, S:?b0
       \   000009 AB 04        ADD       A, #0x4
       \   00000B 48           SLL       A
       \   00000C 81           RET
       \   00000D              REQUIRE _A_PC_ODR
         49          }
    

    Using the same code for SDCC (even with peephole-opt enabled) results in:

    _test_combined_op2:
    ;   ../src/test4.c: 844: PC_ODR = value + 5;
    ; genCast
    ; genAssign
        ld  a, (0x03, sp)
    ; genPlus
        ld  xl, a
    ; peephole 4 removed redundant load from xl into a.
        add a, #0x05
        ld  _PC_ODR+0, a
    ;   ../src/test4.c: 845: value+=4;
    ; genPlus
        ld  a, xl
        add a, #0x04
    ;   ../src/test4.c: 846: return value*2;
    ; genAssign
        ld  (0x03, sp), a
    ; peephole 4a removed redundant load from (0x03, sp) into a.
    ; genLeftShiftLiteral
        sll a
    ; genReturn
    ; genLabel
    ; peephole j30 removed unused label 00101$.
    ;   ../src/test4.c: 847: }
    ; genEndFunction
        ret
    

    Unneeded ld's are still generated. One is removed by the peephole optimizer.

     
    • Philipp Klaus Krause

      For your first test_combined_op, the register allocator should still allocate the intermediate result into a rather than on the stack. I'll try to find out why that doesn't happen sometime next week.

       
      • Philipp Klaus Krause

        Looks like there is an issue with arguments. The register allocator never sees value, so it always stays on the stack where it was passed.
        However, if you first copy value to some other local variable, everything looks fine:

        unsigned char test_combined_op(unsigned char value)
        {
            unsigned char value2 = value;
            value2+=2;
            return value2*2;
        }
        
         
  • Visenri

    Visenri - 2021-02-13

    The case for SWAP and rotate the problem is a little worse, because the operation is not recognized once you add the previous operation in the same statement:

    As shown before, this is fully optimized:

    unsigned char rotate_left_ref_char(unsigned char value)
    {
        return ((value << 1) | (value >> 7));
    }
    
    _rotate_left_ref_char:
    ;   ../src/test4.c: 832: return ((value << 1) | (value >> 7));
    ; genRLC
    ; genRotate1 default
        ld  a, (0x03, sp)
        sll a
        adc a, #0x00
    ; genReturn
    ; genLabel
    00101$:
    ;   ../src/test4.c: 833: }
    ; genEndFunction
        ret
    

    This is almost optimal:

    unsigned char rotate_left_ref_char(unsigned char value)
    {
        value+=2;
        return ((value << 1) | (value >> 7));
    }
    
    _rotate_left_ref_char:
    ;   ../src/test4.c: 831: value+=2;
    ; genCast
    ; genAssign
        ld  a, (0x03, sp)
    ; genPlus
        add a, #0x02
    ;   ../src/test4.c: 832: return ((value << 1) | (value >> 7));
    ; genRLC
    ; genRotate1 default
        ld  (0x03, sp), a
    ; peephole 4 removed redundant load from (0x03, sp) into a.
        sll a
        adc a, #0x00
    ; genReturn
    ; genLabel
    ; peephole j30 removed unused label 00101$.
    ;   ../src/test4.c: 833: }
    ; genEndFunction
        ret
    

    but this is not recognized as rotation:

    unsigned char rotate_left_ref_char(unsigned char value)
    {
        return (((value+1) << 1) | ((value+1) >> 7));
    }
    
    rotate_left_ref_char:
    ;   ../src/test4.c: 832: return (((value+1) << 1) | ((value+1) >> 7));
    ; genCast
    ; genAssign
        ld  a, (0x03, sp)
    ; genPlus
        inc a
    ; genLeftShiftLiteral
        sll a
    ; genCast
    ; genAssign
        clrw    x
        exg a, xl
        ld  a, (0x03, sp)
        exg a, xl
    ; genPlus
        incw    x
    ; genRightShiftLiteral
        push    a
        clr a
        tnzw    x
        jrpl    00103$
        dec a
    00103$:
        rrwa    x
        sll a
        rlcw    x
        pop a
    ; genCast
    ; genAssign
    ; genCast
    ; genAssign
    ; genOr
        pushw   x
        or  a, (2, sp)
        popw    x
    

    I am not saying IAR is better here, because with my code for rotation and swap, SDCC can beat IAR, because IAR recognizes neither "rotation" nor "swap".

    I am just talking about storage of intermediate results and its relation with combination of statements.

     
    • Philipp Klaus Krause

      The interesting aspect in your last example is that one value + 1 is handled as an 8-bit type, while the other is handled as a 16-bit type.
      This difference already happens in the AST, i.e. early in the frontend. Possibly some leftover optimization (IMO it would make more sense to do such optimizations later, on the iCode).

      If hte return type is changed to unsigned int, both are considered to be the same, and value + 1 is calculated only once.

       
      • Visenri

        Visenri - 2021-02-13

        Casting the result of the +1 operation to unsigned char forces the +1 to be done only once.
        But swap is still not recognized (no matter how many extra casts I add):

        unsigned char swap_ref_char(unsigned char value)
        {
            unsigned char value2 = value;
        
            return (((unsigned char)(value2+1) << 4) | ((unsigned char)(value2+1) >> 4));
        }
        
        swap_ref_char:
            push    a
        ;   ../src/test4.c: 806: unsigned char value2 = value;
        ; genAssign
            ld  a, (0x04, sp)
        ;   ../src/test4.c: 808: return (((unsigned char)(value2+1) << 4) | ((unsigned char)(value2+1) >> 4));
        ; genCast
        ; genAssign
        ; genPlus
            inc a
        ; genCast
        ; genAssign
            ld  xl, a
        ; genCast
        ; genAssign
            ld  a, xl
        ; genLeftShiftLiteral
            swap    a
            and a, #0xf0
            ld  (0x01, sp), a
        ; genRightShiftLiteral
            ld  a, xl
            swap    a
            and a, #0x0f
        ; genOr
            or  a, (0x01, sp)
        ; genReturn
        ; genLabel
        00101$:
        ;   ../src/test4.c: 817: }
        ; genEndFunction
            addw    sp, #1
            ret
        

        But with "unsigned int" version all works as expected, one increment and swap operation (and it also works with or without intermediate variable):

        unsigned int swap_ref_int(unsigned int value)
        {
            unsigned int value2 = value;
            return (((value2+1) << 8) | ((value2+1) >> 8));
        }
        
        _swap_ref_int:
        ;   ../src/test4.c: 876: unsigned int value2 = value;
        ; genAssign
            ldw x, (0x03, sp)
        ;   ../src/test4.c: 877: return (((value2+1) << 8) | ((value2+1) >> 8));
        ; genPlus
            incw    x
        ; genSwap
        ; genSwap2 v1
            swapw   x
        ; genReturn
        ; genLabel
        00101$:
        ;   ../src/test4.c: 889: }
        ; genEndFunction
            ret
        
         
        • Philipp Klaus Krause

          Could you post the current version of your patch here sometime?

           
  • Visenri

    Visenri - 2021-02-13

    I want to remove all the unneeded checks and add the code to disallow the values in split registers.
    I'll retest all and then I'll post it here if you want, but I have also to update my current version to the newest one, so my diffs will but up to date.
    I hope today I'll be able to upload something.

     
    • Visenri

      Visenri - 2021-02-13

      If you want to have a look, this is what I have right now:

      /*-----------------------------------------------------------------*/
      /* genSwap - generates code to swap nibbles, bytes or words        */
      /*-----------------------------------------------------------------*/
      static void
      genSwap (const iCode *ic, asmop *result_aop, asmop *left_aop, int size)
      {
        bool done = false, pushed_a = false, pushed_x = false, pushed_y  = false;
      
        D (emit2 ("; genSwap", ""));
      
        switch (size)
          {
          case 1:
            if (aopSame (result_aop, 0, left_aop, 0, 1) &&
                (aopInReg (result_aop, 0, A_IDX) || aopOnStack (result_aop, 0, 1) || !(aopRS (result_aop))))
              {
                // SWAP with 'a' or any other non reg operand can be done directly with the operand
                D (emit2 ("; genSwap1 v1", ""));
                emit3 (A_SWAP, result_aop, 0);
              }
            else
              {
                D (emit2 ("; genSwap1 default", ""));
                if (!aopInReg (result_aop, 0, A_IDX) && !regDead (A_IDX, ic))
                  {
                    push (ASMOP_A, 0, 1);
                    pushed_a = true;
                  }
      
                cheapMove (ASMOP_A, 0, left_aop, 0, false);
                emit3 (A_SWAP, ASMOP_A, 0);
                cheapMove (result_aop, 0, ASMOP_A, 0, false);
              }
            break;
          case 2:     
            if (aopSame (result_aop, 0, left_aop, 0, 2))
              {
                if (aopInReg (result_aop, 0, X_IDX) || aopInReg (result_aop, 0, Y_IDX))
                  {
                    D (emit2 ("; genSwap2 v1", ""));
                    emit3w (A_SWAPW, result_aop, 0);
                    done = true;
                  }
                else if (regDead (A_IDX, ic) && result_aop->type == AOP_DIR )
                  {
                      // if 'a' is available and the operand is direct, do the swap using exg
                      D (emit2 ("; genSwap2 v2", ""));
                      cheapMove (ASMOP_A, 0, result_aop, 0, false);
                      emit3_o (A_EXG, ASMOP_A, 0, result_aop, 1);
                      cheapMove (result_aop, 0, ASMOP_A, 0, false);
                      done = true;
                  }
              }
      
            if(!done)
              {
                D (emit2 ("; genSwap2 default", ""));
                if (!aopInReg (result_aop, 0, X_IDX) && !regDead (X_IDX, ic))
                  {
                    push (ASMOP_X, 0, 2);
                    pushed_x = true;
                  }
      
                genMove (ASMOP_X, left_aop, regDead (A_IDX, ic), true, regDead (Y_IDX, ic));
                emit3w (A_SWAPW, ASMOP_X, 0);
                genMove (result_aop, ASMOP_X, regDead (A_IDX, ic), false, regDead (Y_IDX, ic));
              }
            break;
          case 4:
            if (aopSame (result_aop, 0, left_aop, 0, 4))
              {
                if (aopInReg (result_aop, 0, X_IDX) && aopInReg (result_aop, 2, Y_IDX) ||
                    aopInReg (result_aop, 0, Y_IDX) && aopInReg (result_aop, 2, X_IDX))
                  {
                    D (emit2 ("; genSwap4 v1", ""));
                    emit3w (A_EXGW, ASMOP_X, ASMOP_Y);
                    done = true;
                  }
              }
            else
              {
                if ((aopInReg (result_aop, 0, X_IDX) && aopInReg (result_aop, 2, Y_IDX) &&
                     aopInReg (left_aop,   0, Y_IDX) && aopInReg (left_aop,   2, X_IDX))
                    ||
                    (aopInReg (result_aop, 0, Y_IDX) && aopInReg (result_aop, 2, X_IDX) &&
                     aopInReg (left_aop,   0, X_IDX) && aopInReg (left_aop,   2, Y_IDX)))
                  {
                    D (emit2 ("; genSwap4 v2", ""));
                    // No need to do anything, the result is already a swapped version of the input
                    done = true;
                  }
              }
      
            if (!done)
              {
                asmop * reg_aop0 = ASMOP_X;
                asmop * reg_aop1 = ASMOP_Y;
                short reg_idx0 = X_IDX;
                short reg_idx1 = Y_IDX;
                short reg_idx_left_0, reg_idx_left_2, reg_idx_result_0, reg_idx_result_2;
                bool use_exg = false;
      
      
                if (aopInReg (result_aop, 2, X_IDX) || aopInReg (result_aop, 0, Y_IDX))
                  {
                      D (emit2 ("; genSwap4 reversed", ""));
                      reg_aop0 = ASMOP_Y;
                      reg_aop1 = ASMOP_X;
                      reg_idx0 = Y_IDX;
                      reg_idx1 = X_IDX;
                  }
                else
                  D (emit2 ("; genSwap4 default", ""));
      
                #define AOP_GET_REG_IDX(aop, offset) aopInReg (aop, offset, X_IDX) ? X_IDX : (aopInReg (aop, offset, Y_IDX) ?  Y_IDX : -1)
      
                reg_idx_left_0   = AOP_GET_REG_IDX(left_aop,   0);
                reg_idx_left_2   = AOP_GET_REG_IDX(left_aop,   2);
                reg_idx_result_0 = AOP_GET_REG_IDX(result_aop, 0);
                reg_idx_result_2 = AOP_GET_REG_IDX(result_aop, 2);
      
                #undef AOP_GET_REG_IDX
      
                if ((reg_idx_left_0 != -1 && reg_idx_left_0 == reg_idx_result_0) ||
                    (reg_idx_left_2 != -1 && reg_idx_left_2 == reg_idx_result_2))
                  use_exg = true; // if any register from left are the same for the result use copy to reg and swap
      
      
                if(!use_exg)
                  {
                    bool copy_offset_0 = !aopSame (result_aop, 2, left_aop, 2, 2); // copy offset 0 first if it will not ovewrite left operand
      
                    D (emit2 ("; genSwap4 copy", ""));
      
                    for (int i = 0; i < 2; i++)
                      {
                        short reg_idx = copy_offset_0 ? reg_idx_result_0 : reg_idx_result_2;
                        bool reg_dead_x = regDead (X_IDX, ic);
                        bool reg_dead_y = regDead (Y_IDX, ic);
      
                        reg_dead_x = reg_dead_x && reg_idx_left_0 != X_IDX && reg_idx_left_2 != X_IDX && (reg_idx_result_0 != X_IDX && reg_idx_result_2 != X_IDX || i == 0);
                        reg_dead_y = reg_dead_y && reg_idx_left_0 != Y_IDX && reg_idx_left_2 != Y_IDX && (reg_idx_result_0 != Y_IDX && reg_idx_result_2 != Y_IDX || i == 0);
      
                        reg_dead_x = reg_dead_x || (reg_idx == X_IDX);
                        reg_dead_y = reg_dead_y || (reg_idx == Y_IDX);
      
                        if(copy_offset_0)
                          genMove_o (result_aop, 2, left_aop, 0, 2, regDead (A_IDX, ic), reg_dead_x, reg_dead_y);
                        else
                          genMove_o (result_aop, 0, left_aop, 2, 2, regDead (A_IDX, ic), reg_dead_x, reg_dead_y);
      
                        copy_offset_0 = !copy_offset_0; // next time copy the other offset
                      }
                  }
                else
                  {
                    bool copy_offset_0 = !aopInReg (left_aop, 2, reg_idx0); // copy offset 0 first if it will not ovewrite left operand with offset 2
      
                    D (emit2 ("; genSwap4 swap", ""));
      
                    if (reg_idx_result_0 != X_IDX && reg_idx_result_2 != X_IDX && !regDead (X_IDX, ic))
                      {
                        push (ASMOP_X, 0, 2);
                        pushed_x = true;
                      }
                    if (reg_idx_result_0 != Y_IDX && reg_idx_result_2 != Y_IDX && !regDead (Y_IDX, ic))
                      {
                        push (ASMOP_Y, 0, 2);
                        pushed_y = true;
                      }
      
                    for (int i = 0; i < 2; i++)
                      {
                        bool reg_dead_x = i == 0 && reg_idx_left_0 != X_IDX && reg_idx_left_2 != X_IDX;
                        bool reg_dead_y = i == 0 && reg_idx_left_0 != Y_IDX && reg_idx_left_2 != Y_IDX;
                        short reg_idx = copy_offset_0 ? reg_idx0 : reg_idx1;
      
                        reg_dead_x = reg_dead_x || (reg_idx == X_IDX);
                        reg_dead_y = reg_dead_y || (reg_idx == Y_IDX);
      
                        if(copy_offset_0)
                          genMove_o (reg_aop0, 0, left_aop, 0, 2, regDead (A_IDX, ic), reg_dead_x, reg_dead_y);
                        else
                          genMove_o (reg_aop1, 0, left_aop, 2, 2, regDead (A_IDX, ic), reg_dead_x, reg_dead_y);    
      
                        copy_offset_0 = !copy_offset_0; // next time copy the other offset
                      }
                    // Swap
                    emit3w (A_EXGW, ASMOP_X, ASMOP_Y);
                  }
      
                genMove_o (result_aop, 0, reg_aop0, 0, 2, regDead (A_IDX, ic), false, false);
                genMove_o (result_aop, 2, reg_aop1, 0, 2, regDead (A_IDX, ic), false, false);
              }
            break;
          default:
            wassertl (false, "unsupported SWAP operand size");
          }
      
        if (pushed_y)
          pop (ASMOP_Y, 0, 2);
        if (pushed_x)
          pop (ASMOP_X, 0, 2);
        if (pushed_a)
          pop (ASMOP_A, 0, 1);
      }
      
       
      • Visenri

        Visenri - 2021-02-13

        This is for rotation:

        /*-----------------------------------------------------------------*/
        /* genRotate - generates code for rotations (RLC & RRC)            */
        /*-----------------------------------------------------------------*/
        static void
        genRotate (const iCode *ic, asmop *result_aop, asmop *left_aop, int size, bool rotate_left)
        {
          bool pushed_a = false, pushed_x = false, pushed_y  = false;
        
          D (emit2 (rotate_left ? "; genRLC" : "; genRRC", ""));
        
          switch (size)
            {
            case 1:
              {
                bool use_result_directly = false;
                asmop *rotate_aop = ASMOP_A;
        
                if (aopSame (result_aop, 0, left_aop, 0, 1) &&
                  (aopInReg (result_aop, 0, A_IDX) || rotate_left && (aopOnStack (result_aop, 0, 1) || !aopRS (result_aop))))
                  {
                    use_result_directly = true;
                    // RLC with 'a' or any other non reg operand can be done directly with the operand
                    // RLC & RRC with "a" can be done directly
                    D (emit2 ("; genRotate1 v1", ""));
                  }
                else
                  D (emit2 ("; genRotate1 default", ""));
        
                symbol *tlbl = (regalloc_dry_run ? 0 : newiTempLabel (NULL));
        
                if(use_result_directly)
                  rotate_aop = result_aop;
        
                if (!use_result_directly && !aopInReg (result_aop, 0, A_IDX) && !regDead (A_IDX, ic))
                  {
                    push (ASMOP_A, 0, 1);
                    pushed_a = true;
                  }
        
                if (!use_result_directly)
                  cheapMove (ASMOP_A, 0, left_aop, 0, false);
        
                // Rotate
                emit3 (rotate_left ? A_SLL : A_SRL, rotate_aop, 0);
        
                // Copy carry to the other side
                if (rotate_left && aopSame (rotate_aop, 0, ASMOP_A, 0, 1))
                  emit3 (A_ADC, ASMOP_A, ASMOP_ZERO);
                else
                  {
                    if (!regalloc_dry_run)
                      emit2 ("jrnc", "!tlabel", labelKey2num (tlbl->key));
                    cost (2, 1);
        
                    if (rotate_left)
                      emit3 (A_INC, rotate_aop, 0);
                    else
                      {
                        if(!aopInReg (rotate_aop, 0, A_IDX))
                          wassertl (FALSE, "Tried to use 'or' with non 'a' operand");
                        emit2 ("or", "a, #0x80");
                        cost (2, 1);
                      }
                    emitLabel (tlbl);
                  }
        
                if (!use_result_directly)
                  cheapMove (result_aop, 0, ASMOP_A, 0, false);
              break;
              }
            case 2:
              {
                bool use_reg_y = false;
                asmop * reg_aop = ASMOP_X;
                short reg_idx = X_IDX;
        
                D (emit2 ("; genRotate2 default", ""));
        
                if (aopInReg (result_aop, 0, Y_IDX))
                    use_reg_y = true;
        
                if (use_reg_y)
                  {
                    reg_aop = ASMOP_Y;
                    reg_idx = Y_IDX;
                  }
        
                if (!aopInReg (result_aop, 0, reg_idx) && !regDead (reg_idx, ic))
                  {
                    push (reg_aop, 0, 2);
                    if (reg_aop == ASMOP_X)
                      pushed_x = true;
                    else
                      pushed_y = true;
                  }
        
                genMove (reg_aop, left_aop, regDead (A_IDX, ic), regDead (X_IDX, ic) || !use_reg_y, regDead (Y_IDX, ic) || use_reg_y);
        
                // Rotate
                emit3w (rotate_left ? A_SLLW : A_SRLW, reg_aop, 0);
        
                // Copy carry to the other side
                symbol *tlbl = (regalloc_dry_run ? 0 : newiTempLabel (NULL));
                if (!regalloc_dry_run)
                  emit2 ("jrnc", "!tlabel", labelKey2num (tlbl->key));
                cost (2, 1);
        
                if (rotate_left)
                  emit3w (A_INCW, reg_aop, 0);
                else
                  {
                    emit2 ("addw", "%s , #0x8000", aopGet2 (reg_aop, 0));
                    cost ((reg_idx == X_IDX) ? 3 : 4, 2);
                  }
        
                emitLabel (tlbl);
        
                genMove (result_aop, reg_aop, regDead (A_IDX, ic), regDead (X_IDX, ic) && use_reg_y, regDead (Y_IDX, ic) && !use_reg_y);
                break;
              }
            case 4:
              {
                asmop * reg_aop0 = ASMOP_X;
                asmop * reg_aop1 = ASMOP_Y;
                short reg_idx0 = X_IDX;
                short reg_idx1 = Y_IDX;
        
                if (aopInReg (result_aop, 2, X_IDX) || aopInReg (result_aop, 0, Y_IDX))
                  {
                      D (emit2 ("; genRotate4 reversed", ""));
                      reg_aop0 = ASMOP_Y;
                      reg_aop1 = ASMOP_X;
                      reg_idx0 = Y_IDX;
                      reg_idx1 = X_IDX;
                  }
                else
                  D (emit2 ("; genRotate4 default", ""));
        
                if (!aopInReg (result_aop, 0, X_IDX) && !aopInReg (result_aop, 2, X_IDX) && !regDead (X_IDX, ic))
                  {
                    push (ASMOP_X, 0, 2);
                    pushed_x = true;
                  }
                if (!aopInReg (result_aop, 0, Y_IDX) && !aopInReg (result_aop, 2, Y_IDX) && !regDead (Y_IDX, ic))
                  {
                    push (ASMOP_Y, 0, 2);
                    pushed_y = true;
                  }
        
                // Copy data into X:Y registers, use EXGW if data is already in X:Y in reverse order
                if(aopInReg (left_aop, 2, reg_idx0) && aopInReg (left_aop, 0, reg_idx1))
                  emit3w (A_EXGW, ASMOP_X, ASMOP_Y);
                else
                  {
                    bool copy_offset_0 = !aopInReg (left_aop, 2, reg_idx0); // copy offset 0 first if it will not ovewrite left operand with offset 2
                    for (int i = 0; i < 2; i++)
                      {
                        bool reg_dead_x = (i == 0) && regDead (X_IDX, ic); // regDead() only checked the first time
                        bool reg_dead_y = (i == 0) && regDead (Y_IDX, ic);
                        short reg_idx = copy_offset_0 ? reg_idx0 : reg_idx1;
        
                        reg_dead_x = reg_dead_x || (reg_idx == X_IDX);
                        reg_dead_y = reg_dead_y || (reg_idx == Y_IDX);
        
                        if(copy_offset_0)
                          genMove_o (reg_aop0, 0, left_aop, 0, 2, regDead (A_IDX, ic), reg_dead_x, reg_dead_y);
                        else
                          genMove_o (reg_aop1, 0, left_aop, 2, 2, regDead (A_IDX, ic), reg_dead_x, reg_dead_y);    
        
                        copy_offset_0 = !copy_offset_0; // next time copy the other offset
                      }
                  }
        
                // Rotate
                if (rotate_left)
                  {
                    emit3w (A_SLLW, reg_aop0, 0);
                    emit3w (A_RLCW, reg_aop1, 0);
                  }
                else
                  {
                    emit3w (A_SRLW, reg_aop1, 0);
                    emit3w (A_RRCW, reg_aop0, 0);
                  }
        
                // Copy carry to the other side
                symbol *tlbl = (regalloc_dry_run ? 0 : newiTempLabel (NULL));
                if (!regalloc_dry_run)
                  emit2 ("jrnc", "!tlabel", labelKey2num (tlbl->key));
                cost (2, 1);
        
                if (rotate_left)
                  emit3w (A_INCW, reg_aop0, 0);
                else
                  {
                    emit2 ("addw", "%s , #0x8000", aopGet2 (reg_aop1, 0));
                    cost ((reg_idx1 == X_IDX) ? 3 : 4, 2);
                  }
        
                emitLabel (tlbl);
        
                genMove_o (result_aop, 0, reg_aop0, 0, 2, regDead (A_IDX, ic), false, false);
                genMove_o (result_aop, 2, reg_aop1, 0, 2, regDead (A_IDX, ic), false, false);
              }
              break;
            default:
              wassertl (false, "unsupported RLC/RRC operand size");
          }
        
          if (pushed_y)
            pop (ASMOP_Y, 0, 2);
          if (pushed_x)
            pop (ASMOP_X, 0, 2);
          if (pushed_a)
            pop (ASMOP_A, 0, 1);
        }
        
         
        • Visenri

          Visenri - 2021-02-13

          And this is the calling function for both:

          static void
          genRotateOperation (const iCode *ic)
          {
            int size = getSize (operandType (IC_RESULT(ic)));
          
            operand *result = IC_RESULT (ic);
            operand *left = IC_LEFT (ic);
          
            aopOp (left, ic);
            aopOp (result, ic);
          
            asmop *result_aop = result->aop;
            asmop *left_aop = left->aop;
          
            switch (ic->op)
              {
              case RRC:
              case RLC:
                genRotate (ic, result_aop, left_aop, size, ic->op == RLC);
                break;
          
              case SWAP:
                genSwap (ic, result_aop, left_aop, size);
                break;
              default:
                wassertl (0, "Unimplemented iCode");
              }
          
            freeAsmop (left);
            freeAsmop (result);
          }
          

          Called in genSTM8iCode

              case RRC:
              case RLC:
              case SWAP:
                genRotateOperation (ic);
                break;
          

          And modification to stm8/main.c

          /* Indicate which extended bit operations this port supports */
          static bool
          hasExtBitOp (int op, int size)
          {
            return (op == GETABIT)
                    || (op == RRC  && size <= 4 && size != 3)
                    || (op == RLC  && size <= 4 && size != 3)
                    || (op == SWAP && size <= 4 && size != 3);
          }
          
           
          • Visenri

            Visenri - 2021-02-13

            I also added more instructions, not implemented in
            emit3cost:
            A_EXG
            emit3wcost:
            A_EXGW
            A_SWAPW

            And changed places where this was hardcoded.

             
  • Visenri

    Visenri - 2021-02-14

    After writing some test cases I am getting failures because of operands splitted in registers even when I am using the trick suggested ( cost (1000, 1000) in forbidden cases).

    Rotation with 4 bytes is the one triggering the failures ( assertions have been commented to see the end result):

        case 4:
          {
            asmop * reg_aop0 = ASMOP_X;
            asmop * reg_aop1 = ASMOP_Y;
            short reg_idx0 = X_IDX;
            short reg_idx1 = Y_IDX;
    
            if (regBytesUsedByAop (result_aop, 2, 2, X_IDX) || regBytesUsedByAop (result_aop, 0, 2, Y_IDX))
              {
                  D (emit2 ("; genRotate4 reversed", ""));
                  reg_aop0 = ASMOP_Y;
                  reg_aop1 = ASMOP_X;
                  reg_idx0 = Y_IDX;
                  reg_idx1 = X_IDX;
              }
            else
              D (emit2 ("; genRotate4 default", ""));
    
            if (regUsedPartiallyByAop (left_aop, 0, 4, X_IDX) &&  regUsedPartiallyByAop (result_aop, 0, 4, X_IDX) ||
                regUsedPartiallyByAop (left_aop, 0, 4, Y_IDX) &&  regUsedPartiallyByAop (result_aop, 0, 4, Y_IDX))
              { // This cases are not implemented, so they are forbidden
                cost (1000, 1000);
                //wassert_bt (regalloc_dry_run);
                //return;
                D (emit2 ("; genRotate4 problem", ""));
              }
    
            if (!regDead (X_IDX, ic))
              {
                push (ASMOP_X, 0, 2);
                pushed_x = true;
              }
            if (!regDead (Y_IDX, ic))
              {
                push (ASMOP_Y, 0, 2);
                pushed_y = true;
              }
    
            // Copy data into X:Y registers, use EXGW if data is already in X:Y in reverse order
            if(aopInReg (left_aop, 2, reg_idx0) && aopInReg (left_aop, 0, reg_idx1))
              emit3w (A_EXGW, ASMOP_X, ASMOP_Y);
            else
              {
                bool copy_offset_0 =  !regBytesUsedByAop (left_aop, 2, 2, reg_idx0); //copy offset 0 first if it will not ovewrite left operand with offset 2
    
                if (!copy_offset_0 && !regBytesUsedByAop (left_aop, 0, 2, reg_idx1)) //if reversed order also overwrites we have a problem
                  { // This case is not implemented, so it's forbidden
                    cost (1000, 1000);
                    //wassert_bt (regalloc_dry_run);
                    //return;
                    D (emit2 ("; genRotate4 problem2", ""));
                    D (emit2 (";", "Left : %s, %s, %s, %s", aopGet (left_aop, 3), aopGet (left_aop, 2), aopGet (left_aop, 1), aopGet (left_aop, 0)));
                    D (emit2 (";", "Result : %s, %s, %s, %s", aopGet (result_aop, 3), aopGet (result_aop, 2), aopGet (result_aop, 1), aopGet (result_aop, 0)));
                    D (emit2 (";", "Regs : %s, %s", aopGet2 (reg_aop1, 0), aopGet2 (reg_aop0, 0)));
                  }
    
                for (int i = 0; i < 2; i++)
                  {
                    bool reg_dead_x = (i == 0) && regDead (X_IDX, ic); // regDead() only checked the first time
                    bool reg_dead_y = (i == 0) && regDead (Y_IDX, ic);
                    short reg_idx = copy_offset_0 ? reg_idx0 : reg_idx1;
    
                    reg_dead_x = reg_dead_x || (reg_idx == X_IDX);
                    reg_dead_y = reg_dead_y || (reg_idx == Y_IDX);
    
                    if(copy_offset_0)
                      genMove_o (reg_aop0, 0, left_aop, 0, 2, regDead (A_IDX, ic), reg_dead_x, reg_dead_y);
                    else
                      genMove_o (reg_aop1, 0, left_aop, 2, 2, regDead (A_IDX, ic), reg_dead_x, reg_dead_y);    
    
                    copy_offset_0 = !copy_offset_0; // next time copy the other offset
                  }
              }
    
            // Rotate
            if (rotate_left)
              {
                emit3w (A_SLLW, reg_aop0, 0);
                emit3w (A_RLCW, reg_aop1, 0);
              }
            else
              {
                emit3w (A_SRLW, reg_aop1, 0);
                emit3w (A_RRCW, reg_aop0, 0);
              }
    
            // Copy carry to the other side
            symbol *tlbl = (regalloc_dry_run ? 0 : newiTempLabel (NULL));
            if (!regalloc_dry_run)
              emit2 ("jrnc", "!tlabel", labelKey2num (tlbl->key));
            cost (2, 1);
    
            if (rotate_left)
              emit3w (A_INCW, reg_aop0, 0);
            else
              {
                emit2 ("addw", "%s , #0x8000", aopGet2 (reg_aop1, 0));
                cost ((reg_idx1 == X_IDX) ? 3 : 4, 2);
              }
    
            emitLabel (tlbl);
    
            genMove_o (result_aop, 0, reg_aop0, 0, 2, regDead (A_IDX, ic), false, false);
            genMove_o (result_aop, 2, reg_aop1, 0, 2, regDead (A_IDX, ic), false, false);
          }
          break;
    

    I keep getting results like this:
    ; Left : a, xl, xh, (0x04, sp)
    ; Result : yh, a, xh, xl

    _rotate_test_1_xor1:
        sub sp, #4
    ;   gen/stm8/rotate2/rotate2_size_32_andCase_1_xorLiteral_0_rotateLeft_-1.c: 96: value = value ^ rotate_test_value_xor;
    ; genXor
        ld  a, (0x0a, sp)
        xor a, _rotate_test_value_xor+3
        ld  (0x04, sp), a
        ld  a, (0x09, sp)
        xor a, _rotate_test_value_xor+2
        ld  xh, a
        ld  a, (0x08, sp)
        xor a, _rotate_test_value_xor+1
        ld  xl, a
        ld  a, (0x07, sp)
        xor a, _rotate_test_value_xor+0
    ;   gen/stm8/rotate2/rotate2_size_32_andCase_1_xorLiteral_0_rotateLeft_-1.c: 97: return ((value << SHIFT_L) | (value >> SHIFT_R)) AND_OPERATION;
    ; genRRC
    ; genRotate4 default
    ; genRotate4 problem2
    ;   Left : a, xl, xh, (0x04, sp)
    ;   Result : yh, a, xh, xl
    ;   Regs : y, x
        exg a, yl
        ld  a, xl
        exg a, yl
        ld  yh, a
        ld  a, (0x04, sp)
        ld  xl, a
        srlw    y
        rrcw    x
        jrnc    00103$
        addw    y , #0x8000
    00103$:
        ld  a, yl
    ; genAnd
        and a, #0xef
        ld  yl, a
        ld  a, xl
        and a, #0xef
        ld  xl, a
    ; genReturn
    ; genLabel
    ; peephole j30 removed unused label 00101$.
    ;   gen/stm8/rotate2/rotate2_size_32_andCase_1_xorLiteral_0_rotateLeft_-1.c: 98: }
    ; genEndFunction
        addw    sp, #4
        ret
    

    Attached my "stm8/gen.c" file, the test cases code in "rotate2.c" and the results in "results-test-stm8.log".

     

    Last edit: Visenri 2021-02-14
  • Visenri

    Visenri - 2021-02-14

    I tried changing the code to do the whole copy using only one call to genMove_o, but it results in an assertion failure in genMove_o ("Unimplemented."):

    genMove_o (ASMOP_XY, 0, left_aop, 0, 4, regDead (A_IDX, ic), regDead (X_IDX, ic), regDead (Y_IDX, ic));
    
                if (!copy_offset_0 && !regBytesUsedByAop (left_aop, 0, 2, reg_idx1)) //if reversed order also overwrites we have a problem
                  { // This case is not implemented, so it's forbidden
                    cost (1000, 1000);
                    //wassert_bt (regalloc_dry_run);
                    //return;
                    D (emit2 ("; genRotate4 problem2", ""));
                    D (emit2 (";", "Left : %s, %s, %s, %s", aopGet (left_aop, 3), aopGet (left_aop, 2), aopGet (left_aop, 1), aopGet (left_aop, 0)));
                    D (emit2 (";", "Result : %s, %s, %s, %s", aopGet (result_aop, 3), aopGet (result_aop, 2), aopGet (result_aop, 1), aopGet (result_aop, 0)));
                    D (emit2 (";", "Regs : %s, %s", aopGet2 (reg_aop1, 0), aopGet2 (reg_aop0, 0)));
                  }
                if (reg_idx0 == X_IDX && reg_idx1 == Y_IDX)
                  {
                      genMove_o (ASMOP_XY, 0, left_aop, 0, 4, regDead (A_IDX, ic), regDead (X_IDX, ic), regDead (Y_IDX, ic));
                  }
                else
                  {
                    for (int i = 0; i < 2; i++)
                      {
                        bool reg_dead_x = (i == 0) && regDead (X_IDX, ic); // regDead() only checked the first time
                        bool reg_dead_y = (i == 0) && regDead (Y_IDX, ic);
                        short reg_idx = copy_offset_0 ? reg_idx0 : reg_idx1;
    
                        reg_dead_x = reg_dead_x || (reg_idx == X_IDX);
                        reg_dead_y = reg_dead_y || (reg_idx == Y_IDX);
    
                        if(copy_offset_0)
                          genMove_o (reg_aop0, 0, left_aop, 0, 2, regDead (A_IDX, ic), reg_dead_x, reg_dead_y);
                        else
                          genMove_o (reg_aop1, 0, left_aop, 2, 2, regDead (A_IDX, ic), reg_dead_x, reg_dead_y);
    
                        copy_offset_0 = !copy_offset_0; // next time copy the other offset
                      }
                  }
    
    gen/stm8/rotate2/rotate2_size_32_andCase_1_xorLiteral_0_rotateLeft_-1.c:114: error 9: FATAL Compiler Internal Error in file 'gen.c' line number '2138' : Unimplemented.
    

    Corresponding to these lines:

          // No byte can be assigned safely (i.e. the assignment is a permutation).
          if (!regalloc_dry_run)
            wassertl_bt (0, "Unimplemented.");
    
     
  • Visenri

    Visenri - 2021-02-15

    After more testing, this is the problematic case that triggers the assertion in genMove_o ("Unimplemented."):

    The rotate function input asked for these operands:

    ;   Left :    a, xl, xh, yl
    ;   Result : yh,  a, xh, xl
    

    The rotate function calls genMove_o with the closest combination of X and Y (to do the rotation), in this case Y:X:

    ;   Left :    a, xl, xh, yl
    ;   Result : yh, yl, xh, xl
    

    The copy fails because it contains a permutation:
    XH is already in place
    A is copied into YH
    the remaining YL-XL form a permutation, so none can be copied without overwriting the source data for the other.

    As a proof of concept I've modified the code of genMove_o to check if any other reg can be used to mirror the value of YL, so the copy won't end in a dead lock.
    It uses YH and the test compiles and passes the regression test.

    The copy suceeds because it's no longer a permutation:
    XH is already in place
    YL is copied into YH (temp copy of YL, banned from being written until used)
    XL is copied into YL
    YH is copied into XL (YH write is enabled after being used)
    A is copied into YH

    The general concept would be:
    If the copy is not doable, try it again using a free register as temporary storage (one free or used in the result and not involved in the permutation).

     

    Last edit: Visenri 2021-02-16
    • Visenri

      Visenri - 2021-02-16

      For simple cases where A is not involved in the permutation (like this one) even an easier approach would be to use A to perform it.

      I have tested something like this:

      push (ASMOP_A, 0, 1);
      cheapMove (ASMOP_A, 0, source, soffset0, false);
      cheapMove (source, soffset0, source, soffset1, true);
      cheapMove (source, soffset1, ASMOP_A, 0, false);
      pop (ASMOP_A, 0, 1);
      

      where soffset0 and soffset1 represent the offsets of the registers to be exchanged.

      gives:

          ld  yh, a
          push    a
          ld  a, yl
          exg a, yl
          ld  a, xl
          exg a, yl
          ld  xl, a
          pop a
      

      vs the mirror method explained previously:

          rlwa    y
          ld  a, yh
          rrwa    y
          exg a, yl
          ld  a, xl
          exg a, yl
          exg a, xl
          ld  a, yh
          exg a, xl
          ld  yh, a
      

      Both examples pass the regression test.

       

      Last edit: Visenri 2021-02-16
      • Visenri

        Visenri - 2021-02-21

        Right now I have a fully working version of this operands working with build 12014, I am compiling now build 12070, I hope it works without any issues.

        My code for SWAP, RRC and RLC has been a hardtest for "genCopy" function.
        After adding more cases to my test suite, I found other permutations making "genCopy" fail.

        I have implemented a "simple" permutation algorithm processed before doing reg to reg copy, the order is as follows:

        • Original code copying reg to stack (modified to check for permutations to not overwrite source operands).
        • Original Register shuffling.
        • Original clear registers.
        • New Check and process permutations.
        • Original reg to reg copy (with minimal changes)
        • Rest of copy code untouched.
          .

        The algorithm is something like this:

        • Find permutations.
        • Choose one of the permutations locations to be backed up by another register not present in any permutation reg.
        • Check if the backup register needs to be pushed to be used.
        • Push backup register if needed.
        • Copy chosen location reg into backup reg.
        • Execute copy of remaining positions, should work now because the copied position will no longer lock the copy.
        • Copy backup reg into destination of backup position.
        • Pop backup register

        The new check for permutations is only done if not in a dry run, to save processing time and to normally avoid use of permutation (99% of cases).

        For now it only handles one permutation, I mean, just one backup must unlock the copy operation.
        But it could be expanded easily wrapping it all in a loop trying multiple times until no more copy operations take place.
        It only handles the backup using A, X or Y register.
        For now this covers all cases seen so far.

        Examples found in my tests and handled correctly by the algorithm:

        ;   Src : stack2, stack1, yh, yl
        ;   Dst : yh, yl, stack2, stack1
        
        backup done with X (after pushed), stored positions 2,3 (stack2:stack1 -> X).
        
        ;   Src : xh, xl, yh, a
        ;   Dst : yh, yl, xh, xl
        
        backup done with A (after pushed), stored position 1 (yh->a)
        

        If all regression test pass with 12070 I'll upload it here later.

         
        • Visenri

          Visenri - 2021-02-22

          Here you have the patches, including updated files and diffs.
          I added also a new test for rotations/swap: rotate2.c.

          gen.c

          Summary of changes:
          Implemented instruction cost
          A_EXG
          A_EXGW
          A_SWAPW

          Added asmop ASMOP_YX to be used for swap.
          Added regBytesUsedByAop (may be simplified after seeing other code).
          Added functions isAopAssignableRegPos and AopAssignRegToStackOverWritePos to be used in genCopy.
          Added support for permutations in genCopy.
          Added genSwap, genRotate and genRotateOperation functions (SWAP, RRC and RLC).

          There is a lot of debug output, feel free to remove what is too much for you.

          Comparison of results for rotate2 test
          build 12070

          rotate2 (f:  0, t:1620, c: 108, b: 303577, T:   638263)
          

          after patches:

          rotate2 (f:  0, t:1620, c: 108, b: 270660, T:   601200)
          
           

          Last edit: Visenri 2021-03-03
          • Philipp Klaus Krause

            I'll look at the actual functionality when I find time later.
            For now, I've picked the exg / exg handling improvement and the rotate2.c test into the sdcc-next branch (to be merged to trunk after the 4.1.0 release).

             
  • Visenri

    Visenri - 2021-03-03

    If you have looked at the log files, you may have noticed that what I wrote about the results was really the opposite of what you get.

    I have corrected the previous post.

    Of course with the patches it uses less code and cycles, not more.

    rotate2 test (stm8):

    revision 12070
    rotate2 (f: 0, t:1620, c: 108, b: 303577, T: 638263)

    after patches:
    rotate2 (f: 0, t:1620, c: 108, b: 270660, T: 601200)

     
  • Visenri

    Visenri - 2021-03-08

    I've done minor changes to the patch (only to genSwap function), here you have the updated gen.c compared against the next branch 12085.

     
1 2 > >> (Page 1 of 2)

Log in to post a comment.