Menu

#883 mcs51: unnecessary stores of intermediate results to global bit variables

None
open
nobody
None
5
2023-10-26
2023-07-23
Oleg Endo
No

Examples have been compiled on 4.1.6 (svn r12557) with:

sdcc -c --std-sdcc2x -mmcs51 --model-large --stack-auto --fomit-frame-pointer

__bit bleh2 (uint8_t a, uint8_t b)
{
  // return in carry bit
  return true;
}

int bleh3 (void)
{
  // get result from bleh2 in carry bit
  return bleh2 (5, 4);
}

compiles to:

_bleh2:
;   bug.c:52: return true;
    setb    c
;   bug.c:53: }
    ret


_bleh3:
;   bug.c:61: return bleh2 (5, 4);
    mov a,#0x04
    push    acc
    mov dpl,#0x05
    lcall   _bleh2
    dec sp
    mov  b0,c       <<< unneeded store to global bit variable b0
    clr a
    rlc a
    mov r6,a
    mov r7,#0x00
    mov dpl,r6
    mov dph,r7
;   bug.c:62: }
    ret
__bit bleh2 (uint8_t a, uint8_t b)
{
// returns value in 'a'
  return ((a + b) >> 8) & 1;
}

compiles to:

_bleh2:
    mov r7,dpl
;   bug.c:50: return (((a + b) >> 8) & 1);
    mov r6,#0x00
    mov r0,sp
    dec r0
    dec r0
    mov ar4,@r0
    mov r5,#0x00
    mov a,r4
    add a,r7
    mov a,r5
    addc    a,r6
    anl a,#0x01
;   assignBit
    add a,#0xff
    clr a
    rlc a
    add a,#0xff
;   bug.c:51: }
    ret
__bit bleh2 (uint8_t a, uint8_t b)
{
  return (((a + b) >> 8) & 1) != 0;
}

compiles to:

_bleh2:
    mov r7,dpl
;   bug.c:50: return (((a + b) >> 8) & 1) != 0;
    mov r6,#0x00
    mov r0,sp
    dec r0
    dec r0
    mov ar4,@r0
    mov r5,#0x00
    mov a,r4
    add a,r7
    mov a,r5
    addc    a,r6
    anl a,#0x01
    cjne    a,#0x01,00103$
00103$:
    cpl c
    mov b0,c     <<< unneeded store to global bit variable b0
;   bug.c:51: }
    ret

It seems that in some cases it's a leftover of a peephole optimizations. If peepholes are disabled we can see something like:

    mov b0,c
;   genNot
    cpl b0
;   genRet
    mov c,b0

  ...

    mov b0,c
;   genCast
    mov c,b0

It'd be also good if it first tried to use the bits of register 'b' first, before starting to use global bit variables as temporaries. But that seems another thing.

Discussion

  • Maarten Brock

    Maarten Brock - 2023-07-24

    Ticket moved from /p/sdcc/bugs/3623/

    Can't be converted:

    • _category: MCS51
     
  • Maarten Brock

    Maarten Brock - 2023-07-24

    b0 is not a global bit variable, it is a virtual bit register. There are 8 virtual bit registers (b0-b7) in bits that can be pushed to the stack. The B register cannot be used unless the compiler would know that B is free. (Even PSW would be a better suggestion.) B is e.g. used for passing 24/32 bit parameters/results, multiplication and most importantly in this case for passing __bit parameters.

    void foo(__bit z, __bit y, __bit x);
    __bit bar(__bit x, __bit y, __bit z)
    {
        foo(z, y, x);
        return x;
    }
    
    ;--------------------------------------------------------
    ; overlayable bit register bank
    ;--------------------------------------------------------
            .area BIT_BANK  (REL,OVR,DATA)
    bits:
            .ds 1
            b0 = bits[0]
            b1 = bits[1]
            b2 = bits[2]
            b3 = bits[3]
            b4 = bits[4]
            b5 = bits[5]
            b6 = bits[6]
            b7 = bits[7]
    [...]
    ;------------------------------------------------------------
    ;Allocation info for local variables in function 'bar'
    ;------------------------------------------------------------
    ;z                         Allocated to registers b2 
    ;y                         Allocated to registers b1 
    ;x                         Allocated to registers b0 
    ;------------------------------------------------------------
    ;       bug3623.c:2: __bit bar(__bit x, __bit y, __bit z)
    ;       -----------------------------------------
    ;        function bar
    ;       -----------------------------------------
    _bar:
    [...]
    ;       bug3623.c:4: foo(z, y, x);
            mov     c,b2
            mov     b[0],c
            mov     c,b1
            mov     b[1],c
            mov     c,b0
            mov     b[2],c
            push    bits
            mov     bits,b
            lcall   _foo
            pop     bits
    ;       bug3623.c:5: return x;
            mov     c,b0
    ;       bug3623.c:6: }
            ret
    

    Suboptimal code generation is not a bug.

     
  • Oleg Endo

    Oleg Endo - 2023-07-24

    b0 is not a global bit variable, it is a virtual bit register. There are 8 virtual bit registers (b0-b7) in bits that can be pushed to the stack.

    Yes, but effectively, those end up being allocated as "global bit variables". I've seen that it's caused by virtual bit register usage.

    The B register cannot be used unless the compiler would know that B is free. (Even PSW would be a better suggestion.) B is e.g. used for passing 24/32 bit parameters/results, multiplication and most importantly in this case for passing __bit parameters.

    Isn't that information available within a function?

    Suboptimal code generation is not a bug.

    Alright. Will file as 'feature request' in the future.

     
  • Maarten Brock

    Maarten Brock - 2023-07-24

    Yes, but effectively, those end up being allocated as "global bit variables". I've seen that it's caused by virtual bit register usage.

    Do you also consider r0-r7 to be allocating global __data memory? I do not and to me bits is no different. Remember that bits is in overlayed memory.

    Isn't that information available within a function?

    Unfortunately, no it isn't. Remember that mcs-51 still uses the old register allocator and is single pass, unlike most other ports. It has no idea what following instructions might need. So it could claim the B register to store a bit and then later fail because B is no longer available for MUL.

     
    • Oleg Endo

      Oleg Endo - 2023-07-24

      Do you also consider r0-r7 to be allocating global __data memory? I do not and to me bits is no different. Remember that bits is in overlayed memory.

      Well ... if you put it that way ... ;)
      My sole point here being, in this case it's triggering the allocation of 1 byte in the bits space (if I'm not mistaken), even though it's a dead store.

      Unfortunately, no it isn't. Remember that mcs-51 still uses the old register allocator and is single pass, unlike most other ports. It has no idea what following instructions might need. So it could claim the B register to store a bit and then later fail because B is no longer available for MUL.

      Thanks for the clarification. That's a severe limitation.

       
      • Maarten Brock

        Maarten Brock - 2023-07-24

        Yes, indeed it triggers the allocation of that one byte. And if this is the only place in your whole program that requires a (temporary) bit register then it is relatively expensive. In that case returning char might be a better idea. But if you have more places where a bit register is needed, it becomes virtually free. Up to 8 bit registers can be used and only after that will it start allocating on stack. And that will be one byte per bit along with very expensive manipulations.

        Notwithstanding, it would be best if this bit register could be optimized out here since it appears to be unused.

        I am more concerned about that assignBit code taking two additions.

         
  • Oleg Endo

    Oleg Endo - 2023-07-26

    @maartenbrock I've been wading through the peephole code of mcs51. It's actually already tracking register read/write for each insn in the function. The code in mcs51/peep.c scan4op is using that.

    However, although the array inreg_info regs8051[] = { in mcs51/ralloc contains 26 entries, mcs51_nRegs is set to only 16 (or sometimes 8). Not sure why that is, but it prevents the dead-store code from checking those registers above index 16. The virtual bit registers are recorded there, too, as well as the B register. So while it's not known ahead whether register B is used or not, it certainly is known after processing the function once. So if there is no use overlap of B and the virtual bit registers, the virtual bit register uses could be converted to use the B register bits (with a separate processing pass).

    Anyway, it seems the dead store to virtual bit register in my original case here could be eliminated with a peephole. It's just that there are no dead-store peepholes for bit registers it seems, and perhaps some other implied pitfalls. I'm trying to add support for dead-store removal via peephole of A register now .... but perhaps it's better to make a separate dead-store optimization pass ....

     
  • Oleg Endo

    Oleg Endo - 2023-07-31

    After extending the peephole code to support the virtual bit registers, I can confirm that simple peephole patterns like

    // dead store of virtual bit registers
    replace restart {
        mov %1,c
    } by {
        ;   Peephole 305    mov %1,c removed
    } if same(%1 'b0' 'b1' 'b2' 'b3' 'b4' 'b5' 'b6' 'b7'), deadMove(%1)
    

    will remove the temporary virtual bit register stores within the function, unless they are otherwise used (e.g. by more complex bit arithmetic). However, after the peephole optimizations, the function will still end up allocating the bits variable. It seems to be the same problem as with sticky push/pop sequences of registers, which were eliminated by peephole patterns.

     
    • Oleg Endo

      Oleg Endo - 2023-10-26
       

      Last edit: Oleg Endo 2023-10-27

Log in to post a comment.