
#197 STM8. Not very good code generation

None
closed
5
2024-05-26
2024-05-19
No

STM8. Not very good code generation.

Source code:

static void _delay(unsigned char ticks)
{
    while(--ticks);
}

Generated code:

__delay:
00101$:
    dec a
    tnz a
    jrne    00101$
    ret

The TNZ instruction is not needed because DEC already sets the N flag.
I am surprised that such a fairly old compiler generates inefficient code for such a fairly old CPU.
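The redundancy can be checked with a small model. What follows is only a sketch in C, assuming a simplified view of the accumulator and the Z and N flags; the names `struct flags`, `dec_a`, `tnz_a`, `loop_with_tnz` and `loop_without_tnz` are made up for illustration and are not part of SDCC. Both loops should run the same number of iterations for every starting value, because DEC already leaves Z and N describing the decremented value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the two condition flags relevant here (not a full STM8 model). */
struct flags {
    bool z; /* set when the result is zero */
    bool n; /* set when bit 7 of the result is set */
};

/* DEC A: decrement the 8-bit accumulator and update Z and N from the result. */
static void dec_a(uint8_t *a, struct flags *f)
{
    *a = (uint8_t)(*a - 1);
    f->z = (*a == 0);
    f->n = (*a & 0x80) != 0;
}

/* TNZ A: update Z and N from the accumulator without changing it. */
static void tnz_a(const uint8_t *a, struct flags *f)
{
    f->z = (*a == 0);
    f->n = (*a & 0x80) != 0;
}

/* Loop as originally generated: dec a / tnz a / jrne. Returns the iteration count. */
static unsigned loop_with_tnz(uint8_t ticks)
{
    struct flags f = { false, false };
    unsigned iterations = 0;
    do {
        dec_a(&ticks, &f);
        tnz_a(&ticks, &f);
        iterations++;
    } while (!f.z); /* jrne: branch while Z is clear */
    return iterations;
}

/* Loop without the tnz: dec a / jrne. Returns the iteration count. */
static unsigned loop_without_tnz(uint8_t ticks)
{
    struct flags f = { false, false };
    unsigned iterations = 0;
    do {
        dec_a(&ticks, &f);
        iterations++;
    } while (!f.z); /* jrne: branch while Z is clear */
    return iterations;
}
```

In this model a start value of 0 wraps to 255 on the first decrement, so both variants run 256 times, which matches what `while(--ticks);` does in C for an 8-bit `unsigned char`.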

Discussion

  • Vladimir Antonenko

    I mean Z flag.

     
  • Philipp Klaus Krause

    In current trunk, I see codegen generate the tnz, but the peephole optimizer optimizes it out, so the final result is:

    __delay:
    ;   test.c: 75: while(--ticks);
    00101$:
        dec a
        jrne    00101$
    ;   test.c: 76: }
        ret
    

    Which version of sdcc did you use?

     
    • Vladimir Antonenko

      Hi,
      I used sdcc-win64 4.4.0 rc3. Options: --opt-code-speed or --opt-code-size

       
    • Vladimir Antonenko

      Another one:
      SDCC : mcs51/z80/z180/r2k/r2ka/r3ka/sm83/tlcs90/ez80_z80/z80n/r800/ds390/pic16/pic14/TININative/ds400/hc08/s08/stm8/pdk13/pdk14/pdk15/mos6502/mos65c02 TD- 4.4.0 #14620 (Linux)
      sdcc -mstm8 --opt-code-speed test01.c

       
      • Philipp Klaus Krause

        Looks fine in current trunk to me, too.
        There were some stm8 peephole optimizer fixes in early March this year. Maybe one of them helped here.
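To picture the kind of rewrite described above (a `tnz` dropped right after a `dec` of the same operand), here is a toy pass over assembly lines in C. It is a hypothetical illustration only: the function name `drop_tnz_after_dec` and the "one instruction per string, single space after the mnemonic" format are assumptions, and SDCC's real peephole optimizer works from its own rule files with additional safety conditions.

```c
#include <stddef.h>
#include <string.h>

/*
 * Toy peephole pass (hypothetical, not SDCC's implementation):
 * drop a "tnz X" that immediately follows "dec X" for the same operand X.
 * The array is compacted in place and the new line count is returned.
 */
static size_t drop_tnz_after_dec(const char *lines[], size_t count)
{
    size_t out = 0;
    for (size_t i = 0; i < count; i++) {
        const char *cur = lines[i];
        if (out > 0 &&
            strncmp(cur, "tnz ", 4) == 0 &&
            strncmp(lines[out - 1], "dec ", 4) == 0 &&
            strcmp(cur + 4, lines[out - 1] + 4) == 0) {
            /* Z and N already describe the decremented operand: skip the tnz. */
            continue;
        }
        lines[out++] = cur;
    }
    return out;
}
```

Because a label would be its own line in this format, a `tnz` that is a branch target is never adjacent to the `dec` and is left alone, which is the conservative choice for a sketch like this.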

         
  • Philipp Klaus Krause

    • assigned_to: Philipp Klaus Krause
    • Group: -->
     
  • Vladimir Antonenko

    Another interesting case:

    static void memcpy1(unsigned char* dst, unsigned char* src, unsigned char size)
    {
        do { *dst++ = *src++; } while(--size);
    }
    
    static void memcpy2(unsigned char* dst, unsigned char* src, unsigned char size)
    {
        if (!size) return;
        do { *dst++ = *src++; } while(--size);
    }  
    

    Note the difference in the loop code between these two functions.
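Independent of the generated code, the two functions also behave differently at the C level when `size` is 0, which is worth keeping in mind when comparing them. The sketch below repeats both loops with a counter added; the names `copy_do_while` and `copy_guarded` are made up for illustration, and an 8-bit `unsigned char` is assumed.

```c
#include <stddef.h>

/* Same loop as memcpy1, but returning how many bytes were copied. */
static size_t copy_do_while(unsigned char *dst, const unsigned char *src,
                            unsigned char size)
{
    size_t copied = 0;
    do {
        *dst++ = *src++;
        copied++;
    } while (--size); /* size 0 wraps to 255 here, so the loop keeps going */
    return copied;
}

/* Same loop as memcpy2: the early return makes size 0 copy nothing. */
static size_t copy_guarded(unsigned char *dst, const unsigned char *src,
                           unsigned char size)
{
    size_t copied = 0;
    if (!size)
        return copied;
    do {
        *dst++ = *src++;
        copied++;
    } while (--size);
    return copied;
}
```

With 256-byte buffers, calling `copy_do_while(dst, src, 0)` copies all 256 bytes, while `copy_guarded(dst, src, 0)` copies none; for any non-zero `size` the two agree.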

     
    • Philipp Klaus Krause

      When using stronger optimization (I tried with --max-allocs-per-node 100000), I get this using sdcc from trunk:

      _memcpy1:
          push    a
          ldw y, (0x04, sp)
          ld  a, (0x06, sp)
          ld  (0x01, sp), a
      00101$:
          ld  a, (y)
          incw    y
          ld  (x), a
          incw    x
          dec (0x01, sp)
          jrne    00101$
          ldw x, (2, sp)
          addw    sp, #6
          jp  (x)
      
      _memcpy2:
          push    a
          tnz (0x06, sp)
          jreq    00106$
          ldw y, (0x04, sp)
          ld  a, (0x06, sp)
          ld  (0x01, sp), a
      00103$:
          ld  a, (y)
          incw    y
          ld  (x), a
          incw    x
          dec (0x01, sp)
          jrne    00103$
      00106$:
          ldw x, (2, sp)
          addw    sp, #6
          jp  (x)
      
       
      • Vladimir Antonenko

        OK, thanks.

        ld a, (0x06, sp)
        ld (0x01, sp), a - saving a copy of <size> on the stack looks unnecessary to me.

         
        • Philipp Klaus Krause

          Yes. I just introduced an optimization for this in [r14867]. It is still very basic (it only works for some simple cases and only for stm8). For your first function, I now get:

          _memcpy1:
              ldw y, (0x03, sp)
          00101$:
              ld  a, (y)
              incw    y
              ld  (x), a
              incw    x
              dec (0x05, sp)
              jrne    00101$
              ldw x, (1, sp)
              addw    sp, #5
              jp  (x)
          
           

          Related

          Commit: [r14867]

  • Philipp Klaus Krause

    • status: open --> closed
     
  • Philipp Klaus Krause

    Looks like current trunk can generate good enough code for this, so I'm closing the ticket.

     
