Menu

#2494 [z80] Peephole Rule #25 Incorrect

closed-fixed
Ben Shi
Z80
5
2016-04-21
2016-04-18
alvin
No

z80 peephole rule #25 is incorrect. Currently it looks like this:

replace restart {
    ld  c,l
    ld  b,h
    ld  a,#%1
    ld  (bc),a
} by {
    ld  c,l
    ld  b,h
    ld  (hl),#%1
    ; peephole 25 loaded #%1 into (hl) instead of (bc).
}

The problem is it wipes out the A register when the substitution is made.

It should be qualified by "if notUsed('a')"

In our peephole set I replaced it with three rules:

replace restart {
    ld  c,l
    ld  b,h
    ld  a,#%1
    ld  (bc),a
} by {
    ld  a,#%1
    ld  (hl),a
    ld  c,l
    ld  b,h
    ; peephole 25a loaded a into (hl) instead of (bc).
}

replace restart {
    ld  c,l
    ld  b,h
    ld  (bc),a
} by {
    ld  (hl),a
    ld  c,l
    ld  b,h
    ; peephole 25b loaded a into (hl) instead of (bc)
}

replace restart {
    ld  a,#%1
    ld  (hl),a
} by {
    ld  (hl),#%1
    ; peephole 25c loaded #%1 into (hl) directly
} if notUsed('a')

Two of the rules are trying to delay the copy of HL into BC in the hope that the load into BC will be dead. They interact with other custom rules to increase the chance that the copy into BC will be eliminated. Rule 25c is the spirit of the original rule 25.

Related

Wiki: z80

Discussion

  • Ben Shi

    Ben Shi - 2016-04-18

    Do you have a full patch of your own customized rules?

    I would like to see ticks or bytes saved in a regression test.

     
  • alvin

    alvin - 2016-04-19

    We pluck the sdcc binary out and insert into our own toolchain with front end, libraries, assembler and linker so there is no patch for sdcc per se.

    But the rules we currently use are these:
    http://z88dk.cvs.sourceforge.net/viewvc/z88dk/z88dk/libsrc/_DEVELOPMENT/sdcc_peeph.3

    This ruleset began as something to play around with so there are many examples of how not to do things in there. The problem is it grew and now it's not easy to replace with something saner without a lot of work.

    To use that ruleset we use these options on the compile line:
    --no-peep --peep-file c:\z88dk\Lib\Config\....\/libsrc/_DEVELOPMENT/sdcc_peeph.3

    For best results add:
    --reserve-regs-iy --max-allocs-per-node200000

    Forbidding sdcc from using iy creates more opportunities to improve the code and, overall, with few exceptions code quality is better. The max allocs setting is the setting we use when coming up with the rules. A lower setting might generate slightly different code patterns that are not best served by these rules.

    This is still not enough -- sdcc's z80 peepholer is currently bugged such that it can't accurately determine when registers are actually read or not. It can mistakenly look in comments and label names for register names. So eg, something like "ld a,(double)" in the compiler's output may trick the peepholer into thinking registers d,b,l,e are being read. Bugged code is not generated but the peepholer misses opportunities to apply rules. This is very obvious with our ruleset because there are a lot more rules for missed substitutions to occur.

    This was discussed here:
    https://sourceforge.net/p/sdcc/discussion/1864/thread/c5ee20ab/

    The necessary fix is included in our patch:
    http://z88dk.cvs.sourceforge.net/viewvc/z88dk/z88dk/libsrc/_DEVELOPMENT/sdcc_z88dk_patch.zip

    However, I do plan to revisit that to see if it can be done better and perhaps improve a few more things. The patch contains more than just a peepholer fix -- there is a small amount of code to allow sdcc to generate an explicit externs list for symbols declared extern in header files. This makes the output suitable for our linker (and presumably most linkers). At the moment, sdcc is expecting the linker to look up names that are not declared extern by the program in external libraries which is a bit wrong. However this code breaks a few of the other non-z80 targets.

    Anyway, for translated C only (ie not library code) we see 5-10% improvement in code size compared to native sdcc with associated speedup. This is due mainly to the peephole set but a few other things also contribute to this; in particular we change sdcc's calls to its primitives to callee linkage to eliminate trailing pops and we eliminate the INITIALIZER section in favour of either a compressed version or a one-time initialized data section without initializer.

     
  • Ben Shi

    Ben Shi - 2016-04-19

    I will try the 3 rules proposed and see if any improvements.

    I knew that notUsed and notVolatile often return negative false which cause less efficient z80 code generated. And you are appreciated to compose a patch for sdcc to improve that with both new notUsed and other new rules.

    Actually I had made notUsed of stm8 more accurate in revision 9113 and get 4% of total bytes saved, but I am not familiar with z80.

     
  • Ben Shi

    Ben Shi - 2016-04-20

    Unfortunately after applying the proposed 3 rules, both total bytes and ticks increased.

    Before,
    Summary for 'ucz80': 0 failures, 9538 tests, 1780 test cases, 2564245 bytes, 13638665 ticks

    After,
    Summary for 'ucz80': 0 failures, 9538 tests, 1780 test cases, 2565828 bytes, 13642557 ticks

     
  • Ben Shi

    Ben Shi - 2016-04-20
    • status: open --> closed-rejected
    • assigned_to: Ben Shi
     
  • alvin

    alvin - 2016-04-20

    Ben I think you misunderstand -- the rule is incorrect and leads to generation of bugged code. It cannot be left as is.

     

    Last edit: alvin 2016-04-20
  • Maarten Brock

    Maarten Brock - 2016-04-20
    • status: closed-rejected --> open
     
  • Maarten Brock

    Maarten Brock - 2016-04-20

    Ben, you should first fix the bug by adding the "if notUsed('a')", then run regression tests and only after that see if the 3 new rules make an improvement. Chances are this rule is currently applied while A was in use, but it didn't trigger a failing test.

    Btw. Alvin, it would really help if you can also provide a test case that exhibits this bug, so we can add it to the regression tests.

     

    Last edit: Maarten Brock 2016-04-20
    • alvin

      alvin - 2016-04-20

      Btw. Alvin, it would really help if you can also provide a test case that exhibits this bug, so we can add it to the regression tests.

      Unfortunately the specific code fragment I have doesn't reproduce the bug on sdcc alone -- it is arising after some other combination of optimizations is applied that sdcc alone does not have.

      The C code causing the problem looks innocuous:

      uint16_t *M;
      ...
      case 'G':
      ...
              M[A] = 0;  // here is the problem
      ...
              goto sw;
      

      The generated code contains this before peephole 25 is applied:

      ;test.c:135: M[A] = 0;
          ld  hl,(_A)
          add hl,hl
          ld  bc,(_M)
          add hl,bc
          ld  c,l
          ld  b,h
          ld  a,0x00
          ld  (bc),a
          inc bc
          ld  (bc),a
      

      and after peephole 25 is applied it is incorrect:

      ;test.c:135: M[A] = 0;
          ld  hl,(_A)
          add hl,hl
          ld  bc,(_M)
          add hl,bc
          ld  c,l
          ld  b,h
          ld  (hl),0x00
          inc bc
          ld  (bc),a   // a is not zero anymore
      
       

      Last edit: alvin 2016-04-20
  • Ben Shi

    Ben Shi - 2016-04-20

    I have added the notUsed('a') in revision 9573.

    And the above 'before' result was right of 9573.

    And the 'after' result was of 9572 with alvin's 3 rules.

     

    Last edit: Maarten Brock 2016-04-20
  • Maarten Brock

    Maarten Brock - 2016-04-20

    When I run the regression tests from [r9574], that is with the notUsed('a'), I see 2565525 bytes, 13642165 ticks for test-ucz80. Revision 9573 did not yet have this patch, but you may have had it applied locally.

    This means the difference is much smaller, but the 3 rules still perform worse than just 1.

    With the applied fix, this bug should be marked fixed instead of rejected IMO. But I still would very much like to see a regression test for this issue.

     
    • alvin

      alvin - 2016-04-20

      This means the difference is much smaller, but the 3 rules still perform worse than just 1.

      It's not possible for the three rules to perform worse so this is probably indicative of the aforementioned bug in the sdcc-z80 peepholer.

      However it is sufficient to fix the bug by qualifying the rule with notUsed('a') as done already.

       
  • Ben Shi

    Ben Shi - 2016-04-21
    • status: open --> closed-fixed
     
  • Ben Shi

    Ben Shi - 2016-04-21

    There is a feature request 468
    for improving the precision of notUsed.

     

Log in to post a comment.