Menu

#2434 [z80] some peephole rules lack "restart"

closed-rejected
nobody
None
Z80
5
2016-01-19
2015-11-18
alvin
No

Some of the z80 peephole rules are written using "replace {" instead of "replace restart {"

The first example in:
https://sourceforge.net/p/sdcc/code/HEAD/tree/trunk/sdcc/src/z80/peeph-z80.def

is peephole rule #67 on line 980.

The impact is sometimes the peephole step terminates prematurely. I've been wondering occasionally why some optimization was not applied in some of tested code and this was the reason!

(the inability to peephole some loads across 'ex de,hl' as in another bug report is still present)

Related

Wiki: z80

Discussion

  • Ben Shi

    Ben Shi - 2015-11-19

    could you please show a .c source example of less optimization caused by a missing restart? It will work by adding a restart to some particular rules?

     

    Last edit: Ben Shi 2015-11-19
  • Ben Shi

    Ben Shi - 2015-11-22

    I do not think every rule needs a restart.

    all rules are in form of
    replace {
    A part
    } with {
    B part
    }

    For the particular example you mentioned, peephole rule #67 on line 980 of peeph-z80.def,

    replace {
        ld  e,#0x%1
        ld  d,#0x%2
    } by {
        ld  de,#0x%2%1
        ; peephole 67 combined constant loads into register pair.
    }
    

    The ld de,#0x%2%1 has no appearence in other rules' A-part, so a restart is unnecessary here.

    But for the rule in line 2008 of peeph-z80.def

    replace restart {
        add hl,de
        pop de
        jp  (hl)
    %1:
        jp  %5
        jp  %6
        jp  %7
    %2:
    } by {
        ; peephole 150-3 removed addition using short jumps in jump-table.
        pop de
        jp  (hl)
    %1:
        jr  %5
        jr  %6
        jr  %7
    %2:
    } if labelJTInRange
    

    the jp might be optimized by another rule jp -> jr, so a restart might be needed.

     

    Last edit: Maarten Brock 2015-11-22
    • alvin

      alvin - 2015-11-22

      I do not think every rule needs a restart.

      all rules are in form of
      replace {
      A part
      } with {
      B part
      }

      For the particular example you mentioned, peephole rule #67 on line 980 of peeph-z80.def,

      replace {
          ld  e,#0x%1
          ld  d,#0x%2
      } by {
          ld  de,#0x%2%1
          ; peephole 67 combined constant loads into register pair.
      }
      

      There is, for example, peephole 116

      replace restart {
          ld  %1,#%2
          ld  %3,%4
          ld  %1,#%2
      } by {
          ld  %1,#%2
          ld  %3,%4
          ; peephole 116 removed load of #%2 into %1 since it's still there.
      } if notVolatile(%1), operandsNotRelated(%3 %1)
      

      de can match %1 and this rule can be used to eliminate a duplicate load into de.

      In this case peephole 116 appears after 67 so the missing restart does not affect this substitution but in general the effect of a missing restart is that rules appearing prior to that rule may not get a chance to be applied. I failed to write down the specific example I saw, sorry, but I will continue to watch out for it if you need to see another concrete example. I have been looking at thousands of lines of compiled assembly lately to expand the rule set further.

      But I think in the first part of the rules at least (ie before the first barrier) omission of 'restart' was a typo so I thought it wouldn't be controversial to add the restarts where they are missing :)

      In other areas of the peephole rules file delimited by barriers there may be reasons not to restart after a substitution but I haven't spent time thinking about that yet. In my own rule set, I have added restarts to all rules before the first barrier.

       
      • Maarten Brock

        Maarten Brock - 2015-11-22

        It is controversial because testing against the peephole rules is pretty compute intensive. If you have all the time in the world and want the very last byte crunched, then restart on very change. If you don't need the very last byte, but are content with just many bytes saved at moderate cost then don't restart.

         
        • alvin

          alvin - 2015-11-22

          Well only 9 out of 145 rules prior to the first barrier are missing 'restart'. This is why it looked like a typo to me.

          On the z80 compile, at least, the actual compilation step is dominating binary generation by quite a large margin. If you go to "--max-allocs-per-node 200000" for example, I've had compiles take 15-30 mins on a 2.2GHz Windows box. Compared to the default (3000) which tends to take a few mins, including the peephole rules step.

          I've added around 400 rules myself and I haven't really noticed a difference in compile time but I'm also using max-allocs at 200000 when coming up with rules and not the default 3000.

           
          • Ben Shi

            Ben Shi - 2015-11-23

            The conculsion are,
            1. Add a restart to some rules can make further optimization
            2. The new restarts do not make sdcc become slower toomuch
            Agree?

            Which rules need a restart in your opinion? Could you please give us a patch?

             
            • Maarten Brock

              Maarten Brock - 2015-11-23

              Hmm, that was not my conclusion.

              1. A restart to every rule has the potential to give more optimizations.
              2. When you give register allocation an enormous amount of time you can make peephole optimization relatively small.

              With the patch I would like to see code that benefits from it.

               
              • alvin

                alvin - 2015-11-24

                When you give register allocation an enormous amount of time you can make peephole optimization relatively small

                sdcc is doing well against other z80 compilers but it's still missing a few tricks. It tends to treat the z80 as mainly 8-bit rather than taking advantage of some of its 16-bit operations. One area where this is really obvious is in loading and storing of 16-bit / 32-bit statics, which is also an important case as using statics in place of local variables is a very common optimization in z80 code.

                I can give you one example. This is code generated with max-allocs at 200000:

                call     _getInput
                ld     d,l
                ld     e,h
                ld     hl,_main_input_1_400
                ld     (hl),d
                inc     hl
                ld     (hl),e
                dec     hl
                ld     a,(hl)
                inc      hl
                ld      h,(hl)
                ld     l,a
                call     _strlwr_fastcall
                

                A few additional peephole rules will change this to:

                call     _getInput
                ld     (_main_input_1_400),hl
                call     _strlwr_fastcall
                

                I don't think the compiler is able to figure that out on its own, at least given a not-excessive amount of allocation time. The same sort of improvements can be found in 16-/32-bit math. The peephole step is important for z80 code generation currently.

                A restart to every rule has the potential to give more optimizations.

                I think the missing restarts would more affect the output of user-peephole rules. The built-in sdcc rules are correcting simpler but fundamental one-off cases but, eg, the set of rules that accomplish the above result happen from cascades of 3 or 4 user peephole rules. The individual rules that make up that cascade insert intermediate instructions with often dead loads that are eliminated by the sdcc rules. With the dead loads out of the way, other rules can be applied. So if a restart is missing, it may be that a cascade of rules are not applied. OTH, with sdcc's rules on their own the worst result may be some dead loads hanging around.

                I'll propose a patch and look for some examples. But honestly, with only 6 rules out of 145 missing the restart I think they are missing more by accident than deliberately.

                 
                • Ben Shi

                  Ben Shi - 2015-12-08

                  When can your patch be ready? I would like to see some bytes or ticks saved by it in the full regression test for z80.

                   
  • Maarten Brock

    Maarten Brock - 2015-11-22

    Which jp do you think might be optimized by another rule here? AFAIK there is no jr (hl) instruction.

     
    • Philipp Klaus Krause

      What might happen though is that there is another jp around the optimized block, that after the optimization has its target in range.

      Philipp

       
  • Ben Shi

    Ben Shi - 2015-11-22

    Sorry, my fault, I am not familiar with z80. But I thought

    line 1939 of peeph-z80.def does needs a restart, since the placed jp %5 might be further simplified to jr %5.

     
    • Maarten Brock

      Maarten Brock - 2015-11-22

      Again not necessary since rule 153 is located after this rule 143. AFAIK restarting is only necessary when you need to back up in the rules or in the assembly.

       
  • Ben Shi

    Ben Shi - 2015-11-23

    alvin,

    We need you give some exact things like,
    1. your patch
    2. the c program gets benefits from your patch
    3. compile time contrast before / after applying your patch

     
  • Ben Shi

    Ben Shi - 2016-01-15

    Hi, alvin,

    Any update of your patch?

     
    • alvin

      alvin - 2016-01-18

      sorry ben, i got a really bad flu in december and then forgot about it :/

      The issues I was seeing may not be due entirely to these restarts. Have a look at this thread:
      https://sourceforge.net/p/sdcc/discussion/1864/thread/c5ee20ab/#059e

      In my own rule set I have the restarts in place but I am still seeing the dead loads hanging around due to issues mentioned in the thread above. The one time I did spot the restart affecting the outcome was also with my own rule set -- the dead load that wasn't removed prevented another code optimization in that rule set so it was obvious.

      I will try to submit a patch when I get home tonight so you can see if it makes a difference in regression tests.

       
  • alvin

    alvin - 2016-01-19

    Ben, here is a patch for sdcc/src/z80/peeph-z80.def

    I did not touch rules after the first barrier where branches are dealt with. It may be restarts there cause problems.

     

    Last edit: alvin 2016-01-19
  • Ben Shi

    Ben Shi - 2016-01-19
    • status: open --> closed-rejected
     
  • Ben Shi

    Ben Shi - 2016-01-19

    I am afraid I found neither bytes nor ticks saved after applying the patch in reg test.

    If you have any more ideas on improving notUsed() of z80, please tell us.

     

Log in to post a comment.