Menu

#447 pdk peephole rules

None
closed
5
2024-09-10
2022-11-30
No

I have few peephole rules.
Of course they are full of bugs and should be well reviewed :)
At least they are checked in my code.
Please feel free to give feedback.

1 Attachments

Discussion

1 2 3 > >> (Page 1 of 3)
  • Konstantin Kim

    Konstantin Kim - 2022-12-12

    maybe it's better to split the patch into smaller parts?

     
    • Philipp Klaus Krause

      Yes. But before [bugs:#3525] is fixed for pdk, I'll be reluctant to add new peephole rules for pdk.

       
      👍
      1

      Related

      Bugs: #3525


      Last edit: Maarten Brock 2023-12-29
  • Konstantin Kim

    Konstantin Kim - 2023-12-28

    Lets go one by one.
    This one is pyexpander script generated for and/or to setX.io reduction

    $for(i in range(8)) $for(inst in ['t0sn','t1sn'] )
    replace restart {
        $(inst).io  f, %4
        goto    %2
        mov.io  %1, %5
        or      %1, $msk(i)
        goto    %3
    %2:
        mov.io  %1, %5
        and     %1, $imsk(i)
    %3:
        mov.io  %5, %1
    } by {  ; $peep_id() ( $(inst) bit set/clear )
        $(inst).io  f, %4
        goto    %2
        set1.io %5, #$(i)
        goto    %3
    %2:
        set0.io %5, #$(i)
    %3:
    } if notUsed(%1)
    $endfor $endfor
    
     

    Last edit: Konstantin Kim 2023-12-29
    • Philipp Klaus Krause

      If you want, that $for(inst in ['t0sn','t1sn'] ) can be emulated directly in peephole syntax by making the instruction a variable, e.g. %6, and adding a condition like if same(%6 't0sn' 't1sn').

      P.S.: Using the singleSetBit operator for in the immdInRange condition, even the $for(i in range(8)) part could be emulated in peephole syntax (see e.g. src/stm8/peeph.def for similar single-bit.mask peephole rules). So I guess this can all be just one or two peephole optimizer rules?

       
      👍
      1

      Last edit: Philipp Klaus Krause 2023-12-28
      • Konstantin Kim

        Konstantin Kim - 2023-12-29

        Correct. I made a prototype that works and doesn't require recompilation.
        It can also fit nicely into effectively hard-coded C code.

        I was a little confused by the fact that Same() accepts more than one pattern. Perhaps it would be more appropriate to call it Any() or Subset() for the "one or more" case?

        PS: Sorry, I personally find it very unpleasant to use immdInRange side effects. I'm not ready to pay logic and simplicity for compactness.

         

        Last edit: Konstantin Kim 2023-12-29
        • Philipp Klaus Krause

          Those functions got added one at a time, and while I think a few rare renames to make them more consistent happened the names are still not perfect. AFAIR, notSame was there first, then same got added. But any or subset don't looks intuitive to me either.
          Maybe anySame (for same) and notAnySame (for notSame)?

           
          • Benedikt Freisen

            Maybe equalsAny and equalsNone?
            Or isOneOf and isNoneOf?

            In any case, we should keep in mind that users might have their own auxiliary rule files and therefore still accept the old spelling.

             
            👍
            1
            • Konstantin Kim

              Konstantin Kim - 2023-12-29

              This way we can introduce new recommended aliases and trigger a warning about the deprecation of old spelling.
              This is a purely technical problem of how to implement this. It’s not yet clear yet what and why ;)

               

              Last edit: Konstantin Kim 2023-12-29
              • Philipp Klaus Krause

                Don't know if its worth having a deprecation warning: the .def files in SDCC would be updated, and few users have their own .def files, so the error might be good enough.

                 
            • Philipp Klaus Krause

              IMO, isNoneOf doesn't sufficiently convey that notSame tests pairwise.

               
          • Konstantin Kim

            Konstantin Kim - 2023-12-29

            My point to not reinvent the wheels ;)
            I like the classic (for some languages) all()/any() for list of booleans.

             

            Last edit: Konstantin Kim 2023-12-29
        • Philipp Klaus Krause

          Thanks. IMO we still should make two small changes before this goes in:

          1) The comment stating that the rule got applied should go somewhere near the changed lines, not just at the top, and should fit into the current numbering, so e.g. instead of

          replace restart {
              %5      f, %4
              goto    %2
              mov.io  %1, %5
              or      %1, #0x01
              goto    %3
          %2:
              mov.io  %1, %5
              and     %1, #0xfe
          %3:
              mov.io  %5, %1
          } by {  ; peephole #1 ( %6 bit set/clear )
              %5      f, %4
              goto    %2
              set1.io %5, #0
              goto    %3
          %2:
              set0.io %5, #0
          %3:
          } if notUsed(%1) same(%6 't0sn.io' 't1sn.io')
          

          We'd have

          replace restart {
              %5      f, %4
              goto    %2
              mov.io  %1, %5
              or      %1, #0x01
              goto    %3
          %2:
              mov.io  %1, %5
              and     %1, #0xfe
          %3:
              mov.io  %5, %1
          } by {
              %5      f, %4
              goto    %2
              ; peephole 7 used set0/1.io instead of or/and with mov.io
              set1.io %5, #0
              goto    %3
          %2:
              set0.io %5, #0
          %3:
          } if notUsed(%1) same(%6 't0sn.io' 't1sn.io')
          

          2) It looks like the none of the these rules is currently triggered in the regression tests. Can you provide a small C code sample function, where the rules apply?

           
          • Konstantin Kim

            Konstantin Kim - 2023-12-29

            Fine. I'm leaning towards the option of having a comment at the beginning of the matched block for debugging reasons. But if you prefer, it will be adjusted to your proposal.
            All the rules were used and triggered in mine and in many random codes on the net.
            You are right. Rules should be tested and evaluated.

            Files in the first message was updated.

             

            Last edit: Konstantin Kim 2023-12-29
            • Philipp Klaus Krause

              The regression tests have quite good coverage of standard C code, but extensions such as I/O are not covered that well. So it is not surprising that these rules for mov.io are not currently triggered, and we need to add a test.

               
              • Konstantin Kim

                Konstantin Kim - 2023-12-29

                sample code:

                #include <stdint.h>
                __sfr __at(0x10) PORT;
                #define PIN     5
                void foo(void) {
                    static char cnt;
                    PORT = (cnt & (1<<5)) ? (PORT | (1<<PIN)) : (PORT & ~(1<<PIN));
                    cnt++;
                }
                

                it requires another rule for "c(n)eqsn/and -> tXsn" to trigger
                (snippet was edited)

                 

                Last edit: Konstantin Kim 2023-12-29
                • Philipp Klaus Krause

                  Hmm. I refined that test a bit (see attached source)¹, but it still doesn't trigger your rules: I don't see SDCC generate the t0sn/t1sn your rules need.

                  ¹ I was lazy here, when parametrizing the test - I guess I could have used a proper loop instead using m4 instead - we don't have a preprocessor for peephole optimizer rules, but we do have for tests.

                   
                  • Konstantin Kim

                    Konstantin Kim - 2023-12-29

                    you have to check next bunch of rules for "c(n)eqsn/and -> tXsn"

                     
                  • Konstantin Kim

                    Konstantin Kim - 2023-12-29

                    sorry, my bad
                    have to be:

                        PORT = (cnt & (1<<5)) ? (PORT | (1<<PIN)) : (PORT & ~(1<<PIN));
                        cnt++;
                    

                    At same time I noted that 4.4.0 generates a bit different code patterns than before ;(

                     
  • Konstantin Kim

    Konstantin Kim - 2023-12-29

    Another rules for t0sn/t1sn flow untwisting
    Bassicaly these are 4 rules that have been processed into 40.

     

    Last edit: Konstantin Kim 2023-12-29
    • Philipp Klaus Krause

      I don't like rules #37 to #40. I think every peephole rule should either decrease code size, or keep code size and decrease cycle count.
      We don't want to run into a cycle of peephole rules applications that just keep reordering code without converging.

       
      • Konstantin Kim

        Konstantin Kim - 2023-12-29

        You don't like them yet ;)
        Let's put them aside for now. They are helpers for others, and cause a very good avalanche effect.

         
        • Philipp Klaus Krause

          While I get the purpose of bringing asm code into a canonical form to open it up to optimization by further peephole rule applications, I still think that having rules that decrease neither code size nor cycle count will be causing more trouble in the long-term, so I'd prefer to do without them, even if it means requiring more rules in total.
          Many years ago, there were such rules for z80. z80 has a large number of rules due to many developers having improved it over a long time (the z80 port is much older than the pdk ports). AFAIR, z80 peephole rules not triggered in the regression tests have been removed from SDCC. Still there is a lot of rules left, and it is hard to grasp their potential interactions. For z80, it happened that at some time it as possible to trigger an endless cycle of rule applications; it didn't happen in the regression test, but users reported bugs about SDCC hanging.
          The monotonicity of every rule decreasing code size or keeping code size and decreasing cycle count is a relatively elegant way to avoid that problem.

           
          • Konstantin Kim

            Konstantin Kim - 2023-12-29

            Agree. Nobody likes useless cold code that never runs.
            This is not our case.

            I have formed a good set of rules that were obtained iteratively by shooting non-optimal sequences of the real output of the compiler.
            The overlap of logic is quite expected, but not cold rules.
            Still, thanks for the critical point of view.

             
      • Konstantin Kim

        Konstantin Kim - 2023-12-29

        Rules 37-40 was triggers well in 4.2.9
        At the moment I can't reproduce the trigger case in 4.4.0 #14549 for them.
        Let me hunt them in the meantime

         
      • Konstantin Kim

        Konstantin Kim - 2023-12-29

        rules #37 to #40 equivalent seems is already in place!!
        4.4.0 #14549 shows "; peephole j3 removed goto by inverting test condition."

         
1 2 3 > >> (Page 1 of 3)

Log in to post a comment.

MongoDB Logo MongoDB