Menu

#902 optimized rotation

open
nobody
8051 (8)
5
2024-03-14
2024-03-13
No

An interesting approach was found here: https://lab.whitequark.org/notes/2020-04-06/synthesizing-optimal-8051-code/

An SMT solver (Racket + Rosette) was used to find the optimal instructions sequence.
I believe this approach could potentially be used to generate new peephole rules and verify existing ones.

Discussion

  • Philipp Klaus Krause

    I guess so. Another approach with similar benefits and drawbacks is the generation of peephole optimizer rules via a superoptimizer.

    Personally, I still intend to look into major code generation and register allocation improvements for mcs51 later this year, so I'm not too motivated to look into the peephole optimizer rules (changes in register allocation and code generation have a tendency to make existing peephole optimizer rules obsolete).

    P.S.: Looking at the examples (8- and 16-bit rotates) that the approach was tried on: No surprise for 8-bit, the solver found the "obvious" approaches anyone would have used (and proved them optimal). But some of the 16-bit cases are interesting can be good inspiration for rotation codegen improvements.

     

    Last edit: Philipp Klaus Krause 2024-03-13
    • Konstantin Kim

      Konstantin Kim - 2024-03-14

      I would like to draw attention to a homogeneous approach that utilizes ready-to-use solid tools, potentially allowing decentralization of the work.

       
  • Maarten Brock

    Maarten Brock - 2024-03-14

    Is this just a tip about an approach you found somewhere or do you actually see generated code that can be improved? If the latter, can you give an example?

     
    • Konstantin Kim

      Konstantin Kim - 2024-03-14

      It appears to be both.

      The article result shows better code for a 16-bit circular constant shift and offers a straightforward idea for improving the peephole rules.

      The 8-bit rotations seem to match the actual SDCC outputs. However, the 16-bit ones are just brilliant!

      My favorite is 3 bit shift

      unsigned short RR1(unsigned short v) {
          return ((v) >> 3) | (v << (16-3));
      }
      

      sdcc gives (-mmcs51 --opt-code-size )

          mov r6, dpl
          mov r7, dph
          mov ar4,r6
          mov a,r7
          swap    a
          rl  a
          xch a,r4
          swap    a
          rl  a
          anl a,#0x1f
          xrl a,r4
          xch a,r4
          anl a,#0x1f
          xch a,r4
          xrl a,r4
          xch a,r4
          mov r5,a
          mov a,r6
          swap    a
          rl  a
          anl a,#0xe0
          mov r7,a
          clr a
          orl a,r4
          mov dpl,a
          mov a,r7
          orl a,r5
          mov dph,a
      

      For R1:R0 pair, the article suggests the following code:

      mov a, r0
      xrl a, r1
      mov r1, a
      anl a, #0xf8
      xrl a, r0
      swap a
      rl a
      xch a, r1
      swap a
      rl a
      xrl a, r1
      mov r0, a
      
       

Log in to post a comment.