Small Device C Compiler (SDCC) / Patches / #362 Peephole rules pack

Daniel Drotos - 2021-06-10

On Thu, 10 Jun 2021, Philipp Klaus Krause wrote:

However, I have to admit that I don't really know what the number
called "cycles" in section 7 of the STM8 manual means.

Dear Philipp and Visenri,

uCsim is an ISS (Instruction Set Simulator) where cycle accuracy is
not a key point. As you noted, it is hard to correctly implement it
for pipeline CPUs. Cycle count in uCsim is just an aproximation. I'm
afraid it will never match to real hw exactly.

Some time ago I introduced the "so called" virtual ticks. Result of
them is printed on output of "conf" command:

Simulation started, PC=0x008000 Loading from gen/stm8/abs/abs.ihx 1169 words read from gen/stm8/abs/abs.ihx --- Running: gen/stm8/abs/abs Running testAbs --- Summary: 0/6/1: 0 failed of 6 tests in 1 cases. Stop at 0x00831e: (110) Program stopped itself F 0x00831e Simulated 3891 ticks (4.864e-04 sec) Host usage: 0.002176 sec, rate=0.223513 CPU state= OK PC= 0x00831e XTAL= 8e+06 Operation since last reset= (6475 vclks) Inst= 2309 Fetch= 4019 Read= 1436 Write= 1020

"Operation since last reset" here is sum of fetches, mem reads and
writes. In my optinion it measures sw efficiency somehow. And makes
different CPUs comparable.

Daniel
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Visenri - 2021-06-10
  
  I've opened a ticket: [#3250]
  Please, follow any further discussion there.
  I have summarized your comments there.
  
  Related
  
  Bugs: ~~#3250~~
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Philipp Klaus Krause - 2021-06-03

Pack 3 is added in [r12418], with some whitespace fixes.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Visenri - 2021-06-03

Nice to see most of the rules added.

Regarding rules 600a & 600b:

I checked them and I think it was just a typo and those rules were meant to check 'c' not 'n':

// Unneeded sub-cp replace restart { ld a, %1 %2 a, #%3 } by { ld a, %1 ; peephole 600a removed unneeded %2 v1 } if same(%2 'cp' 'sub'), operandsLiteral(%3), immdInRange(0 0 '+' 0 %3 %4), notUsed('c') replace restart { ld %1, a %2 a, #%3 } by { ld %1, a ; peephole 600b removed unneeded %2 v2 } if same(%2 'cp' 'sub'), operandsLiteral(%3), immdInRange(0 0 '+' 0 %3 %4), notUsed('c')

If 0 is subtracted from any number the result is the same number, so flags 'n' and 'z' will be the same that after the 'ld'.
Only 'c' may be different after removing the 'cp' / 'sub', because it's updated with the carry/borrow of the operation.

Anyway, I'm just shocked that you get some regression failures and I didn't get any. Could you tell me which test(s) failed?
I will run now the tests with the last commits (12419). Later I will add these rules and I'll repeat the tests.

Last edit: Visenri 2021-06-03
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Visenri - 2021-06-04

With this changes it also fails.

constantRange.asm fails:

This:

ld a, (0x01, sp) cp a, #0x00 ; peephole j5 changed absolute to relative unconditional jump. jrsle 00112$

Translates into:

; peephole 0a removed dead load into a from 0x01. ; peephole 600a removed unneeded cp v1 ; peephole j5 changed absolute to relative unconditional jump. jrsle 00112$

When it should not.

Maybe I'm not seeing something, but to me, Rule '0a' is being applied when it should not ('jrsle' uses 'n' & 'z').
notUsed('v') may be needed and this is not implemented!
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Philipp Klaus Krause - 2021-06-04
  
  By changing the macro D_ in peep.c, one can enable debug output. That could tell where a check for n being read goes wrong.
  I guess we'll also want to add support for notUsed('v') sometime, but that's probably beyond the scope of this patch.
  
  P.S.: Could you contact me via email? My address is in sdcc/Changelog.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Visenri - 2021-06-04
    
    I am well aware of the D_ macro and other debugging techniques.
    
    I needed to reduce the test to the minimum code triggering the problem and later start a debug session, yesterday it was too late for this even for me (I'm a night owl).
    
    I just wanted you or anyone else to know about the wrong behavior, Just in case anyone wanted to have a look.
    
    About the email. Of course, I'll send you an email right now.
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
    - Visenri - 2021-06-04
      
      After testing it with a minimal example, I think I found the problem.
      
      There is a typo in stm8/peep.c in function "stm8MightReadFlag":
      
      "jrsgte"
      
      should be
      
      "jrsgt"
      
      The other similar instruction is "jrsge" and is checked correctly.
      
      I'll pass all the test before celebrating it too much :D.
      
      If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
      - Visenri - 2021-06-05
        
        Success!
        All tests pass, and I checked the modified rules are being applied 37 times.
        
        Here you have the patches.
        
        peep.c
        
        peep.c.diff
        
        peeph.def
        
        peeph.def.diff
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
        
        Philipp Klaus Krause - 2021-06-06
        
        Looks good to me. I suggest you go ahead and apply it.
        You might also want to still look into rule pack 2, and if it should replace some of the current rules.
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
        
        Visenri - 2021-06-06
        
        Applied in [r12425]
        I took my time to avoid screwing up things in my first commit :D.
        
        I'll check rule pack 2 next week.
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
        
        Visenri - 2021-06-12
        
        I have applied rule pack 2 and other improvements in [r12449].
        
        Modified code generation (genPointerSet) to allow optimizations without * reordering rules for ld-and (120) from pack 1 (#362).
        
        Implemented some improvements in peep.c (argCont) to avoid false 'x' positives, a tiny step towards #290.
        
        Fixed double spaces in rules 10f & 10h (that were messing notUsed logic).
        
        Replaced rules 18, 19, 20 & 21 by 202x, 203x, 204x 205x, 210x &220x making use of new "operandsLitOrSym".
        
        Expanded rule 131 scope.
        
        Modified rules (140, 141, 150, 160, 161) to work without needing rule 121 from pack 1 (#362).
        
        Added new rules 515 & 516, similar to 512 & 513 when comparison is done with 'dec' instead of 'cp'.
        
        With the changes I've done rules like 120 & 121 are no longer necessary, so now the state of this patch is:
        
        Done, pure reordering rules not applied and no longer needed (120 & 121):
        
        Done.
        
        Done
        
        Not applied because it is pure reordering.
        
        Done
        
        Done
        
        Last edit: Visenri 2021-06-12
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
        
        Visenri - 2021-06-12
        
        Pay attention to the list of packs done, I changed it, because pack numbers were wrong.
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Visenri - 2021-06-11

assigned_to: Visenri

Group: -->
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Regargind rules not applied from pack 4.
There is still room for improvement by applying rules from that pack, you can see it in examples found in several tests.

Here an example from bitfields-bits1_preBits_0_pattern_0_varType_3.c:

;   gen/stm8/bitfields-bits1/bitfields-bits1_preBits_0_pattern_0_varType_3.c: 313: volatileBits.bitToTest ^= 1;
; skipping iCode since result will be rematerialized
; skipping iCode since result will be rematerialized
; skipping iCode since result will be rematerialized
; genAssign
; genPointerGet
    ld  a, (1, sp)
; peephole 10g moved increment to sp into storage instruction.
    srl a
    srl a
    srl a
    and a, #0x01
; genCast
; genAssign
; peephole 3 removed dead clrw of x.
; genXor
    xor a, #0x01
; genCast
; genAssign
; genPointerSet
    ldw x, sp
    incw    x
    sll a
    sll a
    sll a
    and a, #0x08
    push    a
    ld  a, (x)
    and a, #0xf7
    or  a, (1, sp)
    ld  (x), a
    pop a

This "monstruosity" is simplified gracefully with rules from pack 4, here you can see an example from previous tests I did:

;   gen/stm8/bitfields-bits1/bitfields-bits1_preBits_0_pattern_0_varType_3.c: 297: volatileBits.bitToTest ^= 1;
; skipping iCode since result will be rematerialized
; skipping iCode since result will be rematerialized
; skipping iCode since result will be rematerialized
; genAssign
; genPointerGet
; peephole 10g moved increment to sp into storage instruction.
; genCast
; genAssign
; peephole 3 removed dead clrw of x.
; genXor
; peephole 110 replaced 'xor-and' by 'cpl-and'.
; peephole 121 exchanged 'and-xor' order.
; genCast
; genAssign
; genPointerSet
; peephole 402 changed order of 'srl-ldw' to 'ldw-srl'.
; peephole 402 changed order of 'srl-ldw' to 'ldw-srl'.
; peephole 402 changed order of 'srl-ldw' to 'ldw-srl'.
; peephole 402 changed order of 'cpl-ldw' to 'ldw-cpl'.
; peephole 401 changed order of 'and-ldw' to 'ldw-and'.
; peephole 407 changed order of 'srl-incw' to 'incw-srl'.
; peephole 407 changed order of 'srl-incw' to 'incw-srl'.
; peephole 407 changed order of 'srl-incw' to 'incw-srl'.
; peephole 407 changed order of 'cpl-incw' to 'incw-cpl'.
; peephole 406 changed order of 'and-incw' to 'incw-and'.
; peephole 161 replaced 'srl-cpl-and-sll' (0x01) by 'xor-and' (2, 3).
; peephole 160 replaced 'srl-xor-and-sll' (2, 3) by 'xor-and' (4, 7).
; peephole 160 replaced 'srl-xor-and-sll' (4, 7) by 'xor-and' (8, 15).
; peephole 110 replaced 'xor-and' by 'cpl-and'.
; peephole 130 combined 'and-and' (15, 0x08) into 'and' (8).
; peephole 120 exchanged src operands in 'ld-and'.
; peephole 0w removed dead load into x from sp.
; peephole 642 removed dead incw x.
    ld a, (1, sp)
    xor a, #8
    ld  (1, sp), a
; peephole 300c replaced 'cpl-and-and-or' by 'xor' v3, with incw.
;   gen/stm8/bitfields-bits1/bitfields-bits1_preBits_0_pattern_0_varType_3.c: 298: dummy_f();

4 different rules are applied to finally being able to apply other rules.
The main problem are these two instructions:

    ldw x, sp
    incw    x

That get in the way of applying rules 160, 161 and later all the others.

The job of rules 4xx, as clearly illustrated by this example, was to allow ldw and inw present many times in similar situations to go up until other rules can be applied to 'a' instructions.

Getting the same kind of optimization with specific rules for every case would require an insane number of extra rules.

Last edit: Visenri 2021-06-13

Visenri - 2021-06-13

Fixed comment of rule 406, was showing 'addw' because of a copy-paste error.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Philipp Klaus Krause - 2021-11-04

What is the current situation? Has everything that makes sense been applied (so we could close the ticket)? Are there parts left, that we might to apply, for which I should do some testing?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Visenri - 2021-11-05

As I said some months ago, after [r12449]:
...the state of this patch is:

Done, pure reordering rules not applied and no longer needed (120 & 121):

Done.

Done

Not applied because it is pure reordering.

Done

Done

As I said, there is margin for improvement, but I think all that could be applied safely has been applied.

Even thought there are potential problems (endless loops)) with with pure reordering rules (pack 4). I've never seen them in all my testing, because no other rule seems to undo the changes done by rules of pack 4.

If we put pack 4 aside, this ticket should be closed.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Philipp Klaus Krause - 2021-11-07
  
  Then I suggest to close this ticket. Since you're the owner, that would be done by you. If we want to brink back an equivalent of pack4, we could do so in a new ticket. Maybe we can even achieve the effect without having rules that do not reduce code size.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Visenri - 2021-11-07
    
    Ok, I'm closing it.
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Visenri - 2021-11-07

status: open --> closed-accepted
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Peephole rules pack - savings for bitfields operations

The Small Device C Compiler (SDCC), targeting 8-bit architectures

Group

Searches

Help

#362 Peephole rules pack - savings for bitfields operations

Related

Discussion

Related