Small Device C Compiler (SDCC) / Patches / #466 mcs51: improve peephole optimizations

Oleg Endo - 2023-09-03

related:
https://sourceforge.net/p/sdcc/feature-requests/884/
https://sourceforge.net/p/sdcc/feature-requests/883/

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Oleg Endo - 2023-09-11

updated patch 0001 -- fixed non-working case 'ab'

0001-fix-and-improve-operandsNotRelated-for-mcs51.patch

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Oleg Endo - 2023-10-18

Any feedback on this would be highly appreciated.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Philipp Klaus Krause - 2023-10-19
  
  Hello, Oleg.
  I really want to look at these patches. Maybe I'll find time this or next week, maybe not. I'm currently a bit busy both with non-SDCC aspects of my life, and for SDCC, there are a few quite serious bugs (basically the bottom 7 open ones in the list at https://sourceforge.net/p/sdcc/wiki/SDCC%204.4.0%20Release/), that IMO really need to be fixed before the next release, so I'd put some time into some of them first.
  But if I don't find time this or next week: I've scheduled a few days to work on SDCC end of October / beginning of November, so I'll definitely have time to look into these patches then.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Oleg Endo - 2023-10-19
    
    Thanks Philipp. I think the things I'm touching there could be of interest for you, since you wanted to port the mcs51 backend to your new register allocator. Since my original submission of the patches, I've accumulated a couple of follow up patches that build on top. Mainly it's adding more peephole patterns and adjusting some of the existing ones. But I don't want to make the initial change set even bigger than it already is.
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Philipp Klaus Krause - 2023-10-28

1: While I agree with the goal of patch 1, I didn't like the macro use, so I rewrote it.
2: Applied as-is.
3: Applied as-is.

These 3 are now in [r14393], resulting in a tiny (about 0.003%) code size reduction in the regression tests for mcs51.

Related

Commit: [r14393]

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Philipp Klaus Krause - 2023-10-28
  
  Ideally, the repository should always be in a good state, passing regression tests. Running those tests can take some time, so making this a single commit did speed things up. I hope it didn't cause too much inconvenience to you.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Philipp Klaus Krause - 2023-10-28

4: I disagree with this patch. IMO it is a rather crude approach that in many cases will work worse than the current one: the patch basically just checks if there is a CALL, PCALL or RET around, and if yes assigns all lines with the first CALL/PCALL/RET encountered (if no it falls back to the current approach).

IMO, the current approach further down is more precise. To make it handle calls better, one could try to handle them similarly to labels in "Strategy #3" (which currently tries to match labels only, but I think that could be changed to match calls, too).

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Philipp Klaus Krause - 2023-10-28

5: This patch makes a lot of effort to handle the read of acc by jz/jnz. Why not just check for jz/jnz in asmLineNodeFromLineNode and then set the A_IDX bit? The mcs51opcodedata mcs51opcodeDataTable table could stay as-is (though I agree that making it const is a good idea anyway).

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Oleg Endo - 2023-10-28

1: While I agree with the goal of patch 1, I didn't like the macro use, so I rewrote it.

Thanks for looking at it!

Since you've squashed the patches into a single commit, it's now difficult for me to see which parts you have modified. It'd be easier if you applied the patches as individual SVN commits. Then I can also update/rewrite the patches in my own local tree with your version.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Oleg Endo - 2023-10-28

4: I disagree with this patch. IMO it is a rather crude approach that in many cases will work worse than the current one: the patch basically just checks if there is a CALL, PCALL or RET around, and if yes assigns all lines with the first CALL/PCALL/RET encountered (if no it falls back to the current approach).

I just clobbered it together somehow to avoid generating wrong-code. Not sure what you mean by "work worse" ... since the current approach doesn't work at all for call insns.

If you have an alternative for this patch, I can try it out in my setup and compare the resulting generated code.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Philipp Klaus Krause - 2023-10-29
  
  Consider these examples:
  
  replace { x y z call c, f } by { x y' z call c, f }
  
  Where x, y and z are part of a comparison, and call c, f is a conditional call depending on the carry flag. The current approach should work fine for this, but yours will assign the ic of the call to all the comparison instructions.
  
  Or this (assume that each line is from a different ic):
  
  replace { call f1 inst call f2 } by { call f1 inst' call f2 }
  
  The current approach would incorrectly assign the ic for inst' yours would assign an incorrect ic for both inst' and call f2.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Philipp Klaus Krause - 2023-10-29
  
  In general, while it is good to have good correlation of asm lines and ic, IMO this information should not be relied on for the correctness of the generated code. The peephole optimizer likely will never be able to preserve this information perfectly. We should still make a reasonable effort to get it correct, as it is useful (for comments in the .asm file, and for debug information).
  IMO, if you want to know something about the called function at a call instruction, the way to go is to look it up in the symbol by the name of the called function. See e.g. the handling of "call" at the end of stm8MightRead in src/stm8/peep.c.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Oleg Endo - 2023-10-30
    
    And what about function calls where the label might not not be in the symbol table?
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
    - Philipp Klaus Krause - 2023-10-30
      
      Simple solution: err on the safe side: assume that the call instruction might read any register, and do not assume that it writes any.
      That will impact optimizations, but since the symbol not being in the table should be rare (apart of calls via function pointers), it shouldn't be a big problem.
      
      Alternative solution: have a fallback approach with a somewhat more exact estimate, like the stm8 port does:
      
      if (ISINST (pl->line, "call") || ISINST (pl->line, "callr") || ISINST (pl->line, "callf")) { const symbol *f = findSym (SymbolTab, 0, pl->line + (ISINST (pl->line, "call") ? 6 : 7)); if (f) return stm8IsParmInCall(f->type, what); else // Fallback needed for calls through function pointers and for calls to literal addresses. return (stm8MightBeParmInCallFromCurrentFunction(what) || stm8MightBeParmInPCallFromCurrentFunction(what)); }
      
      If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
      - Oleg Endo - 2023-10-30
        
        Honestly, I don't understand what's wrong with having a bitvector attached to the call insn that records the registers it uses and sets. Why not record this exact information when the call insn is emitted (e.g. via aopArg & et al or whatever) and avoid having to wade through it over and over again and resort to some "approximation"? What's the benefit?
        
        amendment:
        Sorry I got this mixed up with patch #11, but it's sort of related. Perhaps we should throw patch #4 and patch#11 into the same discussion pot.
        
        For the peephole reassociation issue, perhaps we could scan the original lines and the new replacement lines for call/jump insns and preserve the 'regsRead' (and 'regsWritten'). Those bitvectors can be determined in whatever way, e.g. via aopArg and then just passed on from one peephole transoformation to another. Then we don't need to rely on iCode <-> asm line correctness. However, when peepholes want to convert 'call' into 'jmp' e.g. to form tailcalls the 'jmp' insn must contain the additional information that it's actually used as a call insn.
        
        Perhaps this should wait until you've looked at the mcs51 aopArg migration.
        
        Last edit: Oleg Endo 2023-10-30
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
        
        Philipp Klaus Krause - 2023-10-30
        
        Conceptually, the registers used in the function, used for passing arguments or return values are attributes of the function or part of the function type. So we need that information in that place anyway (in particular for earlier compiler stages).
        Having it also at the instruction is duplication of information, which might make sense for performance, if this part of SDCC is performance-critical (see below).
        But the main point is that, IMO we cannot accurately trace the ic in the peephole optimizer, we can only try to approximate there. And if that approximation ends up assigning the wrong ic to some line, this should not affect the correctness of the generated code.
        
        On the other hand, the approach used in stm8 is an "approximation", but a very good one. It is exact for the by far most common use case (calling a function by its name). The function will be in the symbol table, otherwise we couldn't have called it by name in the first place. It might get less accurate if multiple functions are called via function pointers from one function (but that isn't that common), or integer literals get cast into function pointers that then get called (a rather exotic use-case - I guess most C programmers will never do that throughout their lives). So accuracy is not really a problem.
        
        The argument of "having to wade through it over and over again" does make sense. If this is performance-critical, it might be worth some duplication of information. In that case, however my suggestion would be: Attach this information to the asm line when first accessed. When the information is first needed we "wade through it" once, cache the result at the asm line. Every time a peephole rule gets applied, we clear the information for all changed lines. This should save by far most of the "wade through it" effort, but should also always be correct, not relying on tracing the ic correctly.
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
        
        Oleg Endo - 2023-10-30
        
        But the main point is that, IMO we cannot accurately trace the ic in the peephole optimizer, we can only try to approximate there. And if that approximation ends up assigning the wrong ic to some line, this should not affect the correctness of the generated code.
        
        I guess it might be a good thing to ensure the correct synced association of a couple of key entities. For example in GCC we can easily get the basic block within which an RTL insn resides. This is very useful crucial information for low-level optimization passes. And obviously "best effort approximation" of such information is almost useless.
        
        I think eventually we can't avoid duplication of some of the information at some point, as we go from one data structure (iCode) to another (asm lines). And as the data (= generated code) is processed those data might go out of sync. One particular example that already exists right now is iCode::rUses, iCode::rMask, iCode::rSurv and asmLineNodeBase::regsRead, asmLineNodeBase::regsWritten. I think after the peephole optimization those info might be out of sync. So I think data duplication should be kept to a minimum but not ruled out per se.
        
        or integer literals get cast into function pointers that then get called (a rather exotic use-case - I guess most C programmers will never do that throughout their lives). So accuracy is not really a problem.
        
        It's a rare case, but I've had something like this before:
        
        typedef uint8_t (*func_ptr)(void); .... (*(func_ptr)0x1000) ();
        
        Other cases I can imagine are having arrays of function pointers (in .rodata) and then compute the index of the function to be called based on something.
        As a compiler user it'd be puzzling if that prevented some optimizations and resulted in worse code around the call. There's no obvious reason why this code should compile to something worse than a 'regular' function call.
        
        Attach this information to the asm line when first accessed. When the information is first needed we "wade through it" once, cache the result at the asm line.
        
        Yes, this is how it works for the register use/set information right now. If there's a way to do that for calls (e.g. aopArg way) we could probably plug that directly into mcs51/main.c (asmLineNodeFromLineNode).
        
        Every time a peephole rule gets applied, we clear the information for all changed lines. This should save by far most of the "wade through it" effort, but should also always be correct, not relying on tracing the ic correctly.
        
        Yes, either that or copy over the information from call/jmp/ret insn in the peephole to the matching replacement call/jmp/ret insn.
        
        If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Oleg Endo - 2023-10-28

5: This patch makes a lot of effort to handle the read of acc by jz/jnz. Why not just check for jz/jnz in asmLineNodeFromLineNode and then set the A_IDX bit?

It becomes relevant later on. The patches in the set build on top of each other.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Philipp Klaus Krause - 2023-10-28

6: Applied in [r14394].

Related

Commit: [r14394]

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Philipp Klaus Krause - 2023-10-28

8: Looks ok to me, but I didn't apply it yet, since it is apparently needed for 9 only.

9: The idea is ok, but IMO the patch needs some minor changes: The MCS-51 SFR are a special way of doing I/O (separate I/O space that does not support indirect addressing). Not just some "special register". Some other architectures supported by SDCC have similar concepts in the hardware or the port, others don't.
So I think "isSpecialRegister" reflect that better in the name (a "special register" could be something totally different, especially for other ports). Maybe "isSpecialFunctionRegister" of "isSFR" or such, and that function should emit an error message if port->peep.isSpecialRegister is NULL, rather than just returning false. The function description should also state if there can be false positives or false negatives.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Maarten Brock - 2023-11-16
  
  What is the point of patch 9. ? The peepholes are meant to optimize generated asm for C code. Do we expect users to start renaming PSW, ACC, B & DPTR. ? If so, they are treading in uncharted areas and totally on their own. I see no need to rescue them from themselves.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
  - Oleg Endo - 2023-11-17
    
    What is the point of patch 9. ? The peepholes are meant to optimize generated asm for C code
    
    Yes, it's supposed to optimize things like
    
    mov a,_ACC
    
    ... in a safe way. I.e. if the symbol is known to map to one of the standard SFR addresses, then this can be optimized. Usually in C code we'd use some SFR name like ...
    
    sdcc/device/include/mcs51/8051.h:__sfr __at (0xE0) ACC ;
    
    ... but it could be named anything actually. So it's better to check for known standard address mappings instead.
    
    One thing is inline-asm-C interworking code. I remember you having a strong opinon on that. However, at current time, this is the only reasonable way to pass data between inline asm and C:
    
    uint8_t bleh (... { ... ACC = something; DPL = something1; DPH = something2; __asm ... ( use a, dpl, dph here ) mov a, < return value > __endasm; return _ACC;
    
    Yes, it's dangerous waters there, but after all, those standard SFRs are an inherent part of mcs51 code, I think. So we can and should assume that there are some use cases where those SFRs might be accessed directly from user C code, for whatever reason or purpose. I can imagine a peripheral that uses the standard CPU SFRs for data exchange and some non-standard SFR to actually trigger some other action of the peripheral.
    
    @spth This might actually be a something to keep in mind when making A, B, DPL, DPH, DPTR allocatable with the future new register allocator. Perhaps a safe way would be to not have those user accessible SFRs being live across inline asm blocks. In other words, make all inline asm implicitly clobber A, B, DPL, DPH.
    
    If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
    - Maarten Brock - 2023-11-18
      
      I think I would prefer to not even define these standard SFRs in the headers for C to use. Code generation needs them badly. What if reading something needs ACC or DPTR?
      
      If someone needs inline asm it is much safer IMHO to write the full function in asm. I agree that the example you gave here is valid in SDCC currently. But to add peephole rules which make all compilations take more time just for this use case seems wrong.
      
      I think I only ever used the ACC sfr directly to get the P flag (parity) efficiently.
      
      void uart_transmit(uint8_t c) { ACC = c; TB8 = P; SBUF = c; }
      
      If we would assume all inline asm to clobber these registers we will probably want to add an intrinsic NOP instead of the current macro in compiler.h that uses inline asm. Or we need a way to indicate which registers get clobbered and which are untouched.
      
      If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Philipp Klaus Krause - 2023-10-28

7: I'm not familiar enough with the bits pseudo-register to have an opinion on this patch. Maybe @maartenbrock can have a look?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

mcs51: improve peephole optimizations

The Small Device C Compiler (SDCC), targeting 8-bit architectures

Group

Searches

Help

#466 mcs51: improve peephole optimizations

Related

Discussion

Related

Related