Small Device C Compiler (SDCC) / Patches / #392 OR - XOR cast optimization

Visenri - 2021-06-18

Even more interesting results can be seen in tests, for example 'bug-1929.c'

Without the patch, the cast generates operations whose result is thrown away, and 'a' is then restored from the stack:

; gen/stm8/bug-1929/bug-1929.c: 55: reg3 ^= MASK;
; genCast
    ld  a, _reg3+0
    push    a
    rlc a
    clr a
    sbc a, #0x00
; peephole 0 removed dead load into xl from a.
    pop a
; genXor
    xor a, #0x0f
; genCast
; genAssign
    ld  _reg3+0, a

After patch:

; gen/stm8/bug-1929/bug-1929.c: 55: reg3 ^= MASK;
; genXor
    ld  a, _reg3+0
    xor a, #0x0f
    ld  _reg3+0, a

I have not tested other ports, but I expect them to benefit as well.
If someone else gives me a thumbs up for this patch, I will do all the remaining tests and commit it if none fails.

Results with more complex expressions are even better (test: rotate2_andCase_0_rotateLeft_0_structVar_1_xorLiteral_1_size_8.asm).

Here, XOR is combined with SWAP:

return (((value2 ^ rotate_test_value_xor) << SHIFT_L) | ((value2 ^ rotate_test_value_xor) >> SHIFT_R)) AND_OPERATION;

And this was the result:

; genAssign
    ld  a, (0x03, sp)
;   gen/stm8/rotate2/rotate2_andCase_0_rotateLeft_0_structVar_1_xorLiteral_1_size_8.c: 142: return (((value2 ^ rotate_test_value_xor) << SHIFT_L) | ((value2 ^ rotate_test_value_xor) >> SHIFT_R)) AND_OPERATION;
; genCast
; genAssign
    clrw    x
; genXor
    xor a, #0x24
; genCast
; genAssign
    ld  xl, a
; peephole 4 removed redundant load from xl into a.
; genLeftShiftLiteral
    swap    a
    and a, #0xf0
; genRightShiftLiteral
    sraw    x
    sraw    x
    sraw    x
    sraw    x
; genCast
; genAssign
; genCast
; genAssign
; genOr
    pushw   x
    or  a, (2, sp)
    popw    x

After the patch, perfect optimization:

; genAssign
    ld  a, (0x03, sp)
;   gen/stm8/rotate2/rotate2_andCase_0_rotateLeft_0_structVar_1_xorLiteral_1_size_8.c: 142: return (((value2 ^ rotate_test_value_xor) << SHIFT_L) | ((value2 ^ rotate_test_value_xor) >> SHIFT_R)) AND_OPERATION;
; genXor
    xor a, #0x24
; genSwap
    swap    a
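
Why a lone swap is a valid result here: assuming SHIFT_L and SHIFT_R are both 4 (which the four sraw instructions and the 0xf0 mask in the unpatched listing suggest), that value2 is an unsigned 8-bit value, and that the result is truncated to 8 bits, the shift/or expression is a nibble rotation. The host-side sketch below checks that equivalence for every input. The 0x24 literal comes from the listing; the function names are made up for illustration, and AND_OPERATION is left out.

```c
#include <assert.h>

#define SHIFT_L 4   /* assumed from the listing, not stated in the thread */
#define SHIFT_R 4

/* The expression as written in the test: the operands are promoted to int
   and the result is truncated when converted back to unsigned char. */
static unsigned char rotate_by_shifts(unsigned char value, unsigned char xor_literal)
{
    return (unsigned char) (((value ^ xor_literal) << SHIFT_L) |
                            ((value ^ xor_literal) >> SHIFT_R));
}

/* What "xor a, #literal" followed by "swap a" computes: exchange the nibbles. */
static unsigned char xor_then_swap(unsigned char value, unsigned char xor_literal)
{
    unsigned char a = (unsigned char) (value ^ xor_literal);
    return (unsigned char) (((a & 0x0f) << 4) | ((a & 0xf0) >> 4));
}

/* Returns the number of 8-bit inputs for which the two forms differ. */
int count_rotate_mismatches(unsigned char xor_literal)
{
    int mismatches = 0;
    for (int v = 0; v < 256; v++)
        if (rotate_by_shifts((unsigned char) v, xor_literal) !=
            xor_then_swap((unsigned char) v, xor_literal))
            mismatches++;
    return mismatches;
}

void check_rotate_is_swap(void)
{
    assert(count_rotate_mismatches(0x24) == 0);
}
```

The check only holds once the int-sized intermediate result is narrowed back to 8 bits, which is exactly the step the cast used to obscure.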

I was expecting some improvement, but these cases really surprised me.
More cases can be seen in rotate2...xorLiteral_1_size_8.

Philipp Klaus Krause - 2021-06-19

The following RFE might be related:
[feature-requests:#531]


Last edit: Maarten Brock 2021-06-19


Philipp Klaus Krause - 2021-06-19

SDCC used to have similar optimizations at the AST level; many have been removed since they caused bugs.

Here is an example of code that breaks with your patch:

int i, j;
static volatile unsigned char testVar;

void testOrXorOptimization(void)
{
  i = sizeof (testVar ^ 8);
  j = sizeof (testVar | 8);
}

According to the C standard, due to integer promotion in the usual arithmetic conversions, testVar ^ 8 and testVar | 8 are of type int, so i and j should be set to 2. With the patch they get set to 1 instead. Using sizeof like that would be unusual, but possible in C90. With newer C standards the problem gets worse, as _Generic, typeof and, if it gets in, auto would be affected, too.
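
A minimal host-side sketch of the behavior described above, assuming a C11 compiler. It checks that both expressions have type int, once through sizeof and once through _Generic. The IS_INT macro and the function names are hypothetical helpers, not part of SDCC or its test suite. On a typical host sizeof (int) is 4 rather than 2, so the comparison is made against sizeof (int) instead of a fixed number.

```c
#include <assert.h>
#include <stddef.h>

static volatile unsigned char testVar;

/* 1 if the expression has type int, 0 otherwise (C11 _Generic). */
#define IS_INT(expr) _Generic((expr), int: 1, default: 0)

/* sizeof does not evaluate its operand; only the operand's type matters. */
size_t size_of_xor(void) { return sizeof (testVar ^ 8); }
size_t size_of_or(void)  { return sizeof (testVar | 8); }

/* Both results must have type int after integer promotion. */
int results_are_int(void)
{
    return IS_INT(testVar ^ 8) && IS_INT(testVar | 8);
}

void check_promotion(void)
{
    assert(size_of_xor() == sizeof (int));
    assert(size_of_or() == sizeof (int));
    assert(results_are_int());
}
```

A type reduction done on the AST before sizeof or _Generic is resolved would make these checks fail, which is the breakage reported here.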

Another disadvantage of doing such an optimization on AST is that it won't work on something like (int)testVar ^ 8. While a programmer wouldn't write that literally and expect optimization, the (int) could e.g. come from argument promotion of an inline function.

The functions in ctype.h would be an example. For those, a few years ago I introduced a similar optimization. It works on the iCode. That tends to be a bit more involved than doing it on the AST, but the advantage is that it applies to even more code and that it doesn't break _Generic, etc.
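
To illustrate where such an implicit (int) can come from, here is a small sketch with a hypothetical inline helper (flip_bit3 is invented for this example and is not a ctype.h function). The int parameter converts the argument at the call site, yet the result that is finally stored in an unsigned char is the same as that of a byte-wide xor.

```c
#include <assert.h>

/* Hypothetical helper: the int parameter forces a conversion of the
   argument to int at every call site. */
static inline int flip_bit3(int c)
{
    return c ^ 8;
}

/* After inlining this amounts to (unsigned char) ((int) v ^ 8). */
unsigned char flip_via_inline(unsigned char v)
{
    return (unsigned char) flip_bit3(v);
}

/* The form written directly on the char value. */
unsigned char flip_direct(unsigned char v)
{
    return (unsigned char) (v ^ 8);
}

/* Returns the number of inputs for which the two forms differ. */
int count_flip_mismatches(void)
{
    int mismatches = 0;
    for (int v = 0; v < 256; v++)
        if (flip_via_inline((unsigned char) v) != flip_direct((unsigned char) v))
            mismatches++;
    return mismatches;
}

void check_flip(void)
{
    assert(count_flip_mismatches() == 0);
}
```

An optimization that only matches the literal AST shape of the direct form would miss the inlined one, while a width analysis on the iCode can in principle treat both alike.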

See optimizeOpWidth in SDCCopt.c for the details. Maybe this or / xor optimization can be done there, too?
- Visenri - 2021-06-19
  
  Another disadvantage of doing such an optimization on AST is that it won't work on something like (int)testVar ^ 8
  
  I was aware of that limitation.
  
  What didn't cross my mind was the sizeof operator!
  I thought that as long as the result was the same, we don't care about the size, but those operators you mention are a problem.
  I also agree that those use cases are unusual or do not make sense at all, but they are possible.
  
  After having seen that, we should have tests for those operators. I will create some simple tests.
  
  I will have a look at optimizeOpWidth in SDCCopt.c as you suggested.
  
  - Visenri - 2021-06-19
    
    I've been able to avoid the cast quite easily for the xor case with the following patch.
    It was just a missing operand check for the '^' operator.
    
    So the original example becomes:
    
    ; ..\src\cast.c: 14: testVar ^= 8;
    ; genAssign
        ld  a, _testVar+0
    ; genXor
        xor a, #0x08
        ld  _testVar+0, a
    ; ..\src\cast.c: 15: testVar |= 8;
    ; genAssign
        ld  a, _testVar+0
    ; genOr
        or  a, #0x08
        ld  _testVar+0, a
    ; genLabel
    
    Now comes the part where I am more or less lost: how to get from this:
    
    ..\src\cast.c(l15:s0:k2:d0:s0:b2) iTemp0 [k3 lr0:0 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{volatile-unsigned-char fixed} := _testVar [k2 lr0:0 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{volatile-unsigned-char fixed}
    ..\src\cast.c(l15:s0:k3:d0:s0:b2) iTemp1 [k4 lr0:0 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{volatile-unsigned-char fixed} = iTemp0 [k3 lr0:0 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{volatile-unsigned-char fixed} | 0x8 {unsigned-char literal}
    ..\src\cast.c(l15:s0:k5:d0:s0:b2) _testVar [k2 lr0:0 so:0]{ ia1 a2p0 re0 rm0 nos0 ru0 dp0}{volatile-unsigned-char fixed} := iTemp1 [k4 lr0:0 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{volatile-unsigned-char fixed}
    
    To this:
    
    ..\src\cast.c(l21:s0:k2:d0:s0:b2) iTemp0 [k3 lr0:0 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{unsigned-char fixed} = _testVar [k2 lr0:0 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{volatile-unsigned-char fixed} | 0x8 {unsigned-char literal}
    ..\src\cast.c(l21:s0:k3:d0:s0:b2) _testVar [k2 lr0:0 so:0]{ ia1 a2p0 re0 rm0 nos0 ru0 dp0}{volatile-unsigned-char fixed} := iTemp0 [k3 lr0:0 so:0]{ ia0 a2p0 re0 rm0 nos0 ru0 dp0}{unsigned-char fixed}
    
    Because now the results in rotate2...xorLiteral_1_size_8 are back to their original (less optimized) form.
    
    Last edit: Visenri 2021-06-20
    
    SDCCopt.c.patch
    
    - Maarten Brock - 2021-06-20
      
      It seems to me that this is work for the backend or even the peephole optimizer. Not every target can perform these operations on any location. It very much depends on whether testVar is global (and if so in which memory space on e.g. mcs51) or on stack or in registers.
      
      - Visenri - 2021-06-20
        
        Maybe, but I think we should simplify the AST and iCode as much as possible, so that optimizations that can be done in a generic way do not have to be repeated in each backend.
        
        
        Maarten Brock - 2021-06-20
        
        I've applied your SDCCopt.c.patch in [r12488]
        
        
        Visenri - 2021-06-20
        
        This is still a work in progress, but that part was correct for sure.
        
- Visenri - 2021-06-20
  
  I have bad news: after writing some tests, the current results (no patch applied) of the sizeof operator seem wrong in many cases (or at least, I think so).
  
  All tests that apply an operation to literal values seem to fail:
  
  "Assertion failed" on (i = sizeof (8 << 1), sizeof (int) == i) at gen/stm8/test_sizeof/test_sizeof.c:2
  "Assertion failed" on (c = sizeof (8 << 1), sizeof (int) == c) at gen/stm8/test_sizeof/test_sizeof.c:2
  "Assertion failed" on (i = sizeof (8 >> 1), sizeof (int) == i) at gen/stm8/test_sizeof/test_sizeof.c:2
  "Assertion failed" on (c = sizeof (8 >> 1), sizeof (int) == c) at gen/stm8/test_sizeof/test_sizeof.c:2
  "Assertion failed" on (i = sizeof (8 & 8), sizeof (int) == i) at gen/stm8/test_sizeof/test_sizeof.c:28
  "Assertion failed" on (c = sizeof (8 & 8), sizeof (int) == c) at gen/stm8/test_sizeof/test_sizeof.c:29
  "Assertion failed" on (i = sizeof (8 | 8), sizeof (int) == i) at gen/stm8/test_sizeof/test_sizeof.c:31
  "Assertion failed" on (c = sizeof (8 | 8), sizeof (int) == c) at gen/stm8/test_sizeof/test_sizeof.c:32
  "Assertion failed" on (i = sizeof (8 ^ 8), sizeof (int) == i) at gen/stm8/test_sizeof/test_sizeof.c:34
  "Assertion failed" on (c = sizeof (8 ^ 8), sizeof (int) == c) at gen/stm8/test_sizeof/test_sizeof.c:35
  "Assertion failed" on (i = sizeof (8 + 8), sizeof (int) == i) at gen/stm8/test_sizeof/test_sizeof.c:37
  "Assertion failed" on (c = sizeof (8 + 8), sizeof (int) == c) at gen/stm8/test_sizeof/test_sizeof.c:38
  "Assertion failed" on (i = sizeof (8 - 8), sizeof (int) == i) at gen/stm8/test_sizeof/test_sizeof.c:40
  "Assertion failed" on (c = sizeof (8 - 8), sizeof (int) == c) at gen/stm8/test_sizeof/test_sizeof.c:41
  "Assertion failed" on (i = sizeof (8 * 8), sizeof (int) == i) at gen/stm8/test_sizeof/test_sizeof.c:43
  "Assertion failed" on (c = sizeof (8 * 8), sizeof (int) == c) at gen/stm8/test_sizeof/test_sizeof.c:44
  "Assertion failed" on (i = sizeof (8 / 8), sizeof (int) == i) at gen/stm8/test_sizeof/test_sizeof.c:46
  "Assertion failed" on (c = sizeof (8 / 8), sizeof (int) == c) at gen/stm8/test_sizeof/test_sizeof.c:47
  
  That is because the "cheapestVal" function generates a type of V_CHAR for all of them.
  
  Some tests with symbols also fail.
  
  "Assertion failed" on (c = sizeof (c >> 1), sizeof (int) == c) at gen/stm8/test_sizeof/test_sizeof.c:57
  "Assertion failed" on (c = sizeof (c >> (2 + 1)), sizeof (int) == c) at gen/stm8/test_sizeof/test_sizeof.c:59
  "Assertion failed" on (c = sizeof (c >> (2 >> 1)), sizeof (int) == c) at gen/stm8/test_sizeof/test_sizeof.c:61
  "Assertion failed" on (c = sizeof (c & (2 + 1)), sizeof (int) == c) at gen/stm8/test_sizeof/test_sizeof.c:66
  "Assertion failed" on (c = sizeof (c & (2 >> 1)), sizeof (int) == c) at gen/stm8/test_sizeof/test_sizeof.c:68
  "Assertion failed" on (c = sizeof (c | (2 + 1)), sizeof (int) == c) at gen/stm8/test_sizeof/test_sizeof.c:73
  "Assertion failed" on (c = sizeof (c | (2 >> 1)), sizeof (int) == c) at gen/stm8/test_sizeof/test_sizeof.c:75
  "Assertion failed" on (c = sizeof (c ^ (2 + 1)), sizeof (int) == c) at gen/stm8/test_sizeof/test_sizeof.c:80
  "Assertion failed" on (c = sizeof (c ^ (2 >> 1)), sizeof (int) == c) at gen/stm8/test_sizeof/test_sizeof.c:82
  
  These fail because the end result is of type char, and the expression is simplified to char type.
  This only happens if the outermost operation is a right shift or a bitwise operation.
  
  Tests for 'host' do not have a single failure; the other backends show exactly the same result as stm8 (of course, since the problem is in the AST).
  
  Either we have to do a lot more work to make this work correctly, or we leave those cases as undefined behavior. (Or maybe I am missing something.)
  
  I haven't seen any explicit mention of use cases (for example in C99 specification: "ISO/IEC 9899:1999") of sizeof for:
  
  expressions containing only literal values.
  
  expressions containing literal values combined with literal values.
  
  The expected behavior for such an unusual use case has to be gathered from other parts of the text:
  integer promotion must be applied, so the result should be of type int, and sizeof must return sizeof(int) when used on such expressions.
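
As a rough illustration of the expected behavior, here is a host-side sketch in the spirit of the failing checks listed above. It is not the attached test_sizeof.c; the function name and the selection of expressions are chosen for this example only. On a conforming compiler every check should yield sizeof (int).

```c
#include <assert.h>
#include <stddef.h>

/* Counts the sizeof checks that do not yield sizeof (int). */
int count_sizeof_failures(void)
{
    unsigned char c = 0x81;
    int failures = 0;

    /* Literal-only operands: each literal already has type int. */
    failures += sizeof (8 << 1) != sizeof (int);
    failures += sizeof (8 ^ 8)  != sizeof (int);
    failures += sizeof (8 + 8)  != sizeof (int);

    /* A char operand is promoted to int before the operation. */
    failures += sizeof (c >> 1)       != sizeof (int);
    failures += sizeof (c & (2 + 1))  != sizeof (int);
    failures += sizeof (c ^ (2 >> 1)) != sizeof (int);

    (void) c;   /* sizeof never evaluates c; silence unused warnings */
    return failures;
}

void check_sizeof_promotion(void)
{
    assert(count_sizeof_failures() == 0);
}
```

A compiler that narrows these expressions to char before evaluating sizeof would report a non-zero count here.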
  
  But I am wondering:
  
  Are we trying to solve a non-existing problem?
  
  Is there any practical use for those expressions? I have never seen any of those in real applications.
  
  Last edit: Visenri 2021-06-20
  
  results-test-host.log
  
  results-test-stm8.log
  
  test_sizeof.c
  

Visenri - 2021-06-20

I am working on a patch that changes decorateType so that it does not reduce operands when the tree starts with a "SIZEOF" operator or another problematic operator.
That way, size optimizations can still be done without problems for all other trees (the vast majority of real cases).
So far it looks promising: far fewer tests fail, and I am checking what is going wrong with the rest.

Last edit: Visenri 2021-06-20

- Visenri - 2021-06-20
  
  I have reached 0 failures (only stm8 tested so far) with full optimization by modifying the tree decoration process.
  
  I am passing the value RESULT_TYPE_OTHER to the resultType input of decorateType to indicate that the result type is unknown and that standard promotion rules should be used without reducing types.
  The first node using SIZEOF starts the process with RESULT_TYPE_OTHER; as a result, resultTypeProp is set to the same value, together with a flag inside decorateType that avoids all problematic code paths.
  
  Functions like
  valUnaryPM, valComplement, valNot, valMult, valDiv, valMod, valPlus, valMinus and valShift also get an extra parameter to avoid the call to the cheapestVal function.
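
The idea of the extra parameter can be pictured with a toy model. Everything below (toy_literal, toy_cheapest, toy_val_plus, the reduce_type flag) is hypothetical and only mimics the shape of the change; it is not SDCC's code, and the real valXX functions and cheapestVal work on SDCC's own value structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these types and functions are NOT SDCC's. */
typedef enum { TOY_CHAR = 1, TOY_INT = 2 } toy_type;   /* value = size in bytes */

typedef struct {
    toy_type type;
    long     value;
} toy_literal;

/* Builds an int-typed literal, like a plain "8" in C source. */
toy_literal toy_int_literal(long value)
{
    toy_literal lit = { TOY_INT, value };
    return lit;
}

/* Stand-in for the idea behind cheapestVal: pick the smallest type that fits. */
static toy_literal toy_cheapest(toy_literal lit)
{
    if (lit.value >= 0 && lit.value <= 255)
        lit.type = TOY_CHAR;
    return lit;
}

/* Folds l + r. With reduce_type false the int result type is kept. */
toy_literal toy_val_plus(toy_literal l, toy_literal r, bool reduce_type)
{
    toy_literal result = { TOY_INT, l.value + r.value };
    return reduce_type ? toy_cheapest(result) : result;
}

void check_toy_val_plus(void)
{
    toy_literal eight = toy_int_literal(8);
    assert(toy_val_plus(eight, eight, true).type == TOY_CHAR);   /* normal folding */
    assert(toy_val_plus(eight, eight, false).type == TOY_INT);   /* under sizeof   */
}
```

In this model the folded value is the same either way; only the type attached to it changes, which is what sizeof observes.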
  
  Does anyone see a problem with this method?
  Could this cause any issues in other parts of the code?
  For now it seems to be working ok.
  
  I want to write some more tests to check all the operators and to test all other backends. I don't expect any major problems.
  
  - Visenri - 2021-06-20
    
    The results for now are these:
    
    13 tests get better size and cycles for a total of 1385 saved bytes (1200 correspond to rotate2 tests).
    Only 4 tests have an increased size, for a total of 14 bytes (3.5 average).
    I will investigate those tests (but I am quite confident that they have to do with the RESULT_TYPE_OTHER used in other parts of the code).
    
    If we want to avoid those cases, I may need to add another parameter to indicate these cases and avoid confusion with the normal RESULT_TYPE_OTHER use cases.
    
    I did it this way, because it was less involved than modifying all calls to decorateType and also reviewing the parameters to each function using the resultTypeProp or resultType values.
    
    Should be doable, just more time-consuming.
    
    - Visenri - 2021-06-21
      
      After confirming that the problem with those 4 cases was the use of RESULT_TYPE_OTHER (as expected), I added a boolean parameter to specifically disable size reduction for SIZEOF trees.
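
How a dedicated boolean can switch off size reduction only below a sizeof can be sketched with a toy tree walk. The node kinds, toy_expr_size and toy_sizeof_value are invented for this illustration and deliberately much simpler than decorateType; sizes assume a 2-byte int as on SDCC's 8-bit targets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy AST, not SDCC's: just enough to show the flag travelling down the tree. */
typedef enum { N_LITERAL, N_CHAR_VAR, N_XOR } toy_op;

typedef struct toy_node {
    toy_op op;
    long value;                    /* used by N_LITERAL only */
    const struct toy_node *left;   /* used by N_XOR only */
    const struct toy_node *right;  /* used by N_XOR only */
} toy_node;

enum { TOY_CHAR_SIZE = 1, TOY_INT_SIZE = 2 };

/* Returns the size in bytes of the type given to the expression.
   reduce == true allows narrowing; false applies plain promotion rules. */
int toy_expr_size(const toy_node *n, bool reduce)
{
    switch (n->op) {
    case N_LITERAL:
        /* An int literal may shrink to char only when reduction is allowed. */
        return (reduce && n->value >= 0 && n->value <= 255)
                   ? TOY_CHAR_SIZE : TOY_INT_SIZE;
    case N_CHAR_VAR:
        return TOY_CHAR_SIZE;
    case N_XOR: {
        int l = toy_expr_size(n->left, reduce);
        int r = toy_expr_size(n->right, reduce);
        if (!reduce)
            return TOY_INT_SIZE;   /* integer promotion: the result is int */
        return (l > r) ? l : r;
    }
    }
    return TOY_INT_SIZE;
}

/* Value of sizeof (operand): always computed with reduction disabled. */
int toy_sizeof_value(const toy_node *operand)
{
    return toy_expr_size(operand, false);
}

void check_toy_tree(void)
{
    const toy_node var = { N_CHAR_VAR, 0, NULL, NULL };
    const toy_node lit = { N_LITERAL, 8, NULL, NULL };
    const toy_node x   = { N_XOR, 0, &var, &lit };

    assert(toy_expr_size(&x, true) == TOY_CHAR_SIZE);  /* testVar ^ 8 may stay 8-bit */
    assert(toy_sizeof_value(&x) == TOY_INT_SIZE);      /* sizeof (testVar ^ 8) is int */
}
```

Keeping the switch separate from the result-type value means ordinary expressions still get the narrow form, while the operand of sizeof is typed by the standard rules.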
      
      Now 11 tests get better size and cycles with a total reduction of 1398 bytes (1200 from rotate2 tests as before).
      No tests with increased code size or cycles.
      
      As soon as I have tested all the operators and backends, I'll upload the patch here so other developers can have a look at it. If no one has objections, I'll apply the patches.
      
      - Benedikt Freisen - 2021-06-21
        
        What will happen when you take an expression with implicitly narrowed type and assign it to a variable of automatic type?
        
        Last edit: Benedikt Freisen 2021-06-21
        
        
        Visenri - 2021-06-21
        
        Does SDCC support auto variables (introduced in C++11)?
        
        You meant something like this:
        
        auto autoVar = 1;
        auto autoVarWithOp = 1 + 1;
        ASSERT(sizeof (autoVar) == sizeof (int));
        ASSERT(sizeof (autoVarWithOp) == sizeof (int));
        
        The SDCC documentation says nothing about this feature, and it gives me an error (as expected):
        
        error 226: no type specifier for 'autoVar'
        
        Compiles fine in gcc and executes tests correctly in host.
        
        So, if this is what you meant, it's not applicable right now to SDCC.
        
        Not a concern for me in the foreseeable future of a compiler targeting small micro-controllers (Maybe Philipp thinks otherwise).
        
        What I can say right now is that this patch may benefit this kind of feature if we also include the auto initialization tree in the "disable size reduction" cases. It should just work.
        
        
        Philipp Klaus Krause - 2021-06-22
        
        SDCC does not yet support this use of auto. And while the wording is not yet in C23, it seems likely that it will be (see the N2735 proposal for details). So, like typeof, it is something SDCC will have to support in the future.
        
        
        Maarten Brock - 2021-06-22
        
        Wow, does auto get a new meaning in C after 50 years?
        
        
        Philipp Klaus Krause - 2021-06-23
        
        Unlike C++, the new meaning won't replace the old one, but be an additional one, i.e. both of the following declarations will be valid C:
        
        void f(void)
        {
          auto long i = 23; // long since C90
          auto j = 42l;     // long since C23
        }
        
        Last edit: Maarten Brock 2021-06-23
        
        
        Maarten Brock - 2021-06-23
        
        Sorry for hijacking this thread.
        
        But wasn't that already valid with a different meaning?
        
        void f(void)
        {
          auto j = 42L; // int since K&R C, 1978
        }
        
        
        Philipp Klaus Krause - 2021-06-23
        
        Yes, it had that meaning until C94, but from C99 to C17 it was a constraint violation.
        
      - Visenri - 2021-06-22
        
        After adding extra tests, only minor changes were needed to make almost all operators pass.
        
        Aside from adding the extra parameter to decorateType (and passing it to valXX functions) only minimal changes were needed in these operators:
        
        &
        
        |
        
        ^
        
        unary+
        
        LEFT_OP
        
        RIGHT_OP
        
        So far, it passes all tests (160) involving many (almost all) operators combined with pure literals and literals with values.
        
        The only tests that still fail involve the boolean operators. I have to check them; probably they just need to be handled as integers:
        
        !
        
        all comparisons (6)
        
        && ||
        
        I attach the patch files with major changes, so some of you can have a look to see if I am missing something.
        I also attach the preliminary tests (The vast majority of these cases fail with current trunk version).
        
        DON'T commit, this is still a work in progress. It is attached just in case someone wants to have a look.
        
        SDCCast.c.patch
        
        SDCCval.c.patch
        
        test_sizeof.c
        

Visenri - 2021-06-22

summary: OR - XOR cast optimization --> OR - XOR cast optimization - CONDITIONAL type reduction in AST


Visenri - 2021-06-22

Results after testing other backends:

Regarding xor optimization:

Most (if not all) seem to benefit (size and cycles) from the OR-XOR optimization (similar to STM8).
pdk fails because it generates invalid code:

; gen/pdk14/bug2686159/bug2686159.c: 31: REG_2 |= 1;
; genOr
    mov a, #0x01
    or  _REG_2+0, a
; gen/pdk14/bug2686159/bug2686159.c: 32: REG_2 |= 2;
; genOr
    mov a, #0x02
    or  _REG_2+0, a

It was like this before the patch:

; gen/pdk14/bug2686159/bug2686159.c: 31: REG_2 |= 1;
; genAssign
    mov a, _REG_2+0
; genOr
    or  a, #0x01
    mov _REG_2+0, a
; gen/pdk14/bug2686159/bug2686159.c: 32: REG_2 |= 2;
; genAssign
    mov a, _REG_2+0
; genOr
    or  a, #0x02
    mov _REG_2+0, a

Philipp, can you shed some light on this?
I'm not familiar with the instruction set, but I suppose doing an or with a destination other than "a" is not possible.
Could you fix it in pdk/gen.c, or should I disable this optimization for pdk?
(You can see the generated iCode a few posts above.)

I've added the patches with my changes, so, you can have a look.
If you want to test the OR-XOR optimization with pdk, you just have to copy the code after the comment:

/* OR / XOR char with literal integral, try to reduce integral to CHAR if it fits in a CHAR */

Last edit: Visenri 2021-06-23

SDCCast.c.patch

SDCCval.c.patch

SDCCval.h.patch
