Hello.
I was playing with the "new" STM8 port and found a bug in peephole rule interpreter, a simple code like this failed to be optimized:
PB_ODR ^= 0x20;
if(PC_ODR)
{
...
}
Pre-patch version generated:
; ../src/blinky.c: 331: PB_ODR ^= 0x20;
ld a, 0x5005
xor a, #0x20
ld 0x5005, a
; ../src/blinky.c: 338: if(PC_ODR)
ld a, 0x500a
Patch version generates:
; ../src/blinky.c: 331: PB_ODR ^= 0x20;
bcpl 20485, #5
; ../src/blinky.c: 338: if(PC_ODR)
ld a, 0x500a
The problem is in "argCont" function (in src/stm8/peep.c), it was failing to execute correctly the following rule:
replace restart {
ld a, %1
xor a, #0x20
ld %1, a
} by {
bcpl %2, #5
; peephole 21-5 replaced or by bcpl.
} if notUsed('a'), operandsLiteral(%1), immdInRange(0 65535 '+' 0 %1 %2)
Because 'a' was being evaluated as used, because the instruction " ld a, 0x500a" contained 'a' in the source operand (0x500a). Looking at the code even other ways of failing were possible, for example, using symbols with x or y in its name using direct or indexed mode (ld a, _testx+0) would incorrectly evaluate x or y as being used .
New proposed code checks for direct memory and indirect addressing in a more reliable way, and searches explicitly for the argument after the "," if it detects the correct addressing mode.
I think every case should work ok now, but not fully tested.
Do the regression tests (i.e. make test-stm8 and make test-stm8-large in support/regression) pass?
Philipp
The version I've compiled from original source code (no patch applied): 3.8.4 #10780 (MINGW64) ,doesn't pass all the regression tests.
Is this normal?.
Did anyone check this?
Disabling peephole optimization, we get some timeouts and some invalid instructions:
(In makefile: SDCCFLAGS += --fverbose-asm -DNO_VARARGS --no-peep)
With peephole enabled (default Makefile flags) many assertions fail, and also invalid instructions and timeouts:
The outputs shown are not the full list of errors, just a sample of the full list.
Sorry, I messed it up somehow.
I've build it from scratch and now it seems to pass the test with unpatched version.
I've found a bug in my patch, I'll upload it once it passes the regression tests.
Last edit: Visenri 2019-01-03
stm8/peep.c
Fixed the problem with argCont to handle the original "notUsed()" rule flaw.
Correct processing of skip instructions:
There are some rules in peeph.def to replace "jra" jumps with skip instruction (real or emulated), using ".byte" instructions.
These were not processed at all by the call stack:
-scan4op
--stmUncondJump
--findLabel
These rules are at the end of the file, as the commend states: "so nothing else ever sees the jump-on-false optimization"
When custom peep rules were added with --peep-file, some rules were not processed correctly when ".byte" instructions were find by "scan4op".
This has been corrected and skip instructions are processed like any other jump.
Some hacks are needed, because the jump label may no longer exist, but it works.
4 Opcodes are processed:
Better handling of "ld" and "ldw" instructions in "stm8SurelyWritesFlag"
As was indicated by a comment, this needed some work: // Todo: Improve accuracy of these two.
This was also preventing optimizations.
With this patch, these are handled as indicated by stm8 datasheet "PM0044"
stm8/peeph.def:
Conditons to replace skip instructions have been changed, because wrong order was decreasing labelRefCount without executing replacement.
This leaded to symbol not found sometimes, because label was removed, but jra was not replaced by skip
changed to:
So, "labelRefCountChange" gets executed only if other conditions are fullfilled
Opcodes for two byte skip changed from 0xc1 (cp) to 0xc5 (bcp), original opcode changes more flags (n, z, c & v) vs (n & z).
These extra flags were not handled by the conditions in the rules.
Side effects should be the same, the instruction format for the address (following 2 bytes) is the same and result is lost in both instructions.
Attached: the patches and modified files.
All regression tests are passed.
Thanks. I applied the rule part of the patch in [r10850]. Regarding the changes to peep.c, I wonder
1) how the new code works on indirect addressing modes.
2) what happens when stm8instructionSize() overestimates size. Will findLabel() err on the safe side? Might it return the wrong label?
Indirect adreesing in stm8 is mostly useless, so it is notused much by sdcc, but there are exceptions:
Philipp
1) It shold work as correct or incorrect as before.
Following the code, i think there are also some problems to be addressed here.
Ill try to hande this cases in the same way as indexed access.
Sorry for the missing handled cases, i'm new to STM8 and to SDCC code base.
2) For the cases used in the current peep.def file:
Analyzing the code in stm8instructionSize, shows that there is only one incorrect case
stm8instructionSize should be corrected at least for this case.
Anyway, the consequences are overall better or similar than pre-patched version, because in pre-patched version, all .byte instructions were just skipped by all checks in "scan4op", so, the next instruction (the one[s] that should be skipped) was being processed incorrectly as the next to execute.
With the patch, at least, with the cases handled correctly by stm8instructionSize, the code takes the correct next instruction to be processed.
If stm8instructionSize repports the size incorrectly, it can be better, worse or equal to the original behavior, it depends on the conditions indicated in the replace rule and the skipped instruction(s) and the few following instructions.
So: after fixing stm8instructionSize at least for this case, some extra measures could be implemented to avoid cases like this:
I'll try it to see if it works.
Your use of looking for the target of an implicit jump is the first use of stm8instructionSize() that might rely on exact results. Previously, overestimation of instruction size was always safe (i.e. some optimizations might be missed, but the generated code would still be correct).
I'm not sure stm8instructionSize() can be made exact easily (considering inline asm), so when stm8instructionSize() reports a result larger than the exact one, findLabel() can fail to find the label, but should never return the wrong label.
Philipp
I agree with you when you say that if stm8instructionSize is not 100% reliable, we may not get the correct label if we removed the label.
But what I say is: this is better than not handling skip instructions at all (pre-patch state), you say that at least the generated code would still be correct, and this is not right, I'll show you an example showing optimized code when it should not be optimized:
The rule that applies here is (from peep.def):
I would also say here that this rule should be:
(because instructions affect 'n' and 'z')
Pre-patch-version:
C code (assembly used to force the problematic case):
With no optimization (OK):
With Optimization: Generates possibly faulty code because 'a' is being used by 'exg' instruction):
This is demonstrated by removing the jump and skipped instruction:
Showing clearly that if exg intruction follows after the pattern, the rule is correcly executed.
Patched version:
All previous cases result in no optimization because code path is followed correcly, indicating that no optimization is possible because a is being used:
New extra measures in the patched version
I already tried to change the skip rules conditions from:
to
And skip instruction labels are retained as shown in this example:
No optimizations enabled:
Optimized:
We can even go further, leaving the original label as a comment in the .byte instruction, to do an extra safety check in findlabel function (I know it's a dirty hack).
So:
I think this should be more than safe enough even if stm8instructionSize is not 100% reliable.
Last edit: Visenri 2019-01-10
After a lot of testing here is my patch.
It affects 3 files, but all can be applied separatelly.
stm8/Peep.c
Fixed the problem with the original "notUsed()" rule flaw.
While fixing it, I've made quite modifications to improve reliability:
-Added case insensitive processing of assembly instructions, now 'a' matches 'A', no more errors with inline asm in uppercase.
-Added space/tab insensitive processing of assembly instructions, now " a " matches "a", no more errors with inline ams or rules with extra spaces.
-Added checks to make processing insensitive to comments after instructions, no more false matches with inline asm or rules with comments .
-Changed strcmp by STRCASECMP
-Changed strncmp by STRNCASECMP (except for label check)
-Added functions to help with string processing.
-Indirect addressing should be no problem, the only function that takes it into account is 'IsPtr'.
Performance & readability:
-Changed macro ISINST by EQUALS where possible.
-Changed !strcmp calls by EQUALS macro where possible.
-New macro STARTSINST to check for instructions "starting with", avoiding the previous error prone strncmp with hardcoded length
-Changed to 'bool' return type some functions that were unnecessarily 'int'.
"lineReads" , "nextToken" & "srcArgEqualsStr"
Nested parenthesis (round brakets) are handled by a counter, in my tests, arguments are detected always correctly.
Previously, cases like this were not handled correctly by some functions:
"stm8SkipUncondJump"
Processing of skip instructions:
Now, skip instructions are processed like any other jump.
4 Opcodes are processed:
**"findLabel" **
To make it completelly safe with skip intructions all these changes for extra checks are made:
"stm8MightRead"
'cpw' check both operands, because both are read (previously, only dst operand was checked)
This was not causing problems because the compiler seems to never generate instructions that read from two index registers, like:
"stm8SurelyWritesFlag"
Now checks writes to 'c' flag.
As a result, for example, Peephole 42,43,44,45,46,47,48 & 49 are applied more times:
; removed redundant load of a from ....
; removed redundant load of x from ....
After applying my changes to peep.def, checking 'c' flags makes no difference at all in these rules
"scan4op"
Because stm8SurelyWritesFlag now checks for 'c' flag and some cond jumps (btjf, btjt) write 'c':
'stm8SurelyWrites' is called before jump checks.
**"stm8instructionSize" **
All instructions have been revised, and I think all should have the right size now.
"ldw x, y" is evaluated now as 1 byte in size:
As a result more jump optimizations are applied:
; peephole j5 changed absolute to relative unconditional jump.
; peephole j5a changed absolute to relative unconditional jump.
; peephole j7 removed jra by using inverse jump logic.
Last edit: Visenri 2019-02-17
stm8/peep.def
Changed rules 42,43,44,45,46,47,48,49 & 50
from:
} if notVolatile(%1), notUsed('c')
to:
} if notVolatile(%1), notUsed('n'), notUsed('z')
In these rules we have this pattern:
So removing last ld we only have to check that 'n' & 'z' are not used.
Corrected also some typos showing 'a' in comment where it should be 'x'
Created new rules (51eq & 51ne) to remove redundant load in sequences like:
Changed rule name of rules 51, 52 & 53 to better names
Added rules to remove redundant jumps (j2d-eq & j2d-ne) (found in regression test 'logic')
**Skip rules (jrf1 to jrf11): **
Poisoned instructions with comment to make sure they are not replaced by user rules.
To track the skip instruction label (see findLabel):
Removed the labelRefCountChange(%5 -1).
Added an special comment after the skip instruction.
Last edit: Visenri 2019-02-17
I attach files for stmu8/peep.c and stm8/peep.def (full files & diff)
All diffs and tests are done with r10972.
(The page didn't allow me to load in the previous messages).
SDCCgen.c
I also did a simple modification to this file to allow inline assembly to be processed correctly by peep.c when labels are present.
It formats the label properly and sets the "isLabel" field.
Finally, the test results.
I've already picked a few simple improvements from the patch into the post-3.9 branch in [r11162], and see a (small) improvement in code size. Thanks.
Regarding the other parts:
Ok, next time I'll use this style.
In the original file, most of the booleans were in uppercase, so I just followed the same style.
This has been tested and is working properly, in fact, most of my changes are focusen in avoiding this kind of problems (extra spaces, tabs, case sensitivity).
If you look carefully at the helper functions, all the magic is done there. In the case you are talking about, "nextToken" is handling all the extra chars and returned strings are always trimmed correcly (no leading/trailing spaces, no commas and nested round brackets are handled correctly).
Poisoning is needed because if additional rules are specified with "--peep-file" option, the unpoisoned instructions could be removed/replaced by:
I think that current behavior of reappliying rules in Peep.h.def is correct and desirable, because after applying additional rules, we may get extra optimizations by original rules.
In [r11331], I picked some more of the improvments from your patches.
Philipp
I have created an updated version.
stm8/peep.c:
Adopted updated changes from newest version 12014
Updated format to requested conventions (bools in lowercase and other things).
Added handling of 0+ format before literals / compiler symbols, 0+ is skipped to find the real compiler symbol or literal.
New function (stm8SkipUncondJumpSurelyWrites) to check written flags and regs (only a) by skip instructions.
"findLabel" function modified to return the next instruction after skip instructions (normal or emulated).
Only if skipped bytes matches the length of the skipped instruction.
Added missing 'bccm' instruction to function "stm8MightReadFlag".
Changed "stm8MightRead", exg is checked in the first group, because it always reads 'a' reg, no need to check argument.
Added call to "stm8SkipUncondJumpSurelyWrites" in "stm8SurelyWritesFlag".
Added call to "stm8SkipUncondJumpSurelyWrites" in "stm8SurelyWrites".
Added function "stm8IcodeIsCall" to check if iCode is a CALL.
Function "scan4op":
Added new comments about checking writes before jumps.
Added checks to detect tail optimization, otherwise, we ended in abort, and optimizations were missing.
Attached, the new file and the diff with current version.
Thanks. I'll have another look soon.
One issue I noted: The patch replaces ISINST by EQUALS in the function that computes size. This seems incorrect for e.g. callr. callr typically has an argument, but AFAIK, EQUALS will return false then.
P.S.: Also, some of the new functions use FALSE/TRUE instead of false/true.
I noticed that the new functions try to handle the case of an empty or invalid what argument. Should they do that? Would it be better to use wassert?
Last edit: Philipp Klaus Krause 2021-01-25
Have you tested it?
It parses all instructions correctly.
As I said last time:
About the TRUE-FALSE: You're right, I changed many of them, but some remained in the previous changes, would it be OK for you if I change all instances to lowercase?
About checking empty or invalid arguments: could you tell me exactly the lines you are looking at?.
In every line of code that you patch adds or changes, there should be no TRUE or FALSE.
Regarding the checking, I meant this part of the patch (and similar code for other functions):
Last edit: Philipp Klaus Krause 2021-01-26
That is a matter of taste.
As I designed it, the purpose of that function is to provide an answer to the question "Are two arguments strings different to nullstring equal?".
If one of them is NULL, the result is defined by convention to FALSE.
If one of them is nullstring, the result is defined by definition to FALSE.
It is up to the calling function to check previously if these kind conditions make sense or not in that particular case.
Previous code was not checking explicitly for the lack of parameters, this is not the purpose of this file, that kind of errors will be picked by the assembler.
Anyway, AFAIK these conditions are not triggered at the calling points.
Last edit: Visenri 2021-01-26
There are many examples of checks with assertions before calling argEqualsStr or argEqualsChr, for example in stm8instructionSize:
Last edit: Visenri 2021-01-26
Looks like there was a misunderstanding:
I meant that there should be no TRUE / FALSE in new code in SDCC, as the current convention is to use true / false. The old-style TRUE / FALSE still exist in places, but when code gets added or changed, it should use true / false instead.
P.S.: Could you contact me by email? My address is in the SDCC Changelog.
Last edit: Philipp Klaus Krause 2021-06-03