I've been working again with STM8 SDCC compiler and I found that many things could be improved with more peephole rules.
I have prepared a patch with many new rules and some changes to some code to get better results.
I put bitfields in the tittle, but the savings are in many more places.
For the medium memory model this is the comparison pre-post patch:
Summary for 'stm8': 0 failures, 12335 tests, 2385 test cases, 2532879 bytes, 78765777 tick
Summary for 'stm8': 0 failures, 12335 tests, 2385 test cases, 2532143 bytes, 78691548 ticks
Bitfields is where the improvement is more noticeable:
bitfields (f: 0, t: 131, c: 8, b: 8003, T: 11182)
bitfields (f: 0, t: 131, c: 8, b: 7720, T: 10870)
-3.5% in size.
But the number is not what i think is important, now all bitfield operations are optimized as they should, so programs handling bitfileds in regs in a lot of different ways will benefit a lot. This also makes a lot of operations atomic.
The rules are also splitted to be as small as possible, so a minimal set of rules performs a lot of optimizations, this is needed because a lot of bitfield operations are done in a similar way, but with small variations (extra srl, or swap). Take a look at the rules to see what I mean.
On Thu, 10 Jun 2021, Philipp Klaus Krause wrote:
Dear Philipp and Visenri,
uCsim is an ISS (Instruction Set Simulator) where cycle accuracy is
not a key point. As you noted, it is hard to correctly implement it
for pipeline CPUs. Cycle count in uCsim is just an aproximation. I'm
afraid it will never match to real hw exactly.
Some time ago I introduced the "so called" virtual ticks. Result of
them is printed on output of "conf" command:
"Operation since last reset" here is sum of fetches, mem reads and
writes. In my optinion it measures sw efficiency somehow. And makes
different CPUs comparable.
Daniel
I've opened a ticket: [#3250]
Please, follow any further discussion there.
I have summarized your comments there.
Related
Bugs:
#3250Pack 3 is added in [r12418], with some whitespace fixes.
Nice to see most of the rules added.
Regarding rules 600a & 600b:
I checked them and I think it was just a typo and those rules were meant to check 'c' not 'n':
If 0 is subtracted from any number the result is the same number, so flags 'n' and 'z' will be the same that after the 'ld'.
Only 'c' may be different after removing the 'cp' / 'sub', because it's updated with the carry/borrow of the operation.
Anyway, I'm just shocked that you get some regression failures and I didn't get any. Could you tell me which test(s) failed?
I will run now the tests with the last commits (12419). Later I will add these rules and I'll repeat the tests.
Last edit: Visenri 2021-06-03
With this changes it also fails.
constantRange.asm fails:
This:
Translates into:
When it should not.
Maybe I'm not seeing something, but to me, Rule '0a' is being applied when it should not ('jrsle' uses 'n' & 'z').
notUsed('v') may be needed and this is not implemented!
By changing the macro D_ in peep.c, one can enable debug output. That could tell where a check for n being read goes wrong.
I guess we'll also want to add support for notUsed('v') sometime, but that's probably beyond the scope of this patch.
P.S.: Could you contact me via email? My address is in sdcc/Changelog.
I am well aware of the D_ macro and other debugging techniques.
I needed to reduce the test to the minimum code triggering the problem and later start a debug session, yesterday it was too late for this even for me (I'm a night owl).
I just wanted you or anyone else to know about the wrong behavior, Just in case anyone wanted to have a look.
About the email. Of course, I'll send you an email right now.
After testing it with a minimal example, I think I found the problem.
There is a typo in stm8/peep.c in function "stm8MightReadFlag":
should be
The other similar instruction is "jrsge" and is checked correctly.
I'll pass all the test before celebrating it too much :D.
Success!
All tests pass, and I checked the modified rules are being applied 37 times.
Here you have the patches.
Looks good to me. I suggest you go ahead and apply it.
You might also want to still look into rule pack 2, and if it should replace some of the current rules.
Applied in [r12425]
I took my time to avoid screwing up things in my first commit :D.
I'll check rule pack 2 next week.
I have applied rule pack 2 and other improvements in [r12449].
With the changes I've done rules like 120 & 121 are no longer necessary, so now the state of this patch is:
Last edit: Visenri 2021-06-12
Pay attention to the list of packs done, I changed it, because pack numbers were wrong.
Regargind rules not applied from pack 4.
There is still room for improvement by applying rules from that pack, you can see it in examples found in several tests.
Here an example from bitfields-bits1_preBits_0_pattern_0_varType_3.c:
This "monstruosity" is simplified gracefully with rules from pack 4, here you can see an example from previous tests I did:
4 different rules are applied to finally being able to apply other rules.
The main problem are these two instructions:
That get in the way of applying rules 160, 161 and later all the others.
The job of rules 4xx, as clearly illustrated by this example, was to allow ldw and inw present many times in similar situations to go up until other rules can be applied to 'a' instructions.
Getting the same kind of optimization with specific rules for every case would require an insane number of extra rules.
Last edit: Visenri 2021-06-13
Fixed comment of rule 406, was showing 'addw' because of a copy-paste error.
What is the current situation? Has everything that makes sense been applied (so we could close the ticket)? Are there parts left, that we might to apply, for which I should do some testing?
As I said some months ago, after [r12449]:
...the state of this patch is:
As I said, there is margin for improvement, but I think all that could be applied safely has been applied.
Even thought there are potential problems (endless loops)) with with pure reordering rules (pack 4). I've never seen them in all my testing, because no other rule seems to undo the changes done by rules of pack 4.
If we put pack 4 aside, this ticket should be closed.
Then I suggest to close this ticket. Since you're the owner, that would be done by you. If we want to brink back an equivalent of pack4, we could do so in a new ticket. Maybe we can even achieve the effect without having rules that do not reduce code size.
Ok, I'm closing it.