I have few peephole rules.
Of course they are full of bugs and should be well reviewed :)
At least they are checked in my code.
Please feel free to give feedback.
If you want, that $for(inst in ['t0sn','t1sn'] ) can be emulated directly in peephole syntax by making the instruction a variable, e.g. %6, and adding a condition like if same(%6 't0sn' 't1sn').
P.S.: Using the singleSetBit operator for in the immdInRange condition, even the $for(i in range(8)) part could be emulated in peephole syntax (see e.g. src/stm8/peeph.def for similar single-bit.mask peephole rules). So I guess this can all be just one or two peephole optimizer rules?
👍
1
Last edit: Philipp Klaus Krause 2023-12-28
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Correct. I made a prototype that works and doesn't require recompilation.
It can also fit nicely into effectively hard-coded C code.
I was a little confused by the fact that Same() accepts more than one pattern. Perhaps it would be more appropriate to call it Any() or Subset() for the "one or more" case?
PS: Sorry, I personally find it very unpleasant to use immdInRange side effects. I'm not ready to pay logic and simplicity for compactness.
Last edit: Konstantin Kim 2023-12-29
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Those functions got added one at a time, and while I think a few rare renames to make them more consistent happened the names are still not perfect. AFAIR, notSame was there first, then same got added. But any or subset don't looks intuitive to me either.
Maybe anySame (for same) and notAnySame (for notSame)?
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
This way we can introduce new recommended aliases and trigger a warning about the deprecation of old spelling.
This is a purely technical problem of how to implement this. It’s not yet clear yet what and why ;)
Last edit: Konstantin Kim 2023-12-29
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Don't know if its worth having a deprecation warning: the .def files in SDCC would be updated, and few users have their own .def files, so the error might be good enough.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Thanks. IMO we still should make two small changes before this goes in:
1) The comment stating that the rule got applied should go somewhere near the changed lines, not just at the top, and should fit into the current numbering, so e.g. instead of
2) It looks like the none of the these rules is currently triggered in the regression tests. Can you provide a small C code sample function, where the rules apply?
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Fine. I'm leaning towards the option of having a comment at the beginning of the matched block for debugging reasons. But if you prefer, it will be adjusted to your proposal.
All the rules were used and triggered in mine and in many random codes on the net.
You are right. Rules should be tested and evaluated.
Files in the first message was updated.
Last edit: Konstantin Kim 2023-12-29
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
The regression tests have quite good coverage of standard C code, but extensions such as I/O are not covered that well. So it is not surprising that these rules for mov.io are not currently triggered, and we need to add a test.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hmm. I refined that test a bit (see attached source)¹, but it still doesn't trigger your rules: I don't see SDCC generate the t0sn/t1sn your rules need.
¹ I was lazy here, when parametrizing the test - I guess I could have used a proper loop instead using m4 instead - we don't have a preprocessor for peephole optimizer rules, but we do have for tests.
I don't like rules #37 to #40. I think every peephole rule should either decrease code size, or keep code size and decrease cycle count.
We don't want to run into a cycle of peephole rules applications that just keep reordering code without converging.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
While I get the purpose of bringing asm code into a canonical form to open it up to optimization by further peephole rule applications, I still think that having rules that decrease neither code size nor cycle count will be causing more trouble in the long-term, so I'd prefer to do without them, even if it means requiring more rules in total.
Many years ago, there were such rules for z80. z80 has a large number of rules due to many developers having improved it over a long time (the z80 port is much older than the pdk ports). AFAIR, z80 peephole rules not triggered in the regression tests have been removed from SDCC. Still there is a lot of rules left, and it is hard to grasp their potential interactions. For z80, it happened that at some time it as possible to trigger an endless cycle of rule applications; it didn't happen in the regression test, but users reported bugs about SDCC hanging.
The monotonicity of every rule decreasing code size or keeping code size and decreasing cycle count is a relatively elegant way to avoid that problem.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Agree. Nobody likes useless cold code that never runs.
This is not our case.
I have formed a good set of rules that were obtained iteratively by shooting non-optimal sequences of the real output of the compiler.
The overlap of logic is quite expected, but not cold rules.
Still, thanks for the critical point of view.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
maybe it's better to split the patch into smaller parts?
Yes. But before [bugs:#3525] is fixed for pdk, I'll be reluctant to add new peephole rules for pdk.
Related
Bugs:
#3525Last edit: Maarten Brock 2023-12-29
Lets go one by one.
This one is pyexpander script generated for and/or to setX.io reduction
Last edit: Konstantin Kim 2023-12-29
If you want, that
$for(inst in ['t0sn','t1sn'] )can be emulated directly in peephole syntax by making the instruction a variable, e.g.%6, and adding a condition likeif same(%6 't0sn' 't1sn').P.S.: Using the singleSetBit operator for in the immdInRange condition, even the
$for(i in range(8))part could be emulated in peephole syntax (see e.g. src/stm8/peeph.def for similar single-bit.mask peephole rules). So I guess this can all be just one or two peephole optimizer rules?Last edit: Philipp Klaus Krause 2023-12-28
Correct. I made a prototype that works and doesn't require recompilation.
It can also fit nicely into effectively hard-coded C code.
I was a little confused by the fact that Same() accepts more than one pattern. Perhaps it would be more appropriate to call it Any() or Subset() for the "one or more" case?
PS: Sorry, I personally find it very unpleasant to use immdInRange side effects. I'm not ready to pay logic and simplicity for compactness.
Last edit: Konstantin Kim 2023-12-29
Those functions got added one at a time, and while I think a few rare renames to make them more consistent happened the names are still not perfect. AFAIR,
notSamewas there first, thensamegot added. Butanyorsubsetdon't looks intuitive to me either.Maybe
anySame(forsame) andnotAnySame(fornotSame)?Maybe
equalsAnyandequalsNone?Or
isOneOfandisNoneOf?In any case, we should keep in mind that users might have their own auxiliary rule files and therefore still accept the old spelling.
This way we can introduce new recommended aliases and trigger a warning about the deprecation of old spelling.
This is a purely technical problem of how to implement this. It’s not yet clear yet what and why ;)
Last edit: Konstantin Kim 2023-12-29
Don't know if its worth having a deprecation warning: the .def files in SDCC would be updated, and few users have their own .def files, so the error might be good enough.
IMO,
isNoneOfdoesn't sufficiently convey thatnotSametests pairwise.My point to not reinvent the wheels ;)
I like the classic (for some languages) all()/any() for list of booleans.
Last edit: Konstantin Kim 2023-12-29
Thanks. IMO we still should make two small changes before this goes in:
1) The comment stating that the rule got applied should go somewhere near the changed lines, not just at the top, and should fit into the current numbering, so e.g. instead of
We'd have
2) It looks like the none of the these rules is currently triggered in the regression tests. Can you provide a small C code sample function, where the rules apply?
Fine. I'm leaning towards the option of having a comment at the beginning of the matched block for debugging reasons. But if you prefer, it will be adjusted to your proposal.
All the rules were used and triggered in mine and in many random codes on the net.
You are right. Rules should be tested and evaluated.
Files in the first message was updated.
Last edit: Konstantin Kim 2023-12-29
The regression tests have quite good coverage of standard C code, but extensions such as I/O are not covered that well. So it is not surprising that these rules for mov.io are not currently triggered, and we need to add a test.
sample code:
it requires another rule for "c(n)eqsn/and -> tXsn" to trigger
(snippet was edited)
Last edit: Konstantin Kim 2023-12-29
Hmm. I refined that test a bit (see attached source)¹, but it still doesn't trigger your rules: I don't see SDCC generate the t0sn/t1sn your rules need.
¹ I was lazy here, when parametrizing the test - I guess I could have used a proper loop instead using m4 instead - we don't have a preprocessor for peephole optimizer rules, but we do have for tests.
you have to check next bunch of rules for "c(n)eqsn/and -> tXsn"
sorry, my bad
have to be:
At same time I noted that 4.4.0 generates a bit different code patterns than before ;(
Another rules for t0sn/t1sn flow untwisting
Bassicaly these are 4 rules that have been processed into 40.
Last edit: Konstantin Kim 2023-12-29
I don't like rules #37 to #40. I think every peephole rule should either decrease code size, or keep code size and decrease cycle count.
We don't want to run into a cycle of peephole rules applications that just keep reordering code without converging.
You don't like them yet ;)
Let's put them aside for now. They are helpers for others, and cause a very good avalanche effect.
While I get the purpose of bringing asm code into a canonical form to open it up to optimization by further peephole rule applications, I still think that having rules that decrease neither code size nor cycle count will be causing more trouble in the long-term, so I'd prefer to do without them, even if it means requiring more rules in total.
Many years ago, there were such rules for z80. z80 has a large number of rules due to many developers having improved it over a long time (the z80 port is much older than the pdk ports). AFAIR, z80 peephole rules not triggered in the regression tests have been removed from SDCC. Still there is a lot of rules left, and it is hard to grasp their potential interactions. For z80, it happened that at some time it as possible to trigger an endless cycle of rule applications; it didn't happen in the regression test, but users reported bugs about SDCC hanging.
The monotonicity of every rule decreasing code size or keeping code size and decreasing cycle count is a relatively elegant way to avoid that problem.
Agree. Nobody likes useless cold code that never runs.
This is not our case.
I have formed a good set of rules that were obtained iteratively by shooting non-optimal sequences of the real output of the compiler.
The overlap of logic is quite expected, but not cold rules.
Still, thanks for the critical point of view.
Rules 37-40 was triggers well in 4.2.9
At the moment I can't reproduce the trigger case in 4.4.0 #14549 for them.
Let me hunt them in the meantime
rules #37 to #40 equivalent seems is already in place!!
4.4.0 #14549 shows "; peephole j3 removed goto by inverting test condition."