Here is a patch with a new peephole optimiser constraint function: isImmdBitMask.
This function determines whether an immediate value represents a bitmask with a single bit either set (e.g. 00010000
) or clear (e.g. 11111011
), and whose value is no wider than a given bit width (up to 32 bits). The function sets an output variable to the zero-based index of the single set/clear bit.
Some examples of the logic:
%1 = 0x4 => isImmdBitMask(%1 8 %2) => true, %2 = 2
%1 = 0xBF => isImmdBitMask(%1 8 %5) => true, %5 = 6
%2 = 0x200 => isImmdBitMask(%2 8 %3) => false
%2 = 0xD7 => isImmdBitMask(%2 8 %3) => false
%1 = 0x200 => isImmdBitMask(%1 16 %4) => true, %4 = 9
%1 = 0xEFFFFFFF => isImmdBitMask(%1 32 %2) => true, %2 = 28
The primary motivation behind this function is to streamline the unwieldy and repetitive sets of peephole optimiser rules for the STM8 platform that deal with manipulating or testing individual bits within registers or memory bytes. There may be future applications for other platforms too.
For example, the following rule (from stm8/peeph.def) is repeated 8 times with different bit mask values on the or
instruction and different bit position literal on the bset
:
replace restart { ld a, %1 or a, #0x80 ld %1, a } by { bset %2, #7 ; peephole 18-7 replaced or by bset. } if notUsed('a'), operandsLiteral(%1), immdInRange(0 65535 '+' 0 %1 %2)
Having this new constraint function could cut these 8 down to a single rule:
replace restart { ld a, %1 or a, #%2 ld %1, a } by { bset %1, #%3 ; peephole 18 replaced ld-or-ld by bset. } if notUsed('a' 'n' 'z'), operandsLiteral(%1), isImmdBitMask(%2 8 %3)
Also attached is a separate patch to stm8/peeph.def with modifications to use this new constraint function, which consolidates 32 rules down to only 4. A review of peephole rules for all other platforms did not reveal any other suitable candidates for modification.
By the way, I nearly forgot to mention, I was also going to create a patch to the manual with description of the new function, but I did not find a section with docs on the peephole constraint functions to be present.
I could have sworn someone (I do not remember who) on the dev mailing list mentioned a few months ago that they had added such information to the manual. Was it not merged?
There is a bit, but not much at the end of section 8.1.16.
I found the mailing list post - it was Sergey Belyashov who had updated the docs. But it is in a separate 'doc-peeph' branch, so that is why I did not find it in the main branch.
Thanks. Some remarks:
1) Would you mind making it work up to 64 bits (probably just using unsigned long long instead of unsigned long would do). It is not important now, but the restriction to 32 bits seems arbitrary.
2) The name is a bit non-descriptive. How about something like ImmdSingleBitSet / ImmdSingleBitUnset?
3) The description looks like a single set bit is treated the same as a single bit unset. Wouldn't Your example rule then incorrectly transform code that does |= 0xfe into code that does |= 0x01?
For (1), yes, possibly. I only made it 32 bits because, IIRC, your previous comments on the dev mailing list indicated it would be highly unlikely more than 32 bits would be needed. I suppose it's just the
val < (1 << width)
test that needs changed - off the top of my head probably(unsigned long long)val < (1ULL << width)
might do the trick.Regarding (3), you're right, that is a problem. I hadn't thought of that scenario. Perhaps instead I can split it into two functions: isImmdSingleBitSet and isImmdSingleBitClear. Probably both simply wrappers to a common implementation function.
isImmdSingleBitSet and isImmdSingleBitClear sounds good.
Is anything else holding this patch back?
Yes.
We had two competing patches for this feature submitted at the same time. I think I prefer the approach from patch [#362].
Related
Patches:
#362Last edit: Maarten Brock 2021-05-25
I've been attempting to make changes to handle 64 bits, but I've run into problems. I just don't seem to be able to convince the compiler to produce code that properly left-shifts an
unsigned long long
by 64 bits (for theval < (1 << width)
test, which checks if the immediate value is no greater than the given bit width).I've tried:
(unsigned long long)val < (1ULL << width)
(unsigned long long)val < ((unsigned long long)1 << width)
(unsigned long long)val < (1ULL << (unsigned long long)width)
It seems that left-shifting by anything >= 64 gets turned into
1 << (width % 32)
.Anyone got any ideas?
If I can't solve this problem, I'll just have to leave it limited to 32 bits.
I really don't get it, what is the point in checking for 64 bits if values gotten by immdGet are longs? and as such, are only guaranteed to be 32 bits?
I don't think that assertion holds. Values parsed by immdGet() can be expressed in hexadecimal and so could be
0xFFFFFFFFFFFFFFFF
. Even though that function returns a signedlong
, if you cast the value tounsigned long
, it will represent the original intended value, no?You can cast the returned value to whatever you want, but if the storage of an unsigned long holds only 32 bits, you'll never get anything bigger than that:
For example:
Shows the following output in my computer:
Assigning a value > 32bits is only stored in a variable of long long type, otherwise content is truncated.
Even worse, because immdGet uses a pointer to set the value, it only sets the lower part, the higher 32bits remain at whatever they value was before, in this example that's why we get 0x11111111FFFFFFFF instead of 0x00000000FFFFFFFF.
What a coincidence! I was just preparing the same thing. Because I just got tired of creating new rules using scripts for every bit.
But I implemented it as an extra operator for immdInRange to give it more versatility:
The given example would be
In my case the syntax is:
immdInRange(MIN MAX 'bp' VALUE BITS RESULT)
MIN, MAX: operate as usual.
'bp': is the new operator "bit position"
VALUE: is the value to be evaluated
BITS: number of bits for the mask (>0 for positive bits, < 0 for reversed bits).
returns false if VALUE is not a valid bit position (no matter what MIN and MAX are), if it's a valid bit position, then it checks MIN and MAX as usual.
It handles case 3 correctly, because the function checks normal or reversed bits as requested by BITS param.
The implementation looks like this:
So far, regression test pass successfully. And all my extra rules translate correcly into bset, bres, bcpl, btjt and btjf
Haha, interesting. :)
I did not go the route of doing something like adding to immdInRange, as I find that function fairly imperceptible/opaque already, and adding further complication wouldn't be a good thing in my mind. A separate function (or functions) is much clearer in my opinion.
What extra rules have you formulated? In particular, I have a whole bunch of rules for
btjt
andbtjf
(see feature request #690) for some) that I intend to modify to use this new function, which will be submitted once I've tested everything thoroughly.And is called like this from immdInRange:
I am doing my final testing for this and other patches I've prepared, I'll upload them soon.
In [r12407], I applied peeph_stm8_def.patch. I modified it to use the singleBitSet and SingleBitReset conditions from patch [#362].
There is however, a small regression: the new rule 20 apparently doesn't apply in some cases where the old 20-? rules applied. This is probably due to the additional notUsed conditions. Since those conditions look more correct than the ones on the old 20-? rules, I applied that part anyway. I guess notUsed gives some false negatives here, and inted to look into it later.
Related
Patches:
#362Hmm, interesting. I added the extra
notUsed
conditions because the original finalLD
would modify the N and Z flags, and I felt the previous rules were perhaps insufficiently strict enough not checking the state of the those flags weren't being later relied upon.I guess you're right that any such regression will be harmless and only result in rules not being applied when they otherwise could have been.
I think rule 21 is not correct, it is using "isImmdBitMask" instead of "immdInRange" (with "singleSetBit"):
Should be:
That may also mean that the rule is never matched, because otherwise you should have seen an error with the regression tests.
Last edit: Visenri 2021-05-28
Thanks. I fixed that now in [r12412]. Though indeed I don't get matches. For the following, only the |= gets optimized with current rules.
Anyway, when I find time, I'll look through your rules in this patch to add them;, and sometime before 4.2.0, probably already this summer, there should be a rule cleanup, where rules that don't apply in the regression tests get removed.
Last edit: Maarten Brock 2021-06-05
This code shows one of the (multiple) problems I fixed long ago with my patch: [#290]
Without that patch:
This is due to bad parsing of instructions like "ld a, 0x0024"
The 'x' in the address is interpreted as reading X, so the unneeded "clrw x" is not removed and prevents the optimization of the first bcpl as follows:
Another proof:
If you instead test this code (changed 'and' to a 'bres' like operation):
This is the result:
Now the first operation is optimized, because when the bres removes the " ld a, 0x0024", the bad parsing no longer happens and the "bcpl" optimization becomes possible.
Related
Patches: #290
Last edit: Maarten Brock 2021-06-05
This also happens with compiler symbols that contain 'x' or 'y':
This gets full optimization:
However, the following code results in missing optimizations (only the variable name has been changed):