I am using build 9082 of SDCC on the STM8 platform.
My ring buffer function compiles, and I found two points where peephole optimization would make sense.
unsigned char rx_ring[256];
unsigned char rx_head;
unsigned char rx_tail;

unsigned char RXBuffer_ReadBytes()
{
    unsigned char temp;
    temp = rx_ring[rx_tail];
    rx_tail++;
    return temp;
}
compiles to:
; rx_ringbuffer.c: 27: temp = rx_ring[rx_tail];
    ldw x, #_rx_ring+0
    ld a, xl
    add a, _rx_tail+0
    ld xl, a
    ld a, xh
    adc a, #0x00
    ld xh, a
    ld a, (x)
    ld xl, a
; rx_ringbuffer.c: 28: rx_tail++;
    inc _rx_tail+0
; rx_ringbuffer.c: 30: return temp;
    ld a, xl
    ret
which could be optimized to something like this:
; rx_ringbuffer.c: 27: temp = rx_ring[rx_tail];
    ldw x, #_rx_ring+0
    ld a, xl
    add a, _rx_tail+0
    rlwa
    adc a, #0x00
    ld xh, a
    ld a, (x)
; rx_ringbuffer.c: 28: rx_tail++;
    inc _rx_tail+0
; rx_ringbuffer.c: 30: return temp;
    ret
The following peephole rule does not work.
replace restart {
ld xl, a
ld a, xh
} by {
rlwa x, a
} if notUsed('xh')
I will try to find the reason, or check whether this can be fixed in the upstream code generator.
If the rule does not work, you might want to check whether something is wrong with notUsed(). A problem in notUsed() might affect other rules as well.
However, there might be cases in which the above peephole rule is problematic, since rlwa affects the flags, while ld between registers does not.
Philipp
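To illustrate why the rule carries the notUsed('xh') condition, here is a minimal C model of the two instruction sequences. It is only a sketch: the struct regs type and the function names are hypothetical, the rlwa behaviour (A <- XH <- XL <- A) is my reading of the STM8 programming manual, and the condition flags mentioned above are not modelled at all.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the three 8-bit registers involved. */
struct regs { uint8_t a, xh, xl; };

/* Original sequence: ld xl, a ; ld a, xh */
static struct regs seq_ld(struct regs r)
{
    r.xl = r.a;
    r.a = r.xh;
    return r;
}

/* Replacement: rlwa x, assumed to rotate A <- XH <- XL <- A */
static struct regs seq_rlwa(struct regs r)
{
    uint8_t old_a = r.a;
    r.a = r.xh;
    r.xh = r.xl;
    r.xl = old_a;
    return r;
}

/* Checks that both sequences agree on a and xl for every input.
   xh is deliberately not compared: it is the register that differs. */
static void check_rule(void)
{
    unsigned a, xh, xl;
    for (a = 0; a < 256; a++)
        for (xh = 0; xh < 256; xh++)
            for (xl = 0; xl < 256; xl++) {
                struct regs in = { (uint8_t)a, (uint8_t)xh, (uint8_t)xl };
                struct regs p = seq_ld(in);
                struct regs q = seq_rlwa(in);
                assert(p.a == q.a);
                assert(p.xl == q.xl);
            }
}
```

In this model the two sequences differ only in xh (the ld pair leaves it untouched, rlwa overwrites it with the old xl), which is why the replacement is only safe when xh is dead afterwards.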
I suggest changing
unsigned char rx_head;
unsigned char rx_tail;
to
unsigned int rx_head;
unsigned int rx_tail;
then more efficient code is generated.
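One thing to keep in mind with this change: an unsigned char index wraps at 256 on its own, which happens to match the 256-byte buffer, while an unsigned int index does not. A minimal sketch of the read function with the wider index follows. The explicit mask is my assumption and not part of the suggestion above, and I have not checked whether the masked version still compiles to better code.

```c
unsigned char rx_ring[256];
unsigned int rx_tail;   /* widened index, as suggested above */

unsigned char RXBuffer_ReadBytes(void)
{
    unsigned char temp;
    temp = rx_ring[rx_tail];
    /* Assumption: wrap explicitly, since an unsigned int
       no longer overflows back to 0 after 255. */
    rx_tail = (rx_tail + 1u) & 0xFFu;
    return temp;
}
```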
I am not quite sure whether the RLWA operation expects operands.
On 2014-10-03 at 14:58, Ben Shi wrote:
Related Feature Requests: #415

rlwa has only one argument. But that would not stop the peephole optimizer from emitting an rlwa with two operands (later stages should emit an error though).
Philipp
Does the generated code improve if you post-increment rx_tail inside the array access?
temp = rx_ring[rx_tail++];
aopGet() should be able to deal with the post-increment and temp should become return-use-only and not go through xl.
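For reference, the whole function with the post-increment folded into the array access would look roughly like this. This is only the source-level change; whether SDCC then produces better code is exactly the open question here.

```c
unsigned char rx_ring[256];
unsigned char rx_tail;

unsigned char RXBuffer_ReadBytes(void)
{
    unsigned char temp;
    /* Read the byte at the current tail, then advance the tail.
       The unsigned char index wraps from 255 back to 0. */
    temp = rx_ring[rx_tail++];
    return temp;
}
```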
That might not help, since the instructions in question are generated within one IR instruction: rx_ring (16-bit pointer) + rx_tail (8-bit offset), not by the post-increment of rx_tail.
Implemented in revision 9113 by a peephole rule, not in the upstream code generator.