The correct wayx to implement this would be to transform
y = x / 2^k;
into
t = x;
if(t < 0)
t += 2 ^ k - 1;
y = t >> k;
The simpler
y = x >> k;
is wrong, since it rounds towards minus infinity instead of towards zero.
Philipp
Please optimize divisons by powers of two.
The following two simple functions now generate calls
to the internal division functions, while they could be
easily implemented with bit shifts:
C code:
int divtesti(int a)
{
return(a / 8);
}
char divtestc(char a)
{
return(a / 8);
}
Resulting Z80 asm:
;test.c:8: int divtesti(int a)
; genLabel
; genFunction
; ---------------------------------
; Function divtesti
; ---------------------------------
_divtesti_start::
_divtesti:
push ix
ld ix,#0
add ix,sp
;test.c:10: return(a / 8);
; genIpush
; _saveRegsForCall: sendSetSize: 0 deInUse: 0 bcInUse:
0 deSending: 0
ld hl,#0x0008
push hl
; genIpush
; AOP_STK for
ld l,4(ix)
ld h,5(ix)
push hl
; genCall
call __divsint_rrx_s
ld b,h
ld c,l
pop af
pop af
; genRet
; Dump of IC_LEFT: type AOP_REG size 2
; reg = bc
ld l,c
ld h,b
; genLabel
00101$:
; genEndFunction
pop ix
ret
_divtesti_end::
_BILD:
.dw #0x1800
;test.c:13: char divtestc(char a)
; genLabel
; genFunction
; ---------------------------------
; Function divtestc
; ---------------------------------
_divtestc_start::
_divtestc:
push ix
ld ix,#0
add ix,sp
;test.c:15: return(a / 8);
; genIpush
; _saveRegsForCall: sendSetSize: 0 deInUse: 0 bcInUse:
0 deSending: 0
ld a,#0x08
push af
inc sp
; genIpush
; AOP_STK for
ld a,4(ix)
push af
inc sp
; genCall
call __divschar_rrx_s
ld c,l
pop af
; genRet
; Dump of IC_LEFT: type AOP_REG size 1
; reg = c
ld l,c
; genLabel
00101$:
; genEndFunction
pop ix
ret
_divtestc_end::
The correct wayx to implement this would be to transform
y = x / 2^k;
into
t = x;
if(t < 0)
t += 2 ^ k - 1;
y = t >> k;
The simpler
y = x >> k;
is wrong, since it rounds towards minus infinity instead of towards zero.
Philipp
IMO the solution presented here is equivalent to the one from #3072154, but written down in a more elegant way. The proposal from #2997777 is just wrong. Thus I'll close the other two items.
Philipp
The result of a right shift of a signed negative quantity is undefined in the C spec, as Maaten already mentioned in #2997777. The implementation differs even for different targets in sdcc. In the proposed solution is probably assumed that >> is implemented as arithmetic shift right (please correct me if I'm wrong), so the proper notation would be:
t = x;
if(t < 0)
t += 2 ^ k - 1;
y = t ASR_OP k;
See my proposal for ASR_OP introduction in #2997777 and all conversation that follows.
Borut
>The result of a right shift of a signed negative quantity is undefined in the C spec
I know.
> The implementation differs even for different targets in sdcc
I thought that all targets do an arithmetic right shift. Which ones do not? A quick look at the z80 and pic16 shows they do arithmetic right shift. The pic14 port seems to do it that way as well, but it has the confusing comment that was in the z80 code, too. Would there be a cost associated with making all ports always do an arithmetic right shift for signed types?
Philipp
P.S.: There were contradicting comments in the Z80 backend code regarding this issue. I just removed the incorrect one.
I don't think there can be ports that do not use arithmetic right shift for signed types: Regression test2ShiftRight() in the shifts.c regression tests explicitly checks that right shift is arithmetic.
Philipp
My (now shown as false) assumption about different implementations of >> is based on comments in #2997777 and different regtest results for different target when the patch was applied. I didn't systematically look in the code.
But even if it is true that all targets do an arithmetic right shift it is dangerous to relay the division with 2^n on it: rhe implementation of >> may change in the future and the division with 2^n will suddenly fail.
OTOH the regression tests will show this immediately....
If we want to make right shift logic by default for some target in the future we can always introduce a new shift operator which is undefined for negative qunatities then. Or, alternatively, introduce an ARIGHT_OP and have the old RIGHT_OP undefined for signed stuff (and then make division use ARIGHT_OP).
IMO we can use the existing RIGHT_OP for now, and reconsider making separate operations when the need arises.
Philipp
I can live with this: actually it is what I proposed in #2997777 although with wrong assumption that it can be implemented as y = x >> k;
So if I understand you propose to implement it on icode level, so it will automatically work for all targets?
Borut
> So if I understand you propose to implement it on icode level, so it will automatically work for all targets?
Yes.
Philipp
What I wrote in #2997777 seems not completely correct. The C99 spec says that the result is "implementation defined" which is different from "undefined" (I guess). So SDCC defines the result of >> on a negative signed value to be an arithmetic shift right and thus keeps the sign. As Philipp noted test2ShiftRight() in the shifts.c regression tests explicitly checks that right shift is arithmetic for signed types.
Now this rounding. Philipp claims that y = x >> k is wrong, since it rounds towards minus infinity instead of towards zero. But this I cannot find in the C99 spec. As I said the result is implementation defined and thus we can choose to round towards zero. The current implementation however rounds towards minus infinity and so do the hosts we run the tests on apparently. I don't think we should want to change this.
So, I agree with the proposal but think we should update shifts.c with comments regarding our choice of implementation.
> As I said the result is implementation defined and thus we can choose to round towards zero.
Yes. We could essentially implement x >> k as x / 2^k. Or implement x >> k as
t = x;
if(t < 0)
t += 2 ^ k - 1;
y = t RIGHT_OP k;
In my statement I used the npotation ">>" as shorthand for RIGHT_OP (as currently implemented by sdcc). Nothe that the other obvious choice for implementing >>, logical right shift, wouldn't make x >> k a correct substitute for x / 2^k either.
>So, I agree with the proposal but think we should update shifts.c with comments regarding our choice of implementation.
Done in revision #6721.
Philipp
I implemented this optimization a long time ago. I just verified that the optimization works for z80 with current SDCC. It is not done for all ports, since some architectures have efficient division instructions.
Philipp