From: Ian P. <ian...@in...> - 2004-04-08 00:27:01
On 08 Apr 2004, at 00:54, Ned Konz wrote:
> On a related note, does it seem wasteful to anyone but me that we do the
> following in primBytecodeAdd:
Probably not. ;)
You introduce an additional branch into the critical path by checking
both operands for possible overflow instead of checking just the result.
> Seems like we could save the shifts in most cases by looking at the top two
> bits of the receiver and argument; if the sign bits are different or the high
> bits (B30) are both the same as the sign bits we aren't going to get any
> overflow.
You end up with exactly the same number of instructions anyway.
Current version:
lwz r3,0xfffc(r27)
lwz r4,0(r27)
and r28,r3,r4
andi. r9,r28,0x1
beq <fail>
srawi r5,r3,1
srawi r0,r4,1
add r4,r5,r0
rlwinm r2,r4,1,0,30
xor. r9,r4,r2
blt <fail>
ori r6,r2,0x1
stwu r6,0xfffc(r27)
<dispatch>
Nedified version:
lwz r3,0xfffc(r27)
lwz r4,0(r27)
xor. r0,r3,r4
blt <fail>
rlwinm r5,r3,1,0,30
xor. r2,r5,r3
blt <fail>
rlwinm r6,r4,1,0,30
xor. r2,r6,r4
blt <fail>
add r7,r3,r4
addi r3,r7,0xffff
stwu r3,0xfffc(r27)
<dispatch>
While it probably won't hurt speed on a decent implementation of the
CPU (the additional branch will be predicted correctly), it won't
increase speed either (you haven't reduced the overall number of data
hazards the pipeline has to deal with).
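For anyone who would rather read C than PowerPC, here is a rough sketch of
the two strategies. It assumes the usual tagging (31-bit signed value in
bits 31..1, tag bit 0 set), and every function name in it is made up for
illustration; none of this is the VM's actual source. The operand version
follows the rule as described in the quoted text, so it is conservative: it
can bail out to the slow path even when the sum would have fit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: 31-bit signed value in bits 31..1, tag bit 0 set.
   All names below are hypothetical, chosen only for this sketch. */
#define SMALL_MIN (-(INT32_C(1) << 30))
#define SMALL_MAX ((INT32_C(1) << 30) - 1)

static bool is_small_int(int32_t oop) { return (oop & 1) != 0; }

static int32_t tag_small_int(int32_t value) {
    assert(value >= SMALL_MIN && value <= SMALL_MAX);
    /* Unsigned shift avoids signed-shift pitfalls; the conversion back
       wraps on mainstream two's-complement compilers. */
    return (int32_t)(((uint32_t)value << 1) | 1u);
}

/* Relies on >> being an arithmetic shift for negative values, which is
   what mainstream compilers do (the equivalent of srawi). */
static int32_t untag_small_int(int32_t oop) { return oop >> 1; }

/* True when bit 31 and bit 30 of x are equal. */
static bool top_bits_agree(int32_t x) {
    return (x ^ (int32_t)((uint32_t)x << 1)) >= 0;
}

/* Strategy 1: untag, add, then check that the sum still fits in 31 bits. */
static bool add_check_result(int32_t rcvr, int32_t arg, int32_t *out) {
    if (!is_small_int(rcvr & arg)) return false;   /* both must be tagged */
    int32_t sum = untag_small_int(rcvr) + untag_small_int(arg);
    if (!top_bits_agree(sum)) return false;        /* does not fit: fail */
    *out = (int32_t)(((uint32_t)sum << 1) | 1u);   /* retag */
    return true;
}

/* Strategy 2: look at the top two bits of each operand before adding. */
static bool add_check_operands(int32_t rcvr, int32_t arg, int32_t *out) {
    if (!is_small_int(rcvr & arg)) return false;
    bool signs_differ = (rcvr ^ arg) < 0;
    if (!signs_differ && !(top_bits_agree(rcvr) && top_bits_agree(arg)))
        return false;                              /* might overflow: fail */
    /* Two tag bits add up to 2, so subtract 1 to leave a single tag bit.
       Done in unsigned arithmetic so the intermediate sum cannot trap. */
    *out = (int32_t)((uint32_t)rcvr + (uint32_t)arg - 1u);
    return true;
}
```

Either way there are comparisons feeding branches ahead of the store, which
is the point about data hazards above.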
Cheers,
Ian