From: Jeremy F. <je...@go...> - 2002-12-11 06:00:53
I realized we can take advantage of P only being generated on the lower
8 bits. For jle, we can generate something like:
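        movl    EFLAGS(%ebp), %eax
        rorl    $7, %eax
        /* mask after rotating: rorl leaves S and P untouched, andl sets them */
        andl    $0x80000011, %eax
        js      1f              /* Z set: take the jump */
        jp      2f              /* O == S: fall through */
1:      movl    $target, %eax
        /* jump to target */
2:      /* carry on */
and for jnle:
        movl    EFLAGS(%ebp), %eax
        rorl    $7, %eax
        andl    $0x80000011, %eax
        js      1f              /* Z set: not taken */
        jnp     1f              /* O != S: not taken */
        /* jump to target */
1:      /* carry on */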
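The insight here is that P is computed from only the lower 8 bits of a
result, so we can test Z independently of O=S. Showing only the bits of
interest, eflags initially looks like (bit 15 on the left):
    ----O--+SZ------
and after the rorl and mask it looks like (bit 31 on the left):
    Z------+-------+-------+---O---S
Z now sits in the sign bit, and O and S are the only bits left in the
low byte. So P is set when O=S (zero or two bits set, even parity) and
clear when O!=S, while the sign flag S reflects the state of Z.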
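
As a sanity check on the bit layout, here is a minimal C sketch that models the trick on a saved eflags word and compares it with the textbook definitions (jle: Z or S!=O; jnle: not Z and S=O). It is only a model of the arithmetic, not the generated code: the helper names (ror32, parity_flag, jle_via_rotate, jnle_via_rotate, count_mismatches) and the "noise" bits are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* x86 EFLAGS bit positions for the three flags of interest. */
#define FLAG_Z (1u << 6)
#define FLAG_S (1u << 7)
#define FLAG_O (1u << 11)

/* Rotate a 32-bit value right by n bits (0 < n < 32). */
static uint32_t ror32(uint32_t v, unsigned n)
{
    assert(n > 0 && n < 32);
    return (v >> n) | (v << (32 - n));
}

/* Model of the x86 parity flag: 1 when the low 8 bits of v
   contain an even number of 1 bits. */
static int parity_flag(uint32_t v)
{
    unsigned ones = 0;
    for (unsigned i = 0; i < 8; i++)
        ones += (v >> i) & 1u;
    return (ones & 1u) == 0;
}

/* jle decided only from the "sign" and "parity" of the masked,
   rotated word: Z lands in bit 31, O in bit 4, S in bit 0. */
static int jle_via_rotate(uint32_t eflags)
{
    uint32_t t = ror32(eflags & (FLAG_O | FLAG_S | FLAG_Z), 7);
    int sign = (int)(t >> 31);     /* mirrors js: holds Z */
    int parity = parity_flag(t);   /* mirrors jp: 1 when O == S */
    return sign || !parity;
}

/* jnle is the complement: not taken if Z is set or O != S. */
static int jnle_via_rotate(uint32_t eflags)
{
    return !jle_via_rotate(eflags);
}

/* Compare against the definitions for all 8 combinations of Z, S, O,
   with and without unrelated flag bits set. Returns the number of
   disagreements. */
static int count_mismatches(void)
{
    const uint32_t noise = 0x00000617u; /* arbitrary bits other than Z, S, O */
    int bad = 0;
    for (unsigned m = 0; m < 8; m++) {
        int z = (m & 1u) != 0, s = (m & 2u) != 0, o = (m & 4u) != 0;
        uint32_t f = (z ? FLAG_Z : 0u) | (s ? FLAG_S : 0u) | (o ? FLAG_O : 0u);
        for (int k = 0; k < 2; k++) {
            uint32_t e = k ? (f | noise) : f;
            if (jle_via_rotate(e) != (z || (s != o)))
                bad++;
            if (jnle_via_rotate(e) != (!z && (s == o)))
                bad++;
        }
    }
    return bad;
}
```

Calling count_mismatches() walks every Z/S/O combination; the model and the definitions should agree on all of them. Note the model reads the sign and parity straight off the value, which on real hardware means the last flag-setting instruction before the jumps has to be one that actually updates S and P.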
I'm pretty sure that these two jumps are cheaper than more arithmetic
ops, because they can take advantage of prediction rather than using up
ALU resources. Unfortunately the ROR is even more expensive than the
shift on the P4; with any luck prediction of the dependent jumps will
absorb the latency. I don't know if this is really much of an
improvement; I mainly did it for hack value.
The code generation for these jumps is much easier with 72-jump, which
automates the offset computation.
Moz 1.2.1 works fine with the improved (fixed) 69-simple-jlo and
75-simple-jle.
J