From: Julian S. <js...@ac...> - 2006-01-30 19:24:02
Recently I've been chasing a problem in a fragment of code
generated by IBM's optimising compiler, xlc, at -O3, for which
V produces a different result than a native run does. I'm pretty
sure it's due to V ignoring the FPU rounding mode during basic
FP operations (+, -, etc.).
Nevertheless I would like to understand how this code fragment
works, and so far I have no idea. If you compile this

  unsigned int foo ( double d ) { return (unsigned int)d; }

with xlc (7.0.0) -O3, you get something equivalent to the code
shown below (which is compilable with gcc). Any idea how it
works? There's clearly some magic to do with rounding and
adding strange constants, but I can't see what it is.
I've added what commentary I've deduced so far.
I've consulted the recommended code fragments for
double-to-unsigned-int conversion in the various ppc arch
manuals and the compiler-writer's guide, and although they offer
various ultramagical code fragments, none of them look
like this. In particular I can't make sense of the
4.512396e+15 value -- it's not 2^52 or anything otherwise
obviously connected to the IEEE double format.
I also don't exactly understand "mtfsb1 4*cr7+so". The operand
is bit number 4*7+3 = 31, and since PowerPC numbers fpscr bits
from the most significant end, this sets the least significant
bit of fpscr. The rounding mode is the two least significant
bits, so the resulting mode could be either 01 (round towards
zero) or 11 (round towards -Inf), depending on its initial
value. That seems a bit strange.
J
#include <stdio.h>
extern unsigned int foo ( double );
asm("\n"
".text\n"
".global foo\n"
".type foo, @function\n"
"foo:\n" // f1 = incoming arg
" stwu %r1,-48(%r1)\n"
" addis %r4,%r0,.const_dr@ha\n"
" addis %r0,%r0,17376\n" // r0 = 17376<<16 = 0x43E00000
" fabs %f0,%f1\n" // f0 = abs(arg)
" addi %r3,%r0,0\n" // r3 = 0 (result if we branch below)
" mffs %f3\n" // save old fpscr in f3
" mtfsb1 4*cr7+so\n" // set fpscr bit 31 (= 4*7+3), low bit of RN
// --> mode is now toZero or to -Inf
" lfs %f2,.const_dr@l(%r4)\n" // f2 = (float)0x59804000 = 4.512396e+15
" fcmpu 0,%f1,%f0\n" // cr0 = cmp(arg, abs(arg))
" fadd %f0,%f0,%f2\n" // f0 = round( abs(arg)+4.512396e+15 )
" mtfsf 255,%f3\n" // restore fpscr from f3
" stfd %f0,24(%r1)\n" // dump on stack
" bne- $+0x1c\n" // taken if arg != abs(arg), i.e. arg < 0 or NaN;
// lands on the addi below, returning r3 = 0
" lwz %r3,24(%r1)\n" // fetch high half of result from stack
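" subf %r0,%r3,%r0\n" // r0 = 0x43E00000 - high half
" srawi %r0,%r0,31\n" // r0 = -1 if that is negative, else 0
" ori %r0,%r0,0x0000\n" // no-op
" lwz %r4,28(%r1)\n" // fetch low half of result from stack
" or %r3,%r4,%r0\n" // r3 = low half, or 0xFFFFFFFF if r0 = -1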
" addi %r1,%r1,48\n"
" blr\n"
".size foo, . - foo\n"
" .long 0\n"
" .long 0x00000000\n"
" .long 0x00000000\n"
"\n"
" .section \".rodata\",\"a\"\n"
" .align 3\n"
" .type .const_dr,@object\n"
" .size .const_dr,20\n"
".const_dr:\n"
" .long 0x59804000\n"
" .long 0x49424d20\n"
" .long 0x3fe66666\n"
" .long 0x66666666\n"
" .long 0x25640a00\n"
);
int main (int argc, char** argv)
{
printf("%u\n", foo(0.7));
return 0;
}
// prints 0 natively, 1 when run on V