From: Jim U. <ji...@3e...> - 2004-03-15 07:10:57
At 10:04pm on 2004 March 14, Jim Ursetto did write:
> Unfortunately, even with the temporary optimized out, gcc still writes
> the normalized vector to vc (on the stack) every single iteration, even
> though only vc.x is used, and only once.
One more thing. If you create a non-member inline function,
normalize(Vector& dest, Vector& src), it'll generate the exact same
(nearly optimal) code as Vector4 dest = src.normalize(), without putting
the initializer for dest in the loop. That includes writing the result
to the stack, which is odd because no constructor is called at all.
I wonder if this is a missed optimization.
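
For reference, the two call forms being compared look roughly like the sketch below. The Vector4 layout and both function bodies are assumptions made for illustration (the real class is not shown in this thread), and src is taken by const reference here rather than the plain reference written above.

```cpp
#include <cmath>

// Hypothetical stand-in for the Vector4 class discussed in this thread.
struct Vector4 {
    float x, y, z, w;

    // Member form: returns a normalized copy by value.
    Vector4 normalize() const {
        float inv = 1.0f / std::sqrt(x * x + y * y + z * z + w * w);
        Vector4 r = { x * inv, y * inv, z * inv, w * inv };
        return r;
    }
};

// Non-member inline form: writes the normalized src into dest.
inline void normalize(Vector4& dest, const Vector4& src) {
    float inv = 1.0f / std::sqrt(src.x * src.x + src.y * src.y +
                                 src.z * src.z + src.w * src.w);
    dest.x = src.x * inv;
    dest.y = src.y * inv;
    dest.z = src.z * inv;
    dest.w = src.w * inv;
}

// Both call forms side by side; returns the difference in x between them.
float compare_forms(const Vector4& v) {
    Vector4 a = v.normalize();   // member form, result by value
    Vector4 b;
    normalize(b, v);             // non-member form, result through dest
    return a.x - b.x;
}
```

Compiling something like this with g++ -O2 -S and reading the assembly is one way to check whether the two forms really produce the same code on a given target.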
As a side note, if you use v.normalize_self() in a loop, it
-will- store the result in memory on every single iteration, -and- read
it back at the top of the loop, just as the other functions do. I
thought gcc would peer into the inlined function and notice the values
need only be written out at the end of the loop, as they are not
volatile.
This is partly due to the call to printf(); gcc assumes the floating
point registers will be clobbered and reloads them. But even without
the call, it'll write/read the registers with no intervening operations.
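
The loop shape in question is roughly the following. This is only a sketch: the Vector4 layout, the body of normalize_self(), and the run_loop() wrapper are assumptions for illustration, not the actual code from this thread.

```cpp
#include <cmath>
#include <cstdio>

// Hypothetical stand-in for the Vector4 class discussed in this thread.
struct Vector4 {
    float x, y, z, w;

    // In-place form: overwrites this vector with its normalized value.
    void normalize_self() {
        float inv = 1.0f / std::sqrt(x * x + y * y + z * z + w * w);
        x *= inv;
        y *= inv;
        z *= inv;
        w *= inv;
    }
};

// Normalizes v in place on every iteration and returns the final value.
Vector4 run_loop(Vector4 v, int iterations) {
    for (int i = 0; i < iterations; i++) {
        v.normalize_self();
        // The call that makes gcc assume the floating point registers
        // are clobbered; delete this line to test the variant without it.
        std::printf("%f\n", v.x);
    }
    return v;
}
```

Ideally x, y, z and w would stay in registers for the whole loop and be stored once after it, since nothing here is volatile.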
In fact even the simple loop:

    volatile float a;
    for (int i = 0; i < 10; i++) {
        Vector4 vc = v.normalize();
        a = vc.x;
    }

translates to:

    ... precalculate v.normalize() ...
    .L15:
        fmov.s  fr4,@r2     <-- y [useless]
        dt      r1
        fmov.s  fr3,@r7     <-- z [useless]
        fmov.s  fr2,@r3     <-- w [useless]
        fmov.s  fr5,@r15    <-- a = x
        bf      .L15

That's three totally useless instructions.
I apologize if this stuff is obvious, esoteric, and/or offtopic. :)
--
'The Grammy-winning singer, whose full name is Robert Kelly, is known for
inspirational hits such as "I Believe I Can Fly," but also raunchy ones
like "Feelin' On Yo Booty."' -- CNN.com article
ji...@3e... / 0x43340710 / 517B C658 D2CB 260D 3E1F 5ED1 6DB3 FBB9 4334 0710