From: Jim U. <ji...@3e...> - 2004-03-15 09:08:41
At 01:28am on 2004 March 15, Dan Potter did write:

> I wonder too how much of these missed optimizations are due to SH4 vs
> other arches. It seems like some of them have got to be the
> responsibility of a higher-up piece of code.

For the most part, I'd guess it pervades all architectures (I saw the
note about the SSE wrappers too). Sometimes it does wacky things,
sometimes it exceeds your expectations. It's a bit disappointing as it
makes it quite a bit trickier to use classes in a very time-critical
loop. I've already seen this with C but it seems even more true for C++.

The ultimate solution for fast vector types might be to use Blitz++ or
tvmet (tvmet.sf.net), but those are highly complex packages.

The fact that the return value optimization is not performed for
assignments, which is a property of C++ and not gcc-specific, is turning
out to be important. The following code [yes my brain is fried now]
shows the difference between "a = b * scalar", "a = b; a *= scalar", and
"Vector a = b * scalar". Just note the number of instructions.

void compute(Vector4 &v, float dot) {
    // v unused                             // add    #-52,r15
    // operator* generates a temporary;     // mov    r15,r1
    // but only the x value (1 fmul)!       // add    #20,r1
    volatile float m;                       // fmov.s @r1,fr1
    Vector4 a, b;                           // mov    r15,r1
    a = b * dot;                            // add    #36,r1
    m = a.x;                                // fmul   fr4,fr1
                                            // fmov.s fr1,@r1
                                            // mov.l  @(36,r15),r1
                                            // mov.l  r1,@(4,r15)
                                            // mov    r15,r1
                                            // add    #4,r1
                                            // fmov.s @r1,fr1
                                            // fmov.s fr1,@r15
                                            // rts
                                            // add    #52,r15

    // operator*= does not generate a temp, // add    #-36,r15
    // but still assigns to the stack       // mov.l  @(20,r15),r1
    volatile float m;                       // mov.l  r1,@(4,r15)
    Vector4 a, b;                           // mov    r15,r1
    a = b;                                  // add    #4,r1
    a *= dot;                               // fmov.s @r1,fr1
    m = a.x;                                // fmul   fr4,fr1
                                            // fmov.s fr1,@r15
                                            // rts
                                            // add    #36,r15

                                            // add    #-36,r15
    // best code: no assignment to stack!   // mov    r15,r1
    volatile float m;                       // add    #4,r1
    Vector4 b;                              // fmov.s @r1,fr1
    Vector4 a = b * dot;                    // fmul   fr4,fr1
    m = a.x;                                // fmov.s fr1,@r15
                                            // rts
                                            // add    #36,r15
}

-- 
"Note that Unruh [9] proposed also an acoustical analog of a black hole,
a dumb hole."
  -- http://www.st-and.ac.uk/~www_pa/group/quantumoptics/media.html
ji...@3e... / 0x43340710 / 517B C658 D2CB 260D 3E1F 5ED1 6DB3 FBB9 4334 0710
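
For anyone who wants to see the three forms from the listing without reading SH4 assembly, here is a minimal sketch that counts constructor and assignment calls instead of instructions. Vec4, its counters, and the three helper functions are hypothetical stand-ins made up for this illustration; they are not the Vector4 class from the post. The counts only show what the language asks for at the source level and say nothing about the code a particular gcc version will emit.

```cpp
#include <cassert>

// Hypothetical stand-in for a 4-float vector class. The static counters are
// instrumentation added for this sketch only.
struct Vec4 {
    float x, y, z, w;
    static int ctors;    // objects built from component values
    static int copies;   // copy constructions that were NOT elided
    static int assigns;  // copy assignments

    Vec4(float x_ = 0.0f, float y_ = 0.0f, float z_ = 0.0f, float w_ = 0.0f)
        : x(x_), y(y_), z(z_), w(w_) { ++ctors; }

    Vec4(const Vec4 &o) : x(o.x), y(o.y), z(o.z), w(o.w) { ++copies; }

    Vec4 &operator=(const Vec4 &o) {
        x = o.x; y = o.y; z = o.z; w = o.w;
        ++assigns;
        return *this;
    }

    Vec4 &operator*=(float s) {
        x *= s; y *= s; z *= s; w *= s;
        return *this;
    }
};

int Vec4::ctors = 0;
int Vec4::copies = 0;
int Vec4::assigns = 0;

// Returns a new object; the compiler may build it directly in the caller's
// storage only when the result initializes an object, not when it is assigned.
inline Vec4 operator*(const Vec4 &v, float s) {
    return Vec4(v.x * s, v.y * s, v.z * s, v.w * s);
}

struct Counts { int ctors; int assigns; float x; };

static void reset() { Vec4::ctors = 0; Vec4::copies = 0; Vec4::assigns = 0; }

static Counts finish(const Vec4 &a) {
    Counts c = { Vec4::ctors, Vec4::assigns, a.x };
    return c;
}

// Form 1: "a = b * scalar" needs a temporary plus an assignment.
Counts assign_product(float s) {
    reset();
    Vec4 a, b(1.0f, 2.0f, 3.0f, 4.0f);
    a = b * s;
    return finish(a);
}

// Form 2: "a = b; a *= scalar" has no temporary but still one assignment.
Counts assign_then_scale(float s) {
    reset();
    Vec4 a, b(1.0f, 2.0f, 3.0f, 4.0f);
    a = b;
    a *= s;
    return finish(a);
}

// Form 3: "Vec4 a = b * scalar" is an initialization, so no assignment runs.
Counts init_from_product(float s) {
    reset();
    Vec4 b(1.0f, 2.0f, 3.0f, 4.0f);
    Vec4 a = b * s;
    return finish(a);
}
```

Calling the three helpers with the same scalar gives the same a.x in each case, but the first form constructs three objects and assigns once, the second constructs two and assigns once, and the third constructs two and never assigns. Vec4::copies is left in so you can check whether your compiler elided the copy in the third form; it should stay at zero with default gcc settings.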