From: Richard M. <ric...@cr...> - 2004-03-15 20:55:34
On Monday 15 March 2004 01:08, Jim Ursetto wrote:
> At 01:28am on 2004 March 15, Dan Potter did write:
> > I wonder too how much of these missed optimizations are due to SH4 vs
> > other arches. It seems like some of them have got to be the
> > responsibility of a higher-up piece of code.
>
> For the most part, I'd guess it pervades all architectures (I saw the
> note about the SSE wrappers too). Sometimes it does wacky things,
> sometimes it exceeds your expectations. It's a bit disappointing
> as it makes it quite a bit trickier to use classes in a very
> time-critical loop. I've already seen this with C but it seems
> even more true for C++.
>
> The ultimate solution for fast vector types might be to use Blitz++ or
> tvmet (tvmet.sf.net), but those are highly complex packages.
>
> The fact that the return value optimization is not performed for
> assignments, which is a property of C++ and not gcc-specific, is turning
> out to be important. The following code [yes my brain is fried now]
> shows the difference between "a = b * scalar", "a = b; a *= scalar",
> and "Vector a = b * scalar". Just note the number of instructions.

Here's a comparison between GCC-3.4 and a recent version of Microsoft's SH
compiler. The MS compiler produces the same code for all three cases.
(GCC 3.4 20031203)
sh-elf-gcc -ml -m4-single-only -O3 -fomit-frame-pointer -ffast-math

(Microsoft 12.20.9518 for Hitachi SH)
clsh /D__STDC__ /Qsh4 /D__SH4__ /D_SHX_ /DSH4 /Ox /Ob2 /Qfastd

> volatile float m;
> Vector4 a, b;
> a = b * dot;
> m = a.x;

GCC                             CLSH
add     #-52,r15                add     #-24,r15
mov     r15,r2
add     #20,r2                  mov     #8,r0
fmov.s  @r2,fr0                 fmov.s  @(r0,r15),fr0
add     #16,r2
mov     r15,r1
add     #4,r1
fmul    fr4,fr0                 fmul    fr4,fr0
fmov.s  fr0,@r2
mov.l   @(36,r15),r0
mov.l   r0,@(4,r15)
fmov.s  @r1,fr1
fmov.s  fr1,@r15                fmov.s  fr0,@r15
rts                             rts
add     #52,r15                 add     #24,r15

>     volatile float m;
>     Vector4 a, b;
>     a = b;
>     a *= dot;
>     m = a.x;

GCC                             CLSH
add     #-36,r15                add     #-24,r15
mov.l   @(20,r15),r0
mov     r15,r1
add     #4,r1                   mov     #8,r0
mov.l   r0,@(4,r15)
fmov.s  @r1,fr1                 fmov.s  @(r0,r15),fr0
fmul    fr4,fr1                 fmul    fr4,fr0
fmov.s  fr1,@r15                fmov.s  fr0,@r15
rts                             rts
add     #36,r15                 add     #24,r15

>     volatile float m;
>     Vector4 b;
>     Vector4 a = b * dot;
>     m = a.x;

GCC                             CLSH
add     #-36,r15                add     #-24,r15
mov     r15,r1
add     #4,r1                   mov     #8,r0
fmov.s  @r1,fr1                 fmov.s  @(r0,r15),fr0
fmul    fr4,fr1                 fmul    fr4,fr0
fmov.s  fr1,@r15                fmov.s  fr0,@r15
rts                             rts
add     #36,r15                 add     #24,r15
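
For anyone who wants to reproduce the comparison, here is a minimal sketch of the three cases as standalone functions. The Vector4 definition is not shown anywhere in this thread, so the struct below is a hypothetical stand-in (four floats, a member operator*= and a free operator*). The function names are made up for illustration, and b is passed in as a parameter rather than left as an uninitialized local. Compiling each function with the flags listed above and reading the assembly output should give listings roughly comparable to the ones in this message, though the exact instructions will depend on the real class.

```cpp
// Hypothetical Vector4: the real class from the thread is not shown,
// so this layout and these operators are assumptions.
struct Vector4 {
    float x, y, z, w;

    Vector4() : x(0), y(0), z(0), w(0) {}
    Vector4(float x_, float y_, float z_, float w_)
        : x(x_), y(y_), z(z_), w(w_) {}

    // In-place scale: modifies this object, creates no temporary.
    Vector4& operator*=(float s) {
        x *= s; y *= s; z *= s; w *= s;
        return *this;
    }
};

// Binary scale: builds and returns a new Vector4 by value.
inline Vector4 operator*(const Vector4& v, float s) {
    Vector4 r(v);
    r *= s;
    return r;
}

// Case 1: "a = b * scalar". The product is a temporary that is then
// copy-assigned into an already constructed a.
float scale_assign(const Vector4& b, float dot) {
    volatile float m;
    Vector4 a;
    a = b * dot;
    m = a.x;
    return m;
}

// Case 2: "a = b; a *= scalar". Copy first, then scale in place.
float scale_in_place(const Vector4& b, float dot) {
    volatile float m;
    Vector4 a;
    a = b;
    a *= dot;
    m = a.x;
    return m;
}

// Case 3: "Vector4 a = b * scalar". Initialization rather than
// assignment, so the compiler may construct the result directly in a.
float scale_init(const Vector4& b, float dot) {
    volatile float m;
    Vector4 a = b * dot;
    m = a.x;
    return m;
}
```

All three functions return the same value for the same inputs; only the generated code should differ.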