Re: [KOS] optimization oddity


On Monday 15 March 2004 01:08, Jim Ursetto wrote:
> At 01:28am on 2004 March 15, Dan Potter did write:
> > I wonder too how much of these missed optimizations are due to SH4 vs
> > other arches. It seems like some of them have got to be the
> > responsibility of a higher-up piece of code.
>
> For the most part, I'd guess it pervades all architectures (I saw the
> note about the SSE wrappers too).  Sometimes it does wacky things,
> sometimes it exceeds your expectations.  It's a bit disappointing
> as it makes it quite a bit trickier to use classes in a very
> time-critical loop.  I've already seen this with C but it seems
> even more true for C++.
>
> The ultimate solution for fast vector types might be to use Blitz++ or
> tvmet (tvmet.sf.net), but those are highly complex packages.
>
> The fact that the return value optimization is not performed for
> assignments, which is a property of C++ and not gcc-specific, is turning
> out to be important.  The following code [yes my brain is fried now]
> shows the difference between "a = b * scalar", "a = b; a *= scalar",
> and "Vector a = b * scalar".  Just note the number of instructions.

Here's a comparison between GCC-3.4 and a recent version of Microsoft's SH
compiler.  The MS compiler produces the same code for all three cases.

(GCC 3.4 20031203)
sh-elf-gcc -ml -m4-single-only -O3 -fomit-frame-pointer -ffast-math

(Microsoft 12.20.9518 for Hitachi SH)
clsh /D__STDC__ /Qsh4 /D__SH4__ /D_SHX_ /DSH4 /Ox /Ob2 /Qfastd
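
For anyone following along without the original source: the Vector4 class itself is not shown in this message, so the following is only a minimal sketch of the three cases, assuming a plain four-float struct with a scalar operator* and operator*=. The struct layout, the function names (case_assign, case_compound, case_init), and passing b as a parameter instead of declaring it as a local are all assumptions made for illustration, and this is not the code that produced the listings below.

```cpp
// Hypothetical stand-in for the Vector4 class under discussion;
// the real class is not shown in this thread.
struct Vector4 {
    float x, y, z, w;

    Vector4() : x(0), y(0), z(0), w(0) {}
    Vector4(float x_, float y_, float z_, float w_)
        : x(x_), y(y_), z(z_), w(w_) {}

    // In-place scale: no temporary object is needed.
    Vector4& operator*=(float s) {
        x *= s; y *= s; z *= s; w *= s;
        return *this;
    }
};

// Scale into a new object, returned by value.
inline Vector4 operator*(const Vector4& v, float s) {
    return Vector4(v.x * s, v.y * s, v.z * s, v.w * s);
}

// Case 1: assign the product to an already-constructed object.
// The temporary returned by operator* must be copied into a.
float case_assign(const Vector4& b, float dot) {
    volatile float m;
    Vector4 a;
    a = b * dot;
    m = a.x;
    return m;
}

// Case 2: copy first, then scale in place.
float case_compound(const Vector4& b, float dot) {
    volatile float m;
    Vector4 a;
    a = b;
    a *= dot;
    m = a.x;
    return m;
}

// Case 3: initialize a directly from the product, which lets the
// compiler construct the result in place (return value optimization).
float case_init(const Vector4& b, float dot) {
    volatile float m;
    Vector4 a = b * dot;
    m = a.x;
    return m;
}
```

All three functions compute the same value; only the amount of copying the compiler has to eliminate differs.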

>     volatile float m;
>     Vector4 a, b;
>     a = b * dot;
>     m = a.x;

GCC                             CLSH

add     #-52,r15                add     #-24,r15
mov     r15,r2
add     #20,r2                  mov     #8,r0
fmov.s  @r2,fr0                 fmov.s  @(r0,r15),fr0
add     #16,r2
mov     r15,r1
add     #4,r1
fmul    fr4,fr0                 fmul    fr4,fr0
fmov.s  fr0,@r2
mov.l   @(36,r15),r0
mov.l   r0,@(4,r15)
fmov.s  @r1,fr1
fmov.s  fr1,@r15                fmov.s  fr0,@r15
rts                             rts
add     #52,r15                 add     #24,r15

>     volatile float m;
>     Vector4 a, b;
>     a = b;
>     a *= dot;
>     m = a.x;

GCC                             CLSH

add     #-36,r15                add     #-24,r15
mov.l   @(20,r15),r0
mov     r15,r1
add     #4,r1                   mov     #8,r0
mov.l   r0,@(4,r15)
fmov.s  @r1,fr1                 fmov.s  @(r0,r15),fr0
fmul    fr4,fr1                 fmul    fr4,fr0
fmov.s  fr1,@r15                fmov.s  fr0,@r15
rts                             rts
add     #36,r15                 add     #24,r15

>     volatile float m;
>     Vector4 b;
>     Vector4 a = b * dot;
>     m = a.x;

GCC                             CLSH

add     #-36,r15                add     #-24,r15
mov     r15,r1
add     #4,r1                   mov     #8,r0
fmov.s  @r1,fr1                 fmov.s  @(r0,r15),fr0
fmul    fr4,fr1                 fmul    fr4,fr0
fmov.s  fr1,@r15                fmov.s  fr0,@r15
rts                             rts
add     #36,r15                 add     #24,r15