Re: [CsMain] code-optimisations in CS


----- Original Message -----
From: "Andreas Höfler" <hoe...@ao...>
To: "Crystal Space" <cry...@li...>
Sent: Tuesday, March 13, 2001 2:21 PM
Subject: [CsMain] code-optimisations in CS

> I'd like to know if, when, and how intensively such code optimisations are
> made in CS.
>
> For example:
>
> int i;
> for (i = 0; i < 255; i++) {...};
>
> is (very) slightly faster than
>
> for (int i = 0; i < 255; i++) {...};
>
>
> There are some other easy optimisations, but I'm not sure if it makes sense
> to apply them because CS is always under development. One optimisation by
> itself hardly makes a difference, but if more of them come together...
>
> These thoughts came to me when I stumbled over a bit of code in wirefrm.cpp
> in csengine:
>
>   for (i = 0 ; i < 16 ; i++)
>     col_idx[i] = txtmgr->FindRGB (r*(20-i)/20, g*(20-i)/20, b*(20-i)/20);
>
> I saw it and thought that it would be a tiny speed improvement if this loop
> were unrolled as:
>
>     col_idx[0] = txtmgr->FindRGB (r*(20-0)/20, g*(20-0)/20, b*(20-0)/20);
>     col_idx[1] = txtmgr->FindRGB (r*(20-1)/20, g*(20-1)/20, b*(20-1)/20);
>     col_idx[2] = txtmgr->FindRGB (r*(20-2)/20, g*(20-2)/20, b*(20-2)/20);
>     ...
>
> (Does anyone want to know, WHY this is faster?)
I hope you are not worried about the (20-0)/20 parts, because the constant
subexpressions there are folded by the compiler. In general all this stuff is
irrelevant unless it is called millions of times. The best way to improve
performance is to measure your application and find the actual bottlenecks.
Then you have several options.
[1] Change your code to avoid working on data that it doesn't have to.
    - E.g. a better culling algorithm, a faster search, etc.
[2] Change the structure of your code to take fewer steps to perform the
    work: fewer function calls, etc.
[3] Tune an individual function with the kind of operations you are hinting at.
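
As a rough illustration of "measure first", here is a minimal sketch that
times the rolled and the hand-unrolled palette loop. FindRGBStub, Shade,
FillRolled, FillUnrolled and AverageNanoseconds are names made up for this
example; the stub only packs the channels and is not the real
txtmgr->FindRGB, so any numbers it produces say nothing about CS itself.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstdint>

using Palette = std::array<std::uint32_t, 16>;

// Hypothetical stand-in for txtmgr->FindRGB: it just packs the channels.
// The real texture-manager lookup is not reproduced here.
static std::uint32_t FindRGBStub(int r, int g, int b) {
    return (static_cast<std::uint32_t>(r) << 16) |
           (static_cast<std::uint32_t>(g) << 8) |
           static_cast<std::uint32_t>(b);
}

// One shaded palette entry, using the same arithmetic as the quoted loop.
static std::uint32_t Shade(int r, int g, int b, int i) {
    return FindRGBStub(r * (20 - i) / 20, g * (20 - i) / 20, b * (20 - i) / 20);
}

// The loop as written in the quoted code.
Palette FillRolled(int r, int g, int b) {
    Palette col_idx{};
    for (int i = 0; i < 16; i++)
        col_idx[i] = Shade(r, g, b, i);
    return col_idx;
}

// The same work unrolled by hand, as proposed in the quoted mail.
Palette FillUnrolled(int r, int g, int b) {
    Palette col_idx{};
    col_idx[0] = Shade(r, g, b, 0);
    col_idx[1] = Shade(r, g, b, 1);
    col_idx[2] = Shade(r, g, b, 2);
    col_idx[3] = Shade(r, g, b, 3);
    col_idx[4] = Shade(r, g, b, 4);
    col_idx[5] = Shade(r, g, b, 5);
    col_idx[6] = Shade(r, g, b, 6);
    col_idx[7] = Shade(r, g, b, 7);
    col_idx[8] = Shade(r, g, b, 8);
    col_idx[9] = Shade(r, g, b, 9);
    col_idx[10] = Shade(r, g, b, 10);
    col_idx[11] = Shade(r, g, b, 11);
    col_idx[12] = Shade(r, g, b, 12);
    col_idx[13] = Shade(r, g, b, 13);
    col_idx[14] = Shade(r, g, b, 14);
    col_idx[15] = Shade(r, g, b, 15);
    return col_idx;
}

// Average wall-clock time per call, in nanoseconds, over `repeats` calls.
template <typename F>
double AverageNanoseconds(F fill, int repeats) {
    assert(repeats > 0);
    using Clock = std::chrono::steady_clock;
    volatile std::uint32_t sink = 0;  // keeps the calls from being optimised away
    const auto start = Clock::now();
    for (int n = 0; n < repeats; n++) {
        const Palette p = fill(n & 255, 128, 64);
        sink = sink + p[15];
    }
    const auto stop = Clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count() / repeats;
}
```

Calling AverageNanoseconds(FillRolled, 1000000) and
AverageNanoseconds(FillUnrolled, 1000000) in an optimised build gives two
numbers to compare. With optimisation enabled the compiler may well unroll
the rolled version on its own, which is exactly why measuring comes before
rewriting.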

In general the biggest problem today is not CPU speed but the amount of
memory that you need to access, because of the latency of the
L1/L2/main-memory hierarchy. Register access is of course fastest, L1 cache
is slower, L2 is slower again, and main memory is slower still, usually by
much more than a factor of two over L2.
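
To make the memory point concrete, here is a small sketch with two
functions that add up the same grid and differ only in the order they touch
memory. SumRowMajor and SumColumnMajor are names invented for this example,
and the grid layout (one contiguous vector, stored row by row) is an
assumption, not something taken from CS.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Walks the grid in the order it is stored: consecutive addresses,
// so each cache line that is fetched gets fully used.
std::uint64_t SumRowMajor(const std::vector<std::uint32_t>& grid,
                          std::size_t rows, std::size_t cols) {
    assert(grid.size() == rows * cols);
    std::uint64_t total = 0;
    for (std::size_t y = 0; y < rows; y++)
        for (std::size_t x = 0; x < cols; x++)
            total += grid[y * cols + x];
    return total;
}

// Same result, but each step jumps `cols` elements ahead, so a large
// grid touches a different cache line on almost every access.
std::uint64_t SumColumnMajor(const std::vector<std::uint32_t>& grid,
                             std::size_t rows, std::size_t cols) {
    assert(grid.size() == rows * cols);
    std::uint64_t total = 0;
    for (std::size_t x = 0; x < cols; x++)
        for (std::size_t y = 0; y < rows; y++)
            total += grid[y * cols + x];
    return total;
}
```

Both functions execute the same number of additions. On a grid much larger
than the cache, the row-major walk is usually the faster one, but how large
the gap is depends on the machine, so it is worth timing on your own
hardware.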

>
> On the other hand, such easy optimisations can be performed by some compilers
> themselves, but I don't know if this applies on every platform and with
> every compiler.
>
> If the code gets optimized, there should also be a comment showing how the
> code looked before.
>
> I'm not talking about really advanced optimisations which are hardly readable
> by anyone except the coder himself, just little mods everyone can do (if he
> has the knowledge of them).
>
>
> Andreas Höfler