From: Alex P. <res...@gt...> - 2001-03-16 09:40:14
----- Original Message -----
From: "Andreas Höfler" <hoe...@ao...>
To: "Crystal Space" <cry...@li...>
Sent: Tuesday, March 13, 2001 2:21 PM
Subject: [CsMain] code-optimisations in CS

> I'd like to know if and when, how intensive such code-optimisations are made in CS.
>
> For example:
>
> int i;
> for (i = 0; i < 255; i++) {...};
>
> is (very) slightly faster than
>
> for (int i = 0; i < 255; i++) {...};
>
>
> There are some other easy optimisations but I'm not sure if it makes sense
> to apply them because CS is always under development. One optimisation for
> itself makes hardly a difference, but if more of them come together...
>
> I came to this thoughts when I stumbled over a bit of code in wirefrm.cpp in
> csengine:
>
> for (i = 0 ; i < 16 ; i++)
>   col_idx[i] = txtmgr->FindRGB (r*(20-i)/20, g*(20-i)/20, b*(20-i)/20);
>
> I saw it and thought, that it would be a tiny speed-improvement if this loop
> were unrolled as:
>
> col_idx[0] = txtmgr->FindRGB (r*(20-0)/20, g*(20-0)/20, b*(20-0)/20);
> col_idx[1] = txtmgr->FindRGB (r*(20-1)/20, g*(20-1)/20, b*(20-1)/20);
> col_idx[2] = txtmgr->FindRGB (r*(20-2)/20, g*(20-2)/20, b*(20-2)/20);
> ...
>
> (Does anyone want to know, WHY this is faster?)

I hope you are not worried about (20-0)/20, because that involves only constants, which the compiler precomputes.

In general all this stuff is irrelevant unless it is called millions of times. The best way to improve performance is to measure your application and find the actual bottlenecks. Then you have several options:

[1] Change your code to avoid working on data that it doesn't have to, e.g. a better culling algorithm, a faster search, etc.
[2] Change the structure of your code to take fewer steps to perform the work: fewer function calls, etc.
[3] Change a function with the types of operations you are hinting at.
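
The "measure first" advice above can be sketched as follows. This is only an illustration, not Crystal Space code: `find_rgb` is a hypothetical stand-in for `txtmgr->FindRGB`, and `Palette`, `shade_loop`, `shade_unrolled` and `seconds_for` are names made up for this example. It builds the 16-entry table both ways and times each with `std::chrono::steady_clock`.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for txtmgr->FindRGB; NOT the Crystal Space API.
// It simply packs the three channels into one integer.
static std::uint32_t find_rgb(int r, int g, int b) {
    return (static_cast<std::uint32_t>(r) << 16) |
           (static_cast<std::uint32_t>(g) << 8) |
            static_cast<std::uint32_t>(b);
}

using Palette = std::array<std::uint32_t, 16>;

// The loop as it appears in the quoted code.
Palette shade_loop(int r, int g, int b) {
    Palette col_idx{};
    for (int i = 0; i < 16; i++)
        col_idx[i] = find_rgb(r * (20 - i) / 20, g * (20 - i) / 20, b * (20 - i) / 20);
    return col_idx;
}

// The hand-unrolled variant; the macro only saves typing 16 long lines.
#define SHADE(i) \
    col_idx[i] = find_rgb(r * (20 - (i)) / 20, g * (20 - (i)) / 20, b * (20 - (i)) / 20)
Palette shade_unrolled(int r, int g, int b) {
    Palette col_idx{};
    SHADE(0);  SHADE(1);  SHADE(2);  SHADE(3);
    SHADE(4);  SHADE(5);  SHADE(6);  SHADE(7);
    SHADE(8);  SHADE(9);  SHADE(10); SHADE(11);
    SHADE(12); SHADE(13); SHADE(14); SHADE(15);
    return col_idx;
}
#undef SHADE

// Calls `fill` `repeats` times and returns the elapsed wall-clock seconds.
template <typename F>
double seconds_for(F fill, int repeats) {
    assert(repeats > 0);
    std::uint32_t sink = 0;
    auto start = std::chrono::steady_clock::now();
    for (int n = 0; n < repeats; n++) {
        Palette p = fill(n & 255, 128, 64);
        sink += p[n & 15];  // consume a result so the call is not optimised away
    }
    auto stop = std::chrono::steady_clock::now();
    volatile std::uint32_t keep = sink;  // force the accumulated value to be stored
    (void)keep;
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling `seconds_for(shade_loop, 10000000)` and `seconds_for(shade_unrolled, 10000000)` in an optimised build gives two numbers to compare. Whether they differ at all depends on the compiler and its flags, which is the reason to measure rather than guess.
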

In general the biggest problem today is not CPU speed but the amount of memory that you need to access, because of L1/L2/main-memory latency. Register access is of course fastest, L1 is slower, L2 is roughly 2x slower than L1, and main memory is at least 2x slower than L2.

> On the other hand such easy optimisations can be performed by some compilers
> theirselves but I don't know, if this applies on every platform and with
> every compiler.
>
> If the code gets optimized, there should be also a comment, how the code looked
> before.
>
> I'm not talking about really advanced optimisations which are hardly readable
> by anyone except the coder himself, just little mods, everyone can do (if he
> has the knowledge of them).
>
>
> Andreas Höfler
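
As a rough illustration of the memory-access point in the reply above, here is a minimal sketch that is not taken from Crystal Space. The two functions do identical arithmetic over the same grid and differ only in the order in which they touch memory; the names `sum_row_major` and `sum_column_major` are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Walks the grid in the order it is laid out in memory (row by row),
// so consecutive reads touch neighbouring addresses.
long long sum_row_major(const std::vector<int>& grid, std::size_t rows, std::size_t cols) {
    assert(grid.size() == rows * cols);
    long long total = 0;
    for (std::size_t y = 0; y < rows; y++)
        for (std::size_t x = 0; x < cols; x++)
            total += grid[y * cols + x];
    return total;
}

// Same additions, but each read jumps `cols` elements ahead of the last one,
// so on a large grid far fewer reads land in an already-cached line.
long long sum_column_major(const std::vector<int>& grid, std::size_t rows, std::size_t cols) {
    assert(grid.size() == rows * cols);
    long long total = 0;
    for (std::size_t x = 0; x < cols; x++)
        for (std::size_t y = 0; y < rows; y++)
            total += grid[y * cols + x];
    return total;
}
```

Both return the same sum. On a grid much larger than the caches the second version will typically run slower, but by how much depends on the hardware, so it is worth timing on the target machine.
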