From: <ajc...@en...> - 2005-05-25 00:07:52
I'm almost certain that there is some problem in your testing procedure.
I've run a similar test (vector normalization) for floats and doubles, and
I consistently get about a 20% improvement with floats. If you look at the
manuals for the processors, you'll see longer latency for the double
operations.

I'd wager one of the following things might be messing up the tests...

- unnecessary conversions: when you specify a float literal, always add 'f'
  (e.g. 1.0f instead of 1.0). Otherwise the compiler will synthesize
  conversions. Conversions can consume a lot of clock cycles.

- missing dependencies: the compiler might be optimizing some code away or
  taking it out of the loop. If you compute a result then never use it, the
  code might disappear. If you compute the same result many times, it might
  only execute once. The only way I think you'll be able to make sure this
  isn't happening is to check the assembly code manually.

It looks like your code is subject to some of these problems. I've made some
attempts at fixing it but it is difficult to verify (looking through lots of
assembly). Especially problematic might be the production of a result value
that is never used again, so you aren't really timing the latency of the
operation at all.

Maybe a better test would be to do something useful with each type of
arithmetic (intersect a ray with a sphere?).

Andrew

Quoting Wolfgang Wieser <ww...@gm...>:

> Hello everybody,
>
> In order to get an estimate how long basic numerical calculations take
> on current hardware, I ran the speed test
> (src/lib/numerics/3d/test-speed.cc) on different platforms.
>
> The results are below. This is just plain C++ code compiled; there are
> no hand-coded SSE vectorisations or similar.
>
> The most remarkable fact is that using floats instead of doubles may have
> the opposite effect than expected... Especially note the value in brackets
> at the end of the lines: It is the ratio of double and float calc times,
> i.e. values larger than 1 mean "double takes longer" while values smaller
> than 1 mean "double is faster than float".
>
> Wolfgang
>
> BTW, in the last email I forgot to mention that longjmp'ing between thread
> contexts is undefined in POSIX. So, we need to be prepared for the case that
> it cannot be done.
>
> --<AthlonXP 1.47GHz Linux-2.6.11 gcc-4.1.0, NPTL>-----------------------------
> (nothing)               : 0.533687 nsec/cyc
> vector3::length         : flt: 28.0004  dbl: 26.718   nsec/cyc (0.95)
> vector3*vector3         : flt: 10.2434  dbl: 22.804   nsec/cyc (2.23)
> vector3 x vector3       : flt: 15.1952  dbl: 34.8585  nsec/cyc (2.29)
> matrix3*vector3         : flt: 26.2452  dbl: 40.8993  nsec/cyc (1.56)
> trafomatrix*vector3     : flt: 25.6166  dbl: 25.7406  nsec/cyc (1.00)
> trafomatrix::inverse    : flt: 126.307  dbl: 96.1046  nsec/cyc (0.76)
> trafomatrix*trafomatrix : flt: 99.8809  dbl: 113.816  nsec/cyc (1.14)
> matrix3*matrix3         : flt: 75.9704  dbl: 131.277  nsec/cyc (1.73)
>
> --<P4 2.80 GHz Linux-2.6.11 gcc-3.4.4, LinuxThreads>--------------------------
> (nothing)               : 0.118761 nsec/cyc
> vector3::length         : flt: 15.69    dbl: 15.5786  nsec/cyc (0.99)
> vector3*vector3         : flt: 17.0779  dbl: 5.04801  nsec/cyc (0.30)
> vector3 x vector3       : flt: 16.3483  dbl: 11.8949  nsec/cyc (0.73)
> matrix3*vector3         : flt: 18.4656  dbl: 17.3939  nsec/cyc (0.94)
> trafomatrix*vector3     : flt: 18.4069  dbl: 17.374   nsec/cyc (0.94)
> trafomatrix::inverse    : flt: 120.072  dbl: 84.1365  nsec/cyc (0.70)
> trafomatrix*trafomatrix : flt: 79.1698  dbl: 86.2536  nsec/cyc (1.09)
> matrix3*matrix3         : flt: 61.501   dbl: 68.3568  nsec/cyc (1.11)
>
> --<AMD64 1.8GHz FreeBSD-5.4 gcc-3.4.2>----------------------------------------
> (nothing)               : 0.287412 nsec/cyc
> vector3::length         : flt: 16.2991  dbl: 16.4841  nsec/cyc (1.01)
> vector3*vector3         : flt: 10.1551  dbl: 5.49865  nsec/cyc (0.54)
> vector3 x vector3       : flt: 16.7179  dbl: 9.67714  nsec/cyc (0.58)
> matrix3*vector3         : flt: 27.3062  dbl: 19.6488  nsec/cyc (0.72)
> trafomatrix*vector3     : flt: 27.463   dbl: 19.1422  nsec/cyc (0.70)
> trafomatrix::inverse    : flt: 70.7865  dbl: 64.1694  nsec/cyc (0.91)
> trafomatrix*trafomatrix : flt: 83.5633  dbl: 68.334   nsec/cyc (0.82)
> matrix3*matrix3         : flt: 67.8196  dbl: 59.2467  nsec/cyc (0.87)
>
> --<AthlonXP 1.47GHz Linux-2.6.11 gcc-3.4.2 for Win32, NPTL, WINE EMULATION!>--
> (nothing)               : 0.356384 nsec/cyc
> vector3::length         : flt: 29.7528  dbl: 27.0064  nsec/cyc (0.91)
> vector3*vector3         : flt: 15.3446  dbl: 7.66602  nsec/cyc (0.50)
> vector3 x vector3       : flt: 21.3162  dbl: 13.4919  nsec/cyc (0.63)
> matrix3*vector3         : flt: 26.0698  dbl: 26.6165  nsec/cyc (1.02)
> trafomatrix*vector3     : flt: 26.3388  dbl: 25.8365  nsec/cyc (0.98)
> trafomatrix::inverse    : flt: 126.99   dbl: 114.622  nsec/cyc (0.90)
> trafomatrix*trafomatrix : flt: 128.008  dbl: 134.121  nsec/cyc (1.05)
> matrix3*matrix3         : flt: 102.492  dbl: 105.253  nsec/cyc (1.03)
>
> _______________________________________________
> Ray-devel mailing list
> Ray...@li...
> https://lists.sourceforge.net/lists/listinfo/ray-devel
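
For illustration, the two pitfalls named in the reply (double literals in float code, and results that are never used) might look roughly like this. This is only a sketch under assumptions: Vec3, length, time_length, scale_bad and scale_good are hypothetical names made up here, not the types or functions in test-speed.cc, and the timing loop is one possible way to keep the work alive, not a verified fix for the original test.

```cpp
#include <chrono>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for the library's vector type (not the real vector3).
template <typename T>
struct Vec3 { T x, y, z; };

template <typename T>
T length(const Vec3<T>& v) {
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}

// Pitfall 1: 0.1 is a double literal, so x is widened to double, multiplied,
// and the product is narrowed back to float on return.
float scale_bad(float x)  { return x * 0.1;  }
// With the 'f' suffix the multiplication is written entirely in float.
float scale_good(float x) { return x * 0.1f; }

// Pitfall 2: time `iters` length computations, returning nanoseconds per
// iteration. Every result is accumulated and handed back through `sink`, and
// the input changes on each pass, so the call is neither unused nor
// loop-invariant. (Assumes iters > 0 and no -ffast-math style options.)
template <typename T>
double time_length(std::size_t iters, T& sink) {
    Vec3<T> v{T(1), T(2), T(3)};
    const T step = T(0.001);   // converted once at compile time, not per pass
    T acc = T(0);
    const auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iters; ++i) {
        acc += length(v);      // result feeds the accumulator
        v.x += step;           // next pass sees a different input
    }
    const auto t1 = std::chrono::steady_clock::now();
    sink = acc;                // caller should print or otherwise use this
    return std::chrono::duration<double, std::nano>(t1 - t0).count()
           / static_cast<double>(iters);
}
```

A caller would run time_length<float>(n, sink_f) and time_length<double>(n, sink_d) and then print both sinks along with the timings. Even so, inspecting the generated assembly remains the only sure check that the loop body survived optimization.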