From: Wolfgang W. <ww...@gm...> - 2005-05-22 20:54:11
Hello everybody,

In order to get an estimate of how long basic numerical calculations take
on current hardware, I ran the speed test
(src/lib/numerics/3d/test-speed.cc) on different platforms. The results
are below.

This is just plain compiled C++ code; there are no hand-coded SSE
vectorisations or similar.

The most remarkable fact is that using floats instead of doubles may have
the opposite effect of what one would expect...
Especially note the value in brackets at the end of each line: it is the
ratio of double to float calculation times (dbl/flt), i.e. values larger
than 1 mean "double takes longer" while values smaller than 1 mean
"double is faster than float".

Wolfgang

BTW, in the last email I forgot to mention that longjmp'ing between
thread contexts is undefined in POSIX. So, we need to be prepared for
the case that it cannot be done.

--<AthlonXP 1.47GHz Linux-2.6.11 gcc-4.1.0, NPTL>-----------------------------
(nothing)               : 0.533687 nsec/cyc
vector3::length         : flt: 28.0004  dbl: 26.718   nsec/cyc (0.95)
vector3*vector3         : flt: 10.2434  dbl: 22.804   nsec/cyc (2.23)
vector3 x vector3       : flt: 15.1952  dbl: 34.8585  nsec/cyc (2.29)
matrix3*vector3         : flt: 26.2452  dbl: 40.8993  nsec/cyc (1.56)
trafomatrix*vector3     : flt: 25.6166  dbl: 25.7406  nsec/cyc (1.00)
trafomatrix::inverse    : flt: 126.307  dbl: 96.1046  nsec/cyc (0.76)
trafomatrix*trafomatrix : flt: 99.8809  dbl: 113.816  nsec/cyc (1.14)
matrix3*matrix3         : flt: 75.9704  dbl: 131.277  nsec/cyc (1.73)

--<P4 2.80 GHz Linux-2.6.11 gcc-3.4.4, LinuxThreads>--------------------------
(nothing)               : 0.118761 nsec/cyc
vector3::length         : flt: 15.69    dbl: 15.5786  nsec/cyc (0.99)
vector3*vector3         : flt: 17.0779  dbl: 5.04801  nsec/cyc (0.30)
vector3 x vector3       : flt: 16.3483  dbl: 11.8949  nsec/cyc (0.73)
matrix3*vector3         : flt: 18.4656  dbl: 17.3939  nsec/cyc (0.94)
trafomatrix*vector3     : flt: 18.4069  dbl: 17.374   nsec/cyc (0.94)
trafomatrix::inverse    : flt: 120.072  dbl: 84.1365  nsec/cyc (0.70)
trafomatrix*trafomatrix : flt: 79.1698  dbl: 86.2536  nsec/cyc (1.09)
matrix3*matrix3         : flt: 61.501   dbl: 68.3568  nsec/cyc (1.11)

--<AMD64 1.8GHz FreeBSD-5.4 gcc-3.4.2>----------------------------------------
(nothing)               : 0.287412 nsec/cyc
vector3::length         : flt: 16.2991  dbl: 16.4841  nsec/cyc (1.01)
vector3*vector3         : flt: 10.1551  dbl: 5.49865  nsec/cyc (0.54)
vector3 x vector3       : flt: 16.7179  dbl: 9.67714  nsec/cyc (0.58)
matrix3*vector3         : flt: 27.3062  dbl: 19.6488  nsec/cyc (0.72)
trafomatrix*vector3     : flt: 27.463   dbl: 19.1422  nsec/cyc (0.70)
trafomatrix::inverse    : flt: 70.7865  dbl: 64.1694  nsec/cyc (0.91)
trafomatrix*trafomatrix : flt: 83.5633  dbl: 68.334   nsec/cyc (0.82)
matrix3*matrix3         : flt: 67.8196  dbl: 59.2467  nsec/cyc (0.87)

--<AthlonXP 1.47GHz Linux-2.6.11 gcc-3.4.2 for Win32, NPTL, WINE EMULATION!>--
(nothing)               : 0.356384 nsec/cyc
vector3::length         : flt: 29.7528  dbl: 27.0064  nsec/cyc (0.91)
vector3*vector3         : flt: 15.3446  dbl: 7.66602  nsec/cyc (0.50)
vector3 x vector3       : flt: 21.3162  dbl: 13.4919  nsec/cyc (0.63)
matrix3*vector3         : flt: 26.0698  dbl: 26.6165  nsec/cyc (1.02)
trafomatrix*vector3     : flt: 26.3388  dbl: 25.8365  nsec/cyc (0.98)
trafomatrix::inverse    : flt: 126.99   dbl: 114.622  nsec/cyc (0.90)
trafomatrix*trafomatrix : flt: 128.008  dbl: 134.121  nsec/cyc (1.05)
matrix3*matrix3         : flt: 102.492  dbl: 105.253  nsec/cyc (1.03)
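
For anyone who wants to reproduce the bracketed dbl/flt ratio without the full source tree, here is a minimal sketch of how such a measurement could look for the vector3*vector3 case. It is not the code from test-speed.cc: the names Vec3, dot, time_dot_ns and dbl_to_flt_ratio are made up for illustration, and it assumes a C++11 compiler with std::chrono, which is newer than the gcc versions listed above.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical stand-in for the vector3 type exercised by the speed test.
template <typename T>
struct Vec3 {
    T x, y, z;
};

// Scalar (dot) product, the "vector3*vector3" row in the tables.
template <typename T>
T dot(const Vec3<T>& a, const Vec3<T>& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Average wall-clock nanoseconds per loop iteration for dot() in type T.
// This includes loop overhead, comparable to the "(nothing)" row.
template <typename T>
double time_dot_ns(std::size_t iterations) {
    assert(iterations > 0);
    Vec3<T> a{T(1.5), T(-2.0), T(0.25)};
    const Vec3<T> b{T(0.5), T(4.0), T(-8.0)};
    volatile T sink = T(0);  // volatile store keeps the compiler from dropping the loop
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        a.x = T(i & 7);      // vary one input so the result is not loop-invariant
        sink = dot(a, b);
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}

// The bracketed value: double time divided by float time.
// Larger than 1 means "double takes longer".
inline double dbl_to_flt_ratio(double flt_ns, double dbl_ns) {
    return dbl_ns / flt_ns;
}
```

Calling time_dot_ns<float>(n) and time_dot_ns<double>(n) with the same n and passing both results to dbl_to_flt_ratio gives a number comparable in meaning to the last column. The absolute values will depend heavily on compiler flags (x87 vs. SSE code generation in particular), so they should not be expected to match the tables.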