[Ray-devel] FP benchmarking, float versus double

Hello everybody,

In order to get an estimate of how long basic numerical calculations take 
on current hardware, I ran the speed test (src/lib/numerics/3d/test-speed.cc)
on different platforms. 

The results are below. This is plain compiled C++ code; there are 
no hand-coded SSE vectorisations or similar. 

The most remarkable fact is that using floats instead of doubles may have 
the opposite effect of what one would expect... Note especially the value in 
brackets at the end of each line: it is the ratio of double to float calc 
time, i.e. values larger than 1 mean "double takes longer" while values 
smaller than 1 mean "double is faster than float". 
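
For readers who want to reproduce this kind of measurement, here is a minimal 
sketch of how a per-iteration timing and the bracketed ratio could be computed. 
It is not the code from test-speed.cc: Vec3, dot, nsec_per_cycle_dot and 
dbl_over_flt are hypothetical names made up for illustration, and std::chrono 
is used as the timer by assumption.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical stand-in for the library's vector3 (assumption, not the real class).
template <typename T>
struct Vec3 { T x, y, z; };

// Plain scalar product, no hand-written SIMD.
template <typename T>
inline T dot(const Vec3<T>& a, const Vec3<T>& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Average wall-clock time per loop iteration, in nanoseconds.
// This includes the loop overhead, which corresponds to the "(nothing)" row.
template <typename T>
double nsec_per_cycle_dot(std::size_t iterations) {
    assert(iterations > 0);
    volatile T sink = 0;  // volatile store keeps the compiler from dropping the work
    Vec3<T> a{T(1), T(2), T(3)};
    Vec3<T> b{T(4), T(5), T(6)};
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        a.x = T(i & 7);   // vary the input so the product is not loop-invariant
        sink = dot(a, b);
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}

// The value in brackets: double time divided by float time.
inline double dbl_over_flt(double flt_nsec, double dbl_nsec) {
    return dbl_nsec / flt_nsec;
}
```

Calling nsec_per_cycle_dot<float>(n) and nsec_per_cycle_dot<double>(n) with the 
same n and passing both results to dbl_over_flt gives a number comparable to the 
last column. As a cross-check against the tables, the AthlonXP vector3*vector3 
figures give 22.804 / 10.2434, which rounds to the 2.23 shown there.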

Wolfgang

BTW, in the last email I forgot to mention that longjmp'ing between thread 
contexts is undefined in POSIX. So, we need to be prepared for the case that 
it cannot be done. 

--<AthlonXP 1.47GHz Linux-2.6.11 gcc-4.1.0, NPTL>-----------------------------
  (nothing)                : 0.533687 nsec/cyc
  vector3::length          : flt:  28.0004   dbl:   26.718  nsec/cyc  (0.95)
  vector3*vector3          : flt:  10.2434   dbl:   22.804  nsec/cyc  (2.23)
  vector3 x vector3        : flt:  15.1952   dbl:  34.8585  nsec/cyc  (2.29)
  matrix3*vector3          : flt:  26.2452   dbl:  40.8993  nsec/cyc  (1.56)
  trafomatrix*vector3      : flt:  25.6166   dbl:  25.7406  nsec/cyc  (1.00)
  trafomatrix::inverse     : flt:  126.307   dbl:  96.1046  nsec/cyc  (0.76)
  trafomatrix*trafomatrix  : flt:  99.8809   dbl:  113.816  nsec/cyc  (1.14)
  matrix3*matrix3          : flt:  75.9704   dbl:  131.277  nsec/cyc  (1.73)

--<P4 2.80 GHz Linux-2.6.11 gcc-3.4.4, LinuxThreads>--------------------------
  (nothing)                : 0.118761 nsec/cyc
  vector3::length          : flt:    15.69   dbl:  15.5786  nsec/cyc  (0.99)
  vector3*vector3          : flt:  17.0779   dbl:  5.04801  nsec/cyc  (0.30)
  vector3 x vector3        : flt:  16.3483   dbl:  11.8949  nsec/cyc  (0.73)
  matrix3*vector3          : flt:  18.4656   dbl:  17.3939  nsec/cyc  (0.94)
  trafomatrix*vector3      : flt:  18.4069   dbl:   17.374  nsec/cyc  (0.94)
  trafomatrix::inverse     : flt:  120.072   dbl:  84.1365  nsec/cyc  (0.70)
  trafomatrix*trafomatrix  : flt:  79.1698   dbl:  86.2536  nsec/cyc  (1.09)
  matrix3*matrix3          : flt:   61.501   dbl:  68.3568  nsec/cyc  (1.11)

--<AMD64 1.8GHz FreeBSD-5.4 gcc-3.4.2>----------------------------------------
  (nothing)                : 0.287412 nsec/cyc
  vector3::length          : flt:  16.2991   dbl:  16.4841  nsec/cyc  (1.01)
  vector3*vector3          : flt:  10.1551   dbl:  5.49865  nsec/cyc  (0.54)
  vector3 x vector3        : flt:  16.7179   dbl:  9.67714  nsec/cyc  (0.58)
  matrix3*vector3          : flt:  27.3062   dbl:  19.6488  nsec/cyc  (0.72)
  trafomatrix*vector3      : flt:   27.463   dbl:  19.1422  nsec/cyc  (0.70)
  trafomatrix::inverse     : flt:  70.7865   dbl:  64.1694  nsec/cyc  (0.91)
  trafomatrix*trafomatrix  : flt:  83.5633   dbl:   68.334  nsec/cyc  (0.82)
  matrix3*matrix3          : flt:  67.8196   dbl:  59.2467  nsec/cyc  (0.87)

--<AthlonXP 1.47GHz Linux-2.6.11 gcc-3.4.2 for Win32, NPTL, WINE EMULATION!>--
  (nothing)                : 0.356384 nsec/cyc
  vector3::length          : flt:  29.7528   dbl:  27.0064  nsec/cyc  (0.91)
  vector3*vector3          : flt:  15.3446   dbl:  7.66602  nsec/cyc  (0.50)
  vector3 x vector3        : flt:  21.3162   dbl:  13.4919  nsec/cyc  (0.63)
  matrix3*vector3          : flt:  26.0698   dbl:  26.6165  nsec/cyc  (1.02)
  trafomatrix*vector3      : flt:  26.3388   dbl:  25.8365  nsec/cyc  (0.98)
  trafomatrix::inverse     : flt:   126.99   dbl:  114.622  nsec/cyc  (0.90)
  trafomatrix*trafomatrix  : flt:  128.008   dbl:  134.121  nsec/cyc  (1.05)
  matrix3*matrix3          : flt:  102.492   dbl:  105.253  nsec/cyc  (1.03)