Re: [Ray-devel] FP benchmarking, float versus double


I'm almost certain that there is some problem in your testing procedure.  I've
run a similar test (vector normalization) for floats and doubles, and I
consistently get about a 20% improvement with floats.  If you look at the
manuals for the processors, you'll see longer latency for the double operations.

I'd wager one of the following things might be messing up the tests...
- unnecessary conversions: when you specify a float literal, always add 'f'
(e.g. 1.0f instead of 1.0).  Otherwise the literal is a double and the compiler
will synthesize float/double conversions.  Conversions can consume a lot of
clock cycles.
- missing dependencies: the compiler might be optimizing some code away or
moving it out of the loop.  If you compute a result and never use it, the code
might disappear.  If you compute the same result many times, it might only
execute once.  The only way I think you'll be able to make sure this isn't
happening is to check the assembly code manually.
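
To make the two points above concrete, here is a minimal sketch.  The names
scaled_sum, run_and_keep and g_sink are made up for illustration and are not
taken from test-speed.cc.  It assumes a C++11 compiler.  The volatile store
only makes it harder for the compiler to drop the work; it does not replace
checking the assembly.

```cpp
#include <type_traits>

// An unsuffixed literal is a double, so mixing it with a float promotes the
// whole expression to double (and forces a conversion back on assignment).
static_assert(std::is_same<decltype(1.0f * 2.0f), float>::value,
              "float * float literal stays float");
static_assert(std::is_same<decltype(1.0f * 2.0), double>::value,
              "float * double literal is promoted to double");

// Hypothetical kernel standing in for the operation being timed.
float scaled_sum(const float* v, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i)
        acc += v[i] * 0.5f;   // 'f' suffix keeps the arithmetic in float
    return acc;
}

// Volatile sink: every store to it is an observable side effect.
volatile float g_sink = 0.0f;

// Repeats the kernel, changing the input each pass so the result is not
// loop-invariant, and storing each result so it is not dead code.
float run_and_keep(float* v, int n, int repeats) {
    float last = 0.0f;
    for (int r = 0; r < repeats; ++r) {
        v[0] += 1.0f;             // new input, so a new computation each pass
        last = scaled_sum(v, n);
        g_sink = last;            // result is used, so the call must happen
    }
    return last;
}
```

In a timing loop you would wrap the call to run_and_keep with the clock reads
and divide by the repeat count.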

It looks like your code is subject to some of these problems.  I've made some
attempts at fixing it but it is difficult to verify (looking through lots of
assembly).  Especially problematic might be the production of a result value
that is never used again, so you aren't really timing the latency of the
operation at all.  Maybe a better test would be to do something useful with each
type of arithmetic (intersect a ray with a sphere?).
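
As a rough sketch of what such a test could look like (my own illustration,
not code from the repository), the template below intersects a ray with a
sphere in either precision and sums the hit distances, so every result feeds
into a value that is returned to the caller.  Vec3, intersect_sphere and
nsec_per_call are hypothetical names; it assumes C++11 for <chrono> and a
unit-length ray direction.

```cpp
#include <chrono>
#include <cmath>

template <typename T>
struct Vec3 { T x, y, z; };

template <typename T>
T dot(const Vec3<T>& a, const Vec3<T>& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Distance to the first intersection for a ray that starts outside the
// sphere, or -1 if the ray's line misses it.  dir must be unit length.
template <typename T>
T intersect_sphere(const Vec3<T>& origin, const Vec3<T>& dir,
                   const Vec3<T>& center, T radius) {
    Vec3<T> oc = { origin.x - center.x, origin.y - center.y,
                   origin.z - center.z };
    T b = dot(oc, dir);
    T c = dot(oc, oc) - radius * radius;
    T disc = b * b - c;
    if (disc < T(0)) return T(-1);
    return -b - std::sqrt(disc);
}

// Average nanoseconds per intersection.  The sum of all hit distances is
// written to checksum so the caller can consume it.
template <typename T>
double nsec_per_call(int iterations, T& checksum) {
    const Vec3<T> dir = { T(0), T(0), T(1) };
    const Vec3<T> center = { T(0), T(0), T(0) };
    T sum = T(0);
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        // Origin depends on i, so the call is not loop-invariant.
        Vec3<T> origin = { T(i % 7) * T(0.125), T(0), T(-5) };
        sum += intersect_sphere(origin, dir, center, T(1));
    }
    auto stop = std::chrono::steady_clock::now();
    checksum = sum;
    return std::chrono::duration<double, std::nano>(stop - start).count()
           / iterations;
}
```

Call nsec_per_call<float> and nsec_per_call<double> with the same iteration
count and print both checksums along with the timings.  The loop overhead
(the integer modulo and the conversion) is the same for both types, but it is
still included in the numbers, so treat the ratio as approximate.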

Andrew

Quoting Wolfgang Wieser <ww...@gm...>:

> Hello everybody,
> 
> In order to get an estimate of how long basic numerical calculations take 
> on current hardware, I ran the speed test
> (src/lib/numerics/3d/test-speed.cc)
> on different platforms. 
> 
> The results are below. This is just plain C++ code compiled; there are 
> no hand-coded SSE vectorisations or similar. 
> 
> The most remarkable fact is that using floats instead of doubles may have 
> the opposite effect than expected... Especially note the value in brackets 
> at the end of the lines: It is the ratio of double and float calc times, 
> i.e. values larger than 1 mean "double takes longer" while values smaller 
> than 1 mean "double is faster than float". 
> 
> Wolfgang
> 
> BTW, in the last email I forgot to mention that longjmp'ing between thread 
> contexts is undefined in POSIX. So, we need to be prepared for the case that
> it cannot be done. 
> 
> --<AthlonXP 1.47GHz Linux-2.6.11 gcc-4.1.0, NPTL>-----------------------------
>   (nothing)                : 0.533687 nsec/cyc
>   vector3::length          : flt:  28.0004   dbl:   26.718  nsec/cyc  (0.95)
>   vector3*vector3          : flt:  10.2434   dbl:   22.804  nsec/cyc  (2.23)
>   vector3 x vector3        : flt:  15.1952   dbl:  34.8585  nsec/cyc  (2.29)
>   matrix3*vector3          : flt:  26.2452   dbl:  40.8993  nsec/cyc  (1.56)
>   trafomatrix*vector3      : flt:  25.6166   dbl:  25.7406  nsec/cyc  (1.00)
>   trafomatrix::inverse     : flt:  126.307   dbl:  96.1046  nsec/cyc  (0.76)
>   trafomatrix*trafomatrix  : flt:  99.8809   dbl:  113.816  nsec/cyc  (1.14)
>   matrix3*matrix3          : flt:  75.9704   dbl:  131.277  nsec/cyc  (1.73)
> 
> --<P4 2.80 GHz Linux-2.6.11 gcc-3.4.4, LinuxThreads>--------------------------
>   (nothing)                : 0.118761 nsec/cyc
>   vector3::length          : flt:    15.69   dbl:  15.5786  nsec/cyc  (0.99)
>   vector3*vector3          : flt:  17.0779   dbl:  5.04801  nsec/cyc  (0.30)
>   vector3 x vector3        : flt:  16.3483   dbl:  11.8949  nsec/cyc  (0.73)
>   matrix3*vector3          : flt:  18.4656   dbl:  17.3939  nsec/cyc  (0.94)
>   trafomatrix*vector3      : flt:  18.4069   dbl:   17.374  nsec/cyc  (0.94)
>   trafomatrix::inverse     : flt:  120.072   dbl:  84.1365  nsec/cyc  (0.70)
>   trafomatrix*trafomatrix  : flt:  79.1698   dbl:  86.2536  nsec/cyc  (1.09)
>   matrix3*matrix3          : flt:   61.501   dbl:  68.3568  nsec/cyc  (1.11)
>  
> --<AMD64 1.8GHz FreeBSD-5.4 gcc-3.4.2>----------------------------------------
>   (nothing)                : 0.287412 nsec/cyc
>   vector3::length          : flt:  16.2991   dbl:  16.4841  nsec/cyc  (1.01)
>   vector3*vector3          : flt:  10.1551   dbl:  5.49865  nsec/cyc  (0.54)
>   vector3 x vector3        : flt:  16.7179   dbl:  9.67714  nsec/cyc  (0.58)
>   matrix3*vector3          : flt:  27.3062   dbl:  19.6488  nsec/cyc  (0.72)
>   trafomatrix*vector3      : flt:   27.463   dbl:  19.1422  nsec/cyc  (0.70)
>   trafomatrix::inverse     : flt:  70.7865   dbl:  64.1694  nsec/cyc  (0.91)
>   trafomatrix*trafomatrix  : flt:  83.5633   dbl:   68.334  nsec/cyc  (0.82)
>   matrix3*matrix3          : flt:  67.8196   dbl:  59.2467  nsec/cyc  (0.87)
> 
> --<AthlonXP 1.47GHz Linux-2.6.11 gcc-3.4.2 for Win32, NPTL, WINE EMULATION!>--
>   (nothing)                : 0.356384 nsec/cyc
>   vector3::length          : flt:  29.7528   dbl:  27.0064  nsec/cyc  (0.91)
>   vector3*vector3          : flt:  15.3446   dbl:  7.66602  nsec/cyc  (0.50)
>   vector3 x vector3        : flt:  21.3162   dbl:  13.4919  nsec/cyc  (0.63)
>   matrix3*vector3          : flt:  26.0698   dbl:  26.6165  nsec/cyc  (1.02)
>   trafomatrix*vector3      : flt:  26.3388   dbl:  25.8365  nsec/cyc  (0.98)
>   trafomatrix::inverse     : flt:   126.99   dbl:  114.622  nsec/cyc  (0.90)
>   trafomatrix*trafomatrix  : flt:  128.008   dbl:  134.121  nsec/cyc  (1.05)
>   matrix3*matrix3          : flt:  102.492   dbl:  105.253  nsec/cyc  (1.03)
> 
> 