Re: [Ray-devel] FP benchmarking, float versus double

Hello Andrew!

On Wednesday 25 May 2005 02:00, Andrew Clinton wrote:
> The only way I think you'll be able to make sure this isn't
> happening is to check the assembly code manually.
>
The easiest thing is simply to USE the output (e.g. by summing it all up) 
and slightly vary the input (e.g. by adding 0.1 each iteration). 
This is what is actually done (right?)

Another possibility is to look at execution time scaling (which I did as 
well): 
- The basic operations are inline functions which are called 4 times in 
  each loop. If I comment out 2 of them, the time halves. 
- There is one no-op measurement, i.e. a loop with 4 inlines which do 
  nothing. This is the "(nothing)" line. Its execution time indicates the 
  time required if the statements are optimized away, and it does not 
  change when 2 of the calls are commented out. 

While only reading and understanding the assembly gives a definite answer, 
the scaling already gives confidence, and using and varying the variables 
prevents the compiler from removing the operations. 

I tried making the constants explicitly float (by adding an "f" suffix to 
all FP constants) and it did not change the timings. 

> It looks like your code is subject to some of these problems.  I've made
> some attempts at fixing it but it is difficult to verify (looking through
> lots of assembly).  
>
Did you see any change in the timings?
Feel free to commit a "fixed" version; I'd like to have a look at it. 

> Especially problematic might be the production of a 
> result value that is never used again, 
>
Well, the results in the loop statements are usually summed up and hence 
"used". The calculated sum is in the end not used but that should not make 
any difference. 

The fact that the produced results are not directly needed for the next 
calculation may enable the compiler/processor to make better use of 
pipelining. I expect this to be beneficial to both types (float, double). 

> Maybe a better test would be to do
> something useful with each type of arithmetic (intersect a ray with a
> sphere?).
>
Sure. 
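
Such a test could look roughly like the following sketch. It is not code 
from either raytracer; Vec3, dot and intersect_sphere are hypothetical 
names, and the ray direction is assumed to be normalized. The type T 
would be float or double. 

```cpp
#include <cmath>

template <typename T>
struct Vec3 { T x, y, z; };

template <typename T>
inline T dot(const Vec3<T>& a, const Vec3<T>& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Intersects the ray origin + t*dir with a sphere. Assumes dir has unit
// length. On a hit, stores the nearest t >= 0 in t_hit and returns true.
template <typename T>
bool intersect_sphere(const Vec3<T>& origin, const Vec3<T>& dir,
                      const Vec3<T>& center, T radius, T& t_hit)
{
    Vec3<T> oc = { origin.x - center.x,
                   origin.y - center.y,
                   origin.z - center.z };
    T b = dot(oc, dir);
    T c = dot(oc, oc) - radius * radius;
    T disc = b * b - c;           // discriminant of t^2 + 2bt + c = 0
    if (disc < T(0))
        return false;             // ray misses the sphere
    T s = std::sqrt(disc);
    T t = -b - s;                 // nearer root
    if (t < T(0))
        t = -b + s;               // origin inside the sphere: farther root
    if (t < T(0))
        return false;             // sphere is behind the ray
    t_hit = t;
    return true;
}
```

In a timing loop the returned t_hit values would again be summed up and 
the ray origin varied slightly per iteration. 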

Let me tell you a bit about why I did this: I just wanted to get a feeling 
for the time needed for typical floating point operations as compared to 
the time required by thread-safe locking and context switching. 

So, as I wrote in the email the day before, I'd be very happy if you 
could post some real ray-shape intersection timings for your raytracer and 
a "typical" scene (i.e. lots of basic CSG / a big triangle mesh / ..). 

The point is basically that I would like to compare the cost of a 
ray-shape intersection to the cost of a coroutine switch. 
This is required to decide whether it makes sense to use ray-shape 
intersections as the request type, or whether the introduced overhead is 
too high. 

Wolfgang