From: Wolfgang W. <ww...@gm...> - 2005-05-25 16:41:40
Hello Andrew!

On Wednesday 25 May 2005 02:00, Andrew Clinton wrote:
> The only way I think you'll be able to make sure this isn't
> happening is to check the assembly code manually.

The easiest thing is simply to USE the output (e.g. by summing it all up) and to slightly vary the input (e.g. by adding 0.1 each iteration). This is what is actually done (right?).

Another possibility is to look at how the execution time scales (which I did as well):
 - The basic operations are inline functions which are called 4 times in each loop. If I comment out 2 of them, the time halves.
 - There is one no-op measurement, i.e. a loop with 4 inlines which do nothing. This is the "(nothing)" line. Its execution time shows the time required if the statements are optimized away, and it does not change when 2 of the calls are commented out.

While only looking at and understanding the assembly gives a definite answer, the scaling already gives confidence, and using/varying the variables prevents the compiler from removing operations.

I tried explicit float constants (by adding an "f" to all FP constants) and it did not change the timings.

> It looks like your code is subject to some of these problems. I've made
> some attempts at fixing it but it is difficult to verify (looking through
> lots of assembly).

Did you see any change in the timings? Feel free to commit a "fixed" version; I'd like to have a look at it.

> Especially problematic might be the production of a
> result value that is never used again,

Well, the results in the loop statements are usually summed up and hence "used". The calculated sum is in the end not used, but that should not make any difference. Because the produced results are not directly needed for the next calculation, the compiler/processor may be able to make better use of pipelining. I expect this to be beneficial to both types (float, double).

> Maybe a better test would be to do
> something useful with each type of arithmetic (intersect a ray with a
> sphere?).

Sure.
Let me tell you a bit about why I did this in the first place: I just wanted to get a feeling for the time needed for typical floating point operations compared to the time required by thread-safe locking and context switching.

So, as I wrote in the email the day before, I'd be very happy if you could post some real ray-shape intersection timings for your raytracer and a "typical" scene (i.e. lots of basic CSG / a big triangle mesh / ...). The point is basically that I would like to compare the cost of a ray-shape intersection to the cost of a coroutine switch. This is needed to decide whether it makes sense to use ray-shape intersections as the request type or whether the introduced overhead is too high.

Wolfgang