[Algorithms] FIST optimization
From: John R. <jra...@ve...> - 2000-09-09 17:23:32
As many of you already know, the call to 'ftol' can be one of the biggest hotspots in a 3D application. Every time a floating point number is converted into an integer, rather than simply producing a 'fist' instruction, MSVC invokes a very slow procedure call to 'ftol'. I have heard this is to ensure that the rounding behaviour of the conversion conforms to the ANSI standard, or something like that. Back when I was an assembly programmer I just used the fist instruction wherever needed.

MSVC has a compiler switch called /QIfist, which lets it use the 'fist' instruction instead of the call to 'ftol'. When I enable this switch my performance profile improves greatly.

At the heart of my collision detection routines is a hash, where I take the floating point x,y coordinates in world space and remap them into an index into a precomputed collision detection table. This is a critical operation that requires a very efficient conversion from floating point to int.

However, the behaviour I am seeing is as if the rounding mode were occasionally set to something other than truncation. That is, I expect a conversion of 2.99999 to int to produce 2, but sometimes I appear to get 3. The behaviour is spurious and unpredictable. When I worked in assembly code I never had this problem.

My main question is: how can the rounding mode ever get changed? Or am I possibly misunderstanding the behaviour?

The symptom is as follows. Without /QIfist my collision detection works flawlessly. With /QIfist enabled it behaves spuriously, exactly as if it sometimes rounds the float to the wrong hash index and so accesses the wrong precomputed collision detection polygon set. Why can't the rounding mode be guaranteed to stay set to truncate throughout the application? I could switch to inline assembly I suppose, but I prefer to have the compiler generate the code naturally with all of its normal optimizations enabled.
Does anyone know the preferred method to guarantee that the processor is in a rounding mode that always truncates? Or am I misinterpreting the problem entirely?

Thanks,

John