Thread: [GD-Windows] Strange float bug?
From: Brett B. <res...@ga...> - 2004-03-07 13:45:12
|
We have been tracking down the weirdest bug for several days. Our code was crashing randomly, and then we finally found a repeatable case where an assert fires on an identity matrix when we check its orthonormality. We chanced upon a line of code that returned -nan and was simply x = (1.0f - (a + b)). Anyway, I suddenly remembered a GDC talk about Windows denormals, did a quick google and pasted a _controlfp( _CW_DEFAULT, 0xfffff ) into our code and presto, it now works.

We make games for PS2 and GCN and are Windows amateurs, but I hate the idea of not understanding what went wrong and how to properly handle this. Has anybody ever encountered something like this before?

Brett
From: Simon O'C. <si...@sc...> - 2004-03-07 16:25:02
|
Hi Brett,

Personally, almost every NaN I've ever encountered has been due to something earlier on, such as passing out-of-domain values to trig functions, divisions by zero, square roots of negatives etc.

However, the fact that resetting the FPU control word to its default fixes the problem hints that something else is changing it behind your back. What value is the control word set to when the NaN occurs (_controlfp(0,0))?

The "something" is likely to be a third-party library...

...What immediately springs to mind is if you're using Direct3D and also using double precision floating point anywhere (probably unwittingly - evil things, as you'll know from your PS2 experience :o)

IDirect3D*::CreateDevice() changes the control word so that the FPU uses 24-bit mantissa precision (_PC_24) rather than the default 53-bit precision (_PC_53). It also (re)masks FP exceptions (which is the _controlfp() default, but it'll re-mask them if you'd unmasked them). It does this for performance reasons.

If you use double precision values in your code with _PC_24 you'll likely get denormals, under/overflows and generally unexpected behaviour. If you force the control word back to default, you might get unexpected behaviour from D3D too.

If you really need double precision or exceptions unmasked, you can pass the D3DCREATE_FPU_PRESERVE flag to CreateDevice() - it will impact performance slightly though, since it makes D3D save, change and restore the control word for every call that might change the control word.

Unmasking FP exceptions can be handy during debugging for trapping things like domain errors at source too, IMO.

Cheers,

Simon O'Connor
Programmer @ Acclaim
& Microsoft DirectX MVP
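Simon's point about catching domain errors at the source can be illustrated without touching the exception masks. The sketch below is a portable stand-in, not what the thread's posters used: it polls the sticky "invalid operation" flag through the standard <cfenv> interface instead of unmasking the exception (on MSVC, unmasking would go through _controlfp with the _MCW_EM mask). The helper name sqrt_raises_invalid is hypothetical, and the sketch assumes the compiler is not run with options that discard floating-point environment semantics.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Hypothetical helper: runs sqrt on `input` and reports whether the
// floating-point unit raised the "invalid operation" flag while doing so.
bool sqrt_raises_invalid(double input) {
    // volatile keeps the compiler from folding the sqrt call away
    // or moving it across the flag queries.
    volatile double in = input;

    std::feclearexcept(FE_ALL_EXCEPT);      // start from a clean flag state
    volatile double out = std::sqrt(in);    // possible domain error happens here
    (void)out;

    // FE_INVALID is sticky: it stays set until explicitly cleared.
    return std::fetestexcept(FE_INVALID) != 0;
}
```

Wrapping a suspect computation this way narrows down where a NaN is first produced; a NaN that arrives from elsewhere as a quiet NaN will not raise the flag in a later add or subtract, so the check has to sit close to the source.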
From: Daniel V. <vo...@ep...> - 2004-03-07 19:42:58
|
> D3DCREATE_FPU_PRESERVE flag to CreateDevice() - it will impact performance
> slightly though since it makes D3D save, change and restore the control word
> for every call that might change the control word.

Does anyone have any horror stories of this noticeably affecting performance? We've been using this flag for years and never noticed a measurable performance difference.

> > We make games for PS2 and GCN and are Windows amateurs, but I
> > hate the idea of not understanding what went wrong and how to
> > properly handle this. Has anybody ever encountered something
> > like this before?

We actually just ran into something slightly related the other day. We had some code that did something like

    Clamp( 0.f / 0.f, 0.f, 1.f )

with the implementation of Clamp being

    template< class T > inline T Clamp( const T X, const T Min, const T Max )
    {
        return X<Min ? Min : X<Max ? X : Max;
    }

and it was returning 1.f (Max) when compiled with VS.NET 2003 but 0.f/0.f (X) with gcc, which was consistent across the versions used for Linux (2.95.3) and Mac (3.something). We ended up just fixing the code instead of looking into what causes the differences in handling QNaNs, so I can't provide any further insight.

-- Daniel, Epic Games Inc.
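For reference, under IEEE 754 comparison rules every ordered comparison against a NaN is false, so both `X<Min` and `X<Max` fail and the Clamp quoted above falls through to Max. That matches the VS.NET 2003 result Daniel reports; why the gcc builds differed is not established in the thread. The sketch below reproduces the quoted Clamp and adds a NaN-aware variant; ClampNaNToMin is a hypothetical name and policy chosen for illustration, not something from the thread, and the behaviour assumes the compiler is not using relaxed floating-point options.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Clamp as quoted in the message above.
template <class T>
inline T Clamp(const T X, const T Min, const T Max) {
    // With X = NaN both comparisons are false, so Max is returned.
    return X < Min ? Min : X < Max ? X : Max;
}

// Hypothetical NaN-aware variant: decides explicitly what a NaN input
// maps to (here: Min) instead of relying on comparison fall-through.
template <class T>
inline T ClampNaNToMin(const T X, const T Min, const T Max) {
    if (std::isnan(X)) {
        return Min;
    }
    return Clamp(X, Min, Max);
}
```

Called with `std::numeric_limits<float>::quiet_NaN()` as the first argument, the two functions make the difference visible: the first relies on how the comparisons happen to evaluate, the second states the policy in code.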
From: Brett B. <res...@ga...> - 2004-03-08 00:10:44
|
Thanks everybody for the info so far. Here is what I found in the control word conditions and value:

with no error = 0x037f, where 0x033f = (_PC_64 | _EM_INVALID | _EM_ZERODIVIDE | _EM_OVERFLOW | _EM_UNDERFLOW | _EM_INEXACT | _EM_DENORMAL), and I can't seem to figure out where the remaining bit 6 comes from.

when I get an error = 0x007f, which means I lost precision down to _PC_24. That doesn't seem enough to cause the problem though, does it?

The error gets generated practically anyplace if we try to do math, for example:

    out.right.x = 1.0f - (yy + zz);

where out.right is a vector, yy = 0.0f and zz = 0.0f. Thus I'm getting:

    -nan = 1.0f - (0.0f + 0.0f);

Given the control word and the line above, is this the expected behavior?

Thanks,
Brett
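The two values Brett reports can be decoded by hand. The sketch below assumes they are raw x87 control words (as stored by the FPU itself) rather than the abstracted bit layout that _controlfp returns; that reading is an assumption based on the numbers, not something stated in the thread. In the x87 layout, bits 0-5 are the six exception masks, bit 6 is reserved, and bits 8-9 are the precision-control field. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decoded view of a raw x87 control word.
struct X87ControlWord {
    unsigned exception_masks;  // bits 0-5: invalid, denormal, zero-divide,
                               //           overflow, underflow, precision
    bool reserved_bit6;        // bit 6: reserved, not a user-visible setting
    int precision_bits;        // from bits 8-9: 24, 53 or 64 (0 = reserved encoding)
    unsigned rounding_mode;    // bits 10-11: 0 = round to nearest
};

inline X87ControlWord decode_x87_control_word(std::uint16_t cw) {
    X87ControlWord out;
    out.exception_masks = cw & 0x3Fu;
    out.reserved_bit6 = (cw & 0x40u) != 0;
    switch ((cw >> 8) & 0x3u) {
        case 0: out.precision_bits = 24; break;   // single precision
        case 2: out.precision_bits = 53; break;   // double precision
        case 3: out.precision_bits = 64; break;   // extended precision
        default: out.precision_bits = 0; break;   // encoding 01 is reserved
    }
    out.rounding_mode = (cw >> 10) & 0x3u;
    return out;
}
```

Under that reading, 0x037f and 0x007f differ only in the precision-control field (64-bit versus 24-bit mantissa); all six exceptions are masked in both, and the unexplained bit 6 is the reserved bit rather than a flag anyone set.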
From: Donavon K. <kei...@ea...> - 2004-03-07 19:23:36
|
Sounds like a and/or b is set to an SNaN prior to this line, so you'll want to figure out how that's happening. I'd be willing to bet that you're dereferencing a bad float pointer or indexing outside of a float array, which would explain why the exception has been sporadic. (FPU ops never generate SNaNs.)

Denormalized values are a different issue and shouldn't result in a NaN under add and subtract operations. Also note that _controlfp doesn't affect the denormal exception mask on x86.

Donavon Keithley
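Donavon's theory is that stray memory gets interpreted as a float. A minimal sketch of that idea follows: it classifies raw 32-bit patterns as quiet or signaling NaNs without loading them into the FPU, and shows that a NaN operand makes `1.0f - (a + b)` come out as NaN. It assumes 32-bit IEEE 754 floats and the x86 convention that the top mantissa bit distinguishes quiet (set) from signaling (clear); all helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// NaN: exponent bits all ones and a non-zero mantissa.
inline bool is_nan_bits(std::uint32_t bits) {
    return (bits & 0x7F800000u) == 0x7F800000u && (bits & 0x007FFFFFu) != 0;
}

// Signaling NaN (x86 convention): a NaN whose top mantissa bit is clear.
inline bool is_signaling_nan_bits(std::uint32_t bits) {
    return is_nan_bits(bits) && (bits & 0x00400000u) == 0;
}

// Reinterprets a bit pattern as a float, the way a bad pointer or an
// out-of-bounds array read would hand arbitrary memory to float code.
inline float float_from_bits(std::uint32_t bits) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Mirrors the expression from the original post.
inline float one_minus_sum(float a, float b) {
    return 1.0f - (a + b);
}
```

Running the classifier over the memory that feeds a and b (for example the matrix being checked) is one way to test the bad-pointer theory: well-formed float data produced by arithmetic should never contain a signaling NaN pattern.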