
Operation with floats returning -0

GuiRitter
2010-05-04
2012-09-26
  • GuiRitter

    GuiRitter - 2010-05-04

    I made a very easy program to calculate the roots of a quadratic function:

    #include<math.h>
    #include<stdio.h>
    #include<stdlib.h>
    
    float a,b,c,d,r1,r2;
    
    int main(){
       printf("Insira a,b e c de uma equacao de 2o. grau: \n");
       scanf("%f",&a);
       scanf("%f",&b);
       scanf("%f",&c);
       if(a != 0){
          d = pow(b,2) - (4 * a * c);       // 1
          if(d >= 0){                       // 2
             r1 = (-b + sqrt(d)) / (2 * a); // 3
             r2 = (-b - sqrt(d)) / (2 * a);
             if(r1 == r2){
                printf("A equacao possui uma raiz dupla, igual a %f\n",r1);
             }else{
                printf("A equacao possui raizes iguais a %f e %f\n",r1,r2);
             }
          }else{
             printf("A equacao possui raizes complexas\n");
          }
       }else{
          printf("A equacao nao e' de segundo grau\n");
       }
       system("pause");
       return 0;
    }
    

    If I input the values 1, -10 and 25, the expected result is a double root at
    5. But, instead, the variable d receives -0 as a result (comment 1), thus
    failing the IF condition at comment 2. If that IF is ignored and the roots
    are calculated (comment 3), the variables r1 and r2 receive -1.#IND00. I've
    heard that it's a common problem in some compilers to correctly implement
    sqrt and pow using floats, since codepad.org has been able to successfully
    compile and run this code. I'd like to know if there's a way this can be
    fixed in my Dev 4.9.9.2, or if it has been fixed in the beta versions of
    Dev 5.

     
  • GuiRitter

    GuiRitter - 2010-05-04

    Damn, the post can't be edited here. I just found out that Dev 5 is
    4.9.9.2. That's confusing! Anyway, can it be fixed in a future release?

     
  • cpns

    cpns - 2010-05-04

    In no particular order:

    1) Occam's razor suggests that the fault is in your code, not the compiler
    or library. Unexpected results from naive use of binary floating point are
    not uncommon. In this case, if pow( 10.0f, 2 ) evaluates to, say,
    99.9999999999999, the result of an implicit cast to float may be 99.99999
    or 100.0; that is allowable behaviour in both cases and not an error.

    A possible fix is:

            d = pow(b,2) - (4 * a * c);                 // 1
    
            // Small FP error fix
            if( d < 0 && d > -FLT_EPSILON )
            {
                d = 0.0f ;
            }
    

    FLT_EPSILON being defined in <float.h>

    2) Dev-C++ is an IDE, not a compiler. It uses GCC (MinGW). No amount of
    Dev-C++ updates will fix this issue.

    3) Dev-C++ has not been updated for 5 years; don't hold your breath.

    4) The derivative wxDev-C++ is maintained.

    5) If you want a compiler update, go to www.mingw.org, and hope that the
    latest version still works with the ancient Dev-C++. Or use a different IDE or
    compiler. I'd recommend VC++ 2010 Express Edition (Free, and a far better
    debugger), your code works as is in VC++ 2008, but that does not make it safe.

    6) How did you determine the value was "-0"? It is likely that it was in fact
    something like -0.000001, and your method of interrogation rounded it. You
    should view the value in the debugger rather than a print statement.
    Unfortunately Dev-C++'s debugger sucks.

    7) The ISO C standard library defines <math.h> functions only for double
    precision. If you want versions defined for float, you need to compile as C++
    and include <cmath>. C does not support function overloading, so it is
    impossible to do in C without differently named functions.

     
  • GuiRitter

    GuiRitter - 2010-05-04

    1) I'd agree with you, but as I said before, codepad.org has been able to
    compile and run it as expected, without errors (it's a website that
    compiles and runs your code; modifications were needed as it doesn't
    accept scanf etc.).
    I couldn't understand the fix, though.

    3) Wow.

    5) I'll test VC++.

    6) Indeed, it was printf'd; it displayed -0.000000. 100 minus 100 probably
    had a .000001 lost somewhere.

     
  • cpns

    cpns - 2010-05-04

    1) Working in one compiler and not another is not an arbiter of
    correctness, and I certainly would not count an on-line compiler as that
    arbiter! Undefined behaviour is undefined behaviour: anything can happen,
    including what you expect, but also what you don't.

    The fix simply says that if d is negative but within FLT_EPSILON of zero,
    coerce it to exactly zero.

    5) Do that, but it does not make the code correct; it is not a compiler
    bug, but rather undefined behaviour.

    6) More likely that it was 99.99999 - 100.0. printf() as its name suggests
    formats text; for %f this includes rounding to a default number of decimal
    places. That is why you should rather watch the value in the debugger.
    Alternatively:

    printf( "%.30f", d ) ;
    

    may show the missing digits. For example consider the following:

        float val = 0.000000000001f ;
    
        printf( "%f\n", val ) ;
        printf( "%.15f\n", val ) ;
        printf( "%.30f\n", val ) ;
    

    In VC++ 2008, it outputs:

    0.000000
    0.000000
    0.000000000001000
    0.000000000000999999996004197200
    

    Which demonstrates two things:

    1) printf() shows the value matching the format specification not the value
    stored.
    2) Decimal values are stored in binary floating point as necessary
    approximations.

    Although the value 100.0 can be stored exactly in binary FP, this may not
    be true of all the intermediate calculations performed in the pow()
    function. If you get a pow() function that works as you expect, that is
    dumb luck, or it may be treating power-of-2 as a special case since it is
    so common; it may just as easily produce different results for different
    operands - equally wrong.

    In this case you would probably be far better off using:

    d = (b * b) - (4 * a * c);
    

    ... since b^2 is a special and simple case of exponentiation, whereas
    pow() is designed to cope with any value raised to any power (not just
    integers). Also, multiplication of floats depends entirely on the compiler
    implementation and not on the library implementation. Since float
    multiplication is performed directly by the FPU, the results are likely to
    be identical for all compilers running on the same FPU implementation.

     
  • cpns

    cpns - 2010-05-04

    Oops, extra row of output there! Ignore the first 0.00000.

     
  • GuiRitter

    GuiRitter - 2010-05-05

    Hmm, I'm starting to understand. Indeed, (b * b) fixed the whole thing and
    it's working as expected now.
    But one day I might need to use something like pow(x,y), so how can I be
    sure that it'll work? Will I have to make a function for that? Which is
    actually easy, but still...

    P.S.¹: I remembered that I forgot to translate the printfs, but that's no
    longer necessary. Sorry.
    P.S.²: I thought that the code tag would monospace the text inside; that's
    why the comments are out of alignment.

     
  • GuiRitter

    GuiRitter - 2010-05-05

    Heh, I just found another problematic situation with these floats! I input
    0.35 and the variable becomes 0.34999...
    How can I make (x < 0.35) false when inputting 0.35?

     
  • cpns

    cpns - 2010-05-05

    But one day I might need to use something like pow(x,y), so how can I
    be sure that it'll work?

    It will work, but it will always be an approximation. Mostly it does not
    matter, and where it does matter you have to design the code to cope with
    it or minimise it.

    Think about it: a single precision floating point value uses 32 bits to
    represent values in the range +/- 3.402823466e+38, that is just 2^32
    (about 4 billion) combinations to represent an infinite range. Now, the
    thing about floating point representations is that they have two
    components, the exponent and the mantissa. The mantissa provides the
    numeric value, and the exponent the scale. This allows both very large
    numbers and very small numbers to be represented, but the gap between
    adjacent representable values varies: it is very large for very large
    magnitude values, and very small for values close to zero.

    The biggest errors occur when an expression mixes very large and very
    small numbers, because the scale factors have to be normalised, and when
    that occurs the small values can end up rounded to zero.

    The other issue with floating point is that, for performance and
    implementation reasons, computers generally use binary floating point, so
    a decimal number such as 0.35 cannot be exactly represented. You may think
    this odd, but it is no different from the fact that, say, pi or 1/3 cannot
    be expressed precisely as a decimal value; you do not normally worry about
    these imprecisions, and would regard 0.3333333 as being 'close enough' to
    1/3 (close enough for Jazz and military use, as they say!). However, when
    the computer makes a binary approximation for exactly the same reasons,
    suddenly it is not close enough.

    There is a partial solution, and that is to use decimal floating point.
    However, because it is not the native representation of the machine, it is
    very slow in comparison. Also, in C/C++ decimal floating point is not a
    built-in type and has no math library support (although it is in upcoming
    standards, I believe). C# has a decimal type. Now, I say that it is only a
    partial solution because it suffers from exactly the same imprecision
    issues as BFP, but because it is decimal, the approximations are 'human',
    so the results of expression evaluation should be identical to those you'd
    get if you performed the calculations by hand with sufficient digits
    (though typically the computer will use more significant digits than a
    human, and so produce more accurate results).

    Another solution is to use an arbitrary precision math library; because
    the hardware typically supports only single and double precision data of
    fixed width, arbitrary precision must be implemented in software, so, like
    decimal floating point, it is much slower.

    Will I have to make a function for that?

    If your exponent will be a small integer (square or cube), you will probably
    always get better results by multiplication. Arbitrary integer exponents will
    become linearly less efficient using iterative multiplication; the pow()
    function will no doubt use an algorithm and/or possibly hardware instructions
    to give less variable performance, possibly at the expense of some precision.

    How can I make (x < 0.35) false when inputting 0.35?

    In most real-world applications it does not matter. For example, if you
    were to make a measurement using some instrument or sensor and the result
    were 0.35, who is to say whether the true value was 0.350000001 or
    0.349999999? What are the real chances of it being exactly 0.35, or indeed
    of it remaining so when, say, the temperature rises by 1 degree? Real data
    is always noisy and always an approximation within the limits of the
    precision of the instrument used to make the measurement. So for an input
    of 0.35, for most practical purposes it is in fact arbitrary whether that
    is considered less than, greater than or equal to 0.35. In cases where it
    matters (for example, to stop a relay burning out switching a heater due
    to a hard threshold on a thermostat), it is usual to include some
    hysteresis; for example, switch on at < 0.35 and switch off at > 0.36.

    Anyway my point is that you could try to solve your problem through ever more
    complex software, or you could change your perception of the problem and
    realise that in most cases it is not a problem, or that the problem can be
    solved by design rather than math. If you were served 0.99999 pints of beer,
    would you really complain to the barman!?

    Anyway, that's enough essay from me. You should probably read "What Every
    Computer Scientist Should Know About Floating-Point Arithmetic".

     
  • GuiRitter

    GuiRitter - 2010-05-06

    All right, I got all that you said and I'll read that text some other
    time. But concerning my last quote about the 0.35: it's a value to be
    input by a human, not through a sensor, and I've worked with sensors
    before (in class) and I understand hysteresis.

    But about the 0.35: as I said, it is a humanly input value that must be
    compared to a certain standard, and to clearly see that the computer made
    the "wrong" choice is just bad. But I'll leave it as it is and think more
    about it later, for it's a proposed exercise and nothing important.

    By the way, thanks for all the help and consideration!

     
  • cpns

    cpns - 2010-05-06

    it's a value to be inputed humanly

    Indeed; my point stands, what does the value represent? How was it measured or
    determined. Is the result really wrong? For example if you end up with a value
    0.3499999 or whatever, but use a sensible output formatting to a required
    precision, say two decimal places, ostream, or printf() will display 0.35 -
    the user need not be aware of the internal discrepancy.

    However, there is perhaps a problem when rounding error accumulates to the
    point that the output is incorrect even within the desired output
    precision. Also, as you have pointed out, there is a problem with equality
    or inequality comparisons. The solution in these cases will vary according
    to the needs of the application.

    In your example, there may in fact be no problem; if 0.35 cannot be
    exactly represented in binary floating point, then neither can x be
    exactly equal to 0.35! The compiler will convert the decimal literal
    constant to BFP, and x is already in BFP. The closest x can get to 0.35
    will be exactly what the compiler converted 0.35 decimal to.

    If precision is critical, but range and flexibility are not, then you
    could use decimal fixed point: instead of, say, 0.35, you would store an
    integer 3500 (using a scale factor of 10000, thus giving four decimal
    places of precision). Addition and subtraction are simple; multiplication
    and division require a re-scaling operation; math functions you would have
    to implement yourself from these primitives. There are fixed point
    libraries, but for efficiency they are generally binary fixed point, so
    they may suffer from the same issue you are concerned about here.

    To be honest, you are mostly worrying about things that do not in the end turn
    out to be a problem. Some understanding of floating point is necessary (hence
    the reading material), to avoid gotchas, but right now you may be inventing
    non-existent problems.

     
