
Operation with floats returning -0

GuiRitter
2010-05-04
2012-09-26
  • GuiRitter

    GuiRitter - 2010-05-04

    I made a very easy program to calculate the roots of a quadratic function:

    #include<math.h>
    #include<stdio.h>
    #include<stdlib.h>
    
    float a,b,c,d,r1,r2;
    
    int main(){
       printf("Insira a,b e c de uma equacao de 2o. grau: \n");
       scanf("%f",&a);
       scanf("%f",&b);
       scanf("%f",&c);
       if(a != 0){
          d = pow(b,2) - (4 * a * c);       // 1
          if(d >= 0){                       // 2
             r1 = (-b + sqrt(d)) / (2 * a); // 3
             r2 = (-b - sqrt(d)) / (2 * a);
             if(r1 == r2){
                printf("A equacao possui uma raiz dupla, igual a %f\n",r1);
             }else{
                printf("A equacao possui raizes iguais a %f e %f\n",r1,r2);
             }
          }else{
             printf("A equacao possui raizes complexas\n");
          }
       }else{
          printf("A equacao nao e' de segundo grau\n");
       }
       system("pause");
       return 0;
    }
    

    If I input the values 1, -10 and 25, the expected result is a double root at
    5. But, instead, the variable d receives -0 as a result (comment 1), thus
    failing the IF condition at comment 2. If that IF is ignored and the roots
    are calculated (comment 3), the variables r1 and r2 receive -1.#IND00. I've
    heard that it's a common problem in some compilers to correctly implement
    sqrt and pow using floats, since codepad.org has been able to successfully
    compile and run this code. I'd like to know if there's a way this can be
    fixed in my Dev 4.9.9.2, or if it has been fixed in the beta versions of
    Dev 5.

     
  • GuiRitter

    GuiRitter - 2010-05-04

    Damn, the post can't be edited here. I just found out that Dev 5 is
    4.9.9.2. That's confusing! Anyway, can it be fixed in a future release?

     
  • cpns

    cpns - 2010-05-04

    In no particular order:

    1) Occam's razor suggests that the fault is in your code, not the compiler
    or library. Unexpected results from naive use of binary floating point are
    not uncommon. In this case, if pow( 10.0f, 2 ) evaluates to, say,
    99.9999999999999, the result of an implicit cast to float may be 99.99999
    or 100.0; that is allowable behaviour in both cases and not an error.

    A possible fix is:

            d = pow(b,2) - (4 * a * c);                 // 1
    
            // Small FP error fix
            if( d < 0 && d > -FLT_EPSILON )
            {
                d = 0.0f ;
            }
    

    FLT_EPSILON being defined in <float.h>

    2) Dev-C++ is an IDE, not a compiler. It uses GCC (MinGW). No amount of
    Dev-C++ updates will fix this issue.

    3) Dev-C++ has not been updated for 5 years; don't hold your breath.

    4) The derivative wxDev-C++ is maintained.

    5) If you want a compiler update, go to www.mingw.org, and hope that the
    latest version still works with the ancient Dev-C++. Or use a different IDE or
    compiler. I'd recommend VC++ 2010 Express Edition (Free, and a far better
    debugger), your code works as is in VC++ 2008, but that does not make it safe.

    6) How did you determine the value was "-0"? It is likely that it was in fact
    something like -0.000001, and your method of interrogation rounded it. You
    should view the value in the debugger rather than a print statement.
    Unfortunately Dev-C++'s debugger sucks.

    7) The ISO C standard library defines <math.h> functions only for double
    precision. If you want versions defined for float, you need to compile as C++
    and include <cmath>. C does not support function overloading, so it is
    impossible to do in C without differently named functions.

     
  • GuiRitter

    GuiRitter - 2010-05-04

    1) I'd agree with you, but as I said before, codepad.org has been able to
    compile and run it as expected, without errors (it's a website that
    compiles and runs your code; modifications were needed as it doesn't
    accept scanf etc.).
    I couldn't understand the fix, though.

    3) Wow.

    5) I'll test VC++.

    6) Indeed, it was printf'd; it displayed -0.000000. 100 minus 100 probably
    had a .000001 lost somewhere.

     
  • cpns

    cpns - 2010-05-04

    1) Working in one compiler and not another is not an arbiter of
    correctness, and I certainly would not count an on-line compiler as that
    arbiter! Undefined behaviour is undefined behaviour: anything can happen,
    including what you expect, but also what you don't.

    The fix simply says that if d is negative but within FLT_EPSILON of zero,
    coerce it to exactly zero.

    5) Do that, but it does not make the code correct; it is not a compiler
    bug, but rather undefined behaviour.

    6) More likely that it was 99.99999 - 100.0. printf() as its name suggests
    formats text; for %f this includes rounding to a default number of decimal
    places. That is why you should rather watch the value in the debugger.
    Alternatively:

    printf( "%.30f", d ) ;
    

    may show the missing digits. For example consider the following:

        float val = 0.000000000001f ;
    
        printf( "%f\n", val ) ;
        printf( "%.15f\n", val ) ;
        printf( "%.30f\n", val ) ;
    

    In VC++ 2008, it outputs:

    0.000000
    0.000000
    0.000000000001000
    0.000000000000999999996004197200
    

    Which demonstrates two things:

    1) printf() shows the value matching the format specification not the value
    stored.
    2) Decimal values are stored in binary floating point as necessary
    approximations.

    Although the value 100.0 can be stored exactly in binary FP, this may not
    be true of all the intermediate calculations performed in the pow()
    function. If you get a pow() function that works as you expect, that is
    dumb luck, or it may be treating power-of-2 as a special case since it is
    so common; it may just as easily produce different results for different
    operands - equally wrong.

    In this case you would probably be far better off using:

    d = (b * b) - (4 * a * c);
    

    ... since b^2 is a special and simple case of exponentiation, whereas
    pow() is designed to cope with any value raised to any power (not just
    integers). Also, multiplication of floats depends entirely on the compiler
    implementation and not on the library implementation. Since float
    multiplication is performed directly by the FPU, the results are likely to
    be identical for all compilers running on the same FPU implementation.

     
  • cpns

    cpns - 2010-05-04

    Oops, extra row of output there! Ignore the first 0.00000.

     
  • GuiRitter

    GuiRitter - 2010-05-05

    Hmm, I'm starting to understand. Indeed, (b * b) fixed the whole thing and
    it's working as expected now.
    But one day I might need to use something like pow(x,y), so how can I be
    sure that it'll work? Will I have to make a function for that? Which is
    actually easy, but still...

    P.S.¹: I remembered that I forgot to translate the printfs, but that's no
    longer necessary. Sorry.
    P.S.²: I thought that the code tag would monospace the text inside; that's
    why the comments are out of alignment.

     
  • GuiRitter

    GuiRitter - 2010-05-05

    Heh, I just found another problematic situation with these floats! I input
    0.35 and the variable becomes 0.34999...
    How can I make (x < 0.35) false when inputting 0.35?

     
  • cpns

    cpns - 2010-05-05

    But one day I might need to use something like pow(x,y), so how can I
    be sure that it'll work?

    It will work, but it will always be an approximation. Mostly it does not
    matter, and where it does matter you have to design the code to cope with
    it or minimise it.

    Think about it: a single precision floating point value uses 32 bits to
    represent values in the range +/- 3.402823466e+38, that is just 2^32
    (about 4 billion) combinations to represent an infinite range. Now, the
    thing about floating point representations is that they have two
    components, the exponent and the mantissa. The mantissa provides the
    numeric value, and the exponent the scale. This allows both very large
    numbers and very small numbers to be represented, but the gap between
    adjacent representable values varies: it is very large for very large
    magnitude values, and very small for values close to zero.

    The biggest errors occur when an expression mixes very large and very
    small numbers, because the scale factors have to be normalised, and when
    that occurs the small values can end up rounded to zero.

    The other issue with floating point is that, for performance and
    implementation reasons, computers generally use binary floating point, so
    a decimal number such as 0.35 cannot be exactly represented. You may think
    this odd, but it is no different from the fact that, say, pi or 1/3 cannot
    be expressed precisely as a decimal value; you do not normally worry about
    these imprecisions, and would regard 0.3333333 as being 'close enough' to
    1/3 (close enough for Jazz and military use, as they say!). However, when
    the computer makes a binary approximation for exactly the same reasons,
    suddenly it is not close enough.

    There is a partial solution, and that is to use decimal floating point.
    However, because it is not the native representation of the machine, it is
    very slow in comparison. Also, in C/C++ decimal floating point is not a
    built-in type and has no math library support (although it is in upcoming
    standards, I believe). C# has a decimal type. Now, I say that it is only a
    partial solution because it suffers from exactly the same imprecision
    issues as BFP, but because it is decimal, the approximations are 'human',
    so the results of expression evaluation should be identical to those you'd
    get if you performed the calculations by hand with sufficient digits
    (though typically the computer will use more significant digits than a
    human, and so produce more accurate results).

    Another solution is to use an arbitrary precision math library; because
    the hardware typically supports only single and double precision data of
    fixed width, arbitrary precision must be implemented in software, so, like
    decimal floating point, it is much slower.

    Will I have to make a function for that?

    If your exponent will be a small integer (square or cube), you will probably
    always get better results by multiplication. Arbitrary integer exponents will
    become linearly less efficient using iterative multiplication; the pow()
    function will no doubt use an algorithm and/or possibly hardware instructions
    to give less variable performance, possibly at the expense of some precision.

    How can I make (x < 0.35) false when inputting 0.35?

    In most real-world applications it does not matter. For example, if you
    were to make a measurement using some instrument or sensor and the result
    were 0.35, who is to say whether the true value was 0.350000001 or
    0.349999999? What are the real chances of it being exactly 0.35, or indeed
    of it remaining so when, say, the temperature rises by 1 degree? Real data
    is always noisy and always an approximation within the limits of the
    precision of the instrument used to make the measurement. So for an input
    of 0.35, for most practical purposes it is in fact arbitrary whether that
    is considered less than, greater than or equal to 0.35. In cases where it
    matters (for example, to stop a relay burning out switching a heater due
    to a hard threshold on a thermostat), it is usual to include some
    hysteresis; for example, switch on at < 0.35 and switch off at > 0.36.

    Anyway my point is that you could try to solve your problem through ever more
    complex software, or you could change your perception of the problem and
    realise that in most cases it is not a problem, or that the problem can be
    solved by design rather than math. If you were served 0.99999 pints of beer,
    would you really complain to the barman!?

    Anyway, that's enough essay from me. You should probably read "What Every
    Computer Scientist Should Know About Floating-Point Arithmetic".

     
  • GuiRitter

    GuiRitter - 2010-05-06

    All right, I got all that you said and I'll read that text some other
    time. But concerning my last quote about the 0.35: it's a value to be
    input by a human, not through a sensor, and I've worked with sensors
    before (in class) and I understand hysteresis.

    But about the 0.35: as I said, it is a humanly input value that must be
    compared to a certain standard, and to clearly see that the computer made
    the "wrong" choice is just bad. But I'll leave it as it is and think more
    about it later, for it's a proposed exercise and nothing important.

    By the way, thanks for all the help and consideration!

     
  • cpns

    cpns - 2010-05-06

    it's a value to be inputed humanly

    Indeed; my point stands, what does the value represent? How was it measured or
    determined. Is the result really wrong? For example if you end up with a value
    0.3499999 or whatever, but use a sensible output formatting to a required
    precision, say two decimal places, ostream, or printf() will display 0.35 -
    the user need not be aware of the internal discrepancy.

    However, there is perhaps a problem when rounding error accumulates to the
    point that the output is incorrect even within the desired output
    precision. Also, as you have pointed out, there is a problem with equality
    or inequality comparisons. The solution in these cases will vary according
    to the needs of the application.

    In your example, there may in fact be no problem; if 0.35 cannot be
    exactly represented in binary floating point, then neither can x be
    exactly equal to 0.35! The compiler will convert the decimal literal
    constant to BFP, and x is already in BFP. The closest x can get to 0.35
    will be exactly what the compiler converted 0.35 decimal to.

    If precision is critical, but range and flexibility are not, then you
    could use decimal fixed point: instead of, say, 0.35, you would store an
    integer 3500 (using a scale factor of 10000, thus giving four decimal
    places of precision). Addition and subtraction are simple; multiplication
    and division require a re-scaling operation; math functions you would have
    to implement yourself from these primitives. There are fixed point
    libraries, but for efficiency they are generally binary fixed point, so
    they may suffer from the same issue you are concerned about here.

    To be honest, you are mostly worrying about things that do not in the end turn
    out to be a problem. Some understanding of floating point is necessary (hence
    the reading material), to avoid gotchas, but right now you may be inventing
    non-existent problems.

     
