Precision Oddity

2010-09-12
2013-04-22
  • I created an ars. I looked at it with the l(ist) command and I also looked at it with the get command.
    The values do not match! I set a value of "5.79781". The l(ist) command reports that same value, but
    the get command reports: "5.797810000000000130171429"

    This is not a decimal-to-binary-and-back conversion error. That might give one LSB of error, like:
    5.797810000000000000000001 or
    5.797809999999999999999999
    It will NOT give nine or ten random digits at the end!

    I do not know if this is actually causing any problems, but I would not be surprised!
    I realize these errors are below parts per trillion and are unlikely to be visible in an image made from them.
    Please explain.

    Here are the three commands:
    mged> in fontTst-A1.Q.s ars 4 6 3.0 5.79781 0.0 3.0 6.37757 0.5 3.0 5.49726 0.2071 3.0 5.49726 -0.2071 3.0 6.37757 -0.5 4.29504 2.7071 0.5 3.8811 3.0 0.2071 3.8811 3.0 -0.2071 4.29504 2.7071 -0.5 1.70496 2.7071 0.5 2.1189 3.0 0.2071 2.1189 3.0 -0.2071 1.70496 2.7071 -0.5 3.0 6.37757 0.5 3.0 5.49726 0.2071 3.0 5.49726 -0.2071 3.0 6.37757 -0.5 4.02242 2.9 0.0

    mged> l fontTst-A1.Q.s
    fontTst-A1.Q.s:  arbitrary rectangular solid (ARS)
    6 curves, 4 points per curve
    V (3, 5.79781, 0)
    Curve 0:
    (3, 5.79781, 0)
    (3, 5.79781, 0)
    (3, 5.79781, 0)
    (3, 5.79781, 0)
    Curve 1:
    (3, 6.37757, 0.5)
    (3, 5.49726, 0.2071)
    (3, 5.49726, -0.2071)
    (3, 6.37757, -0.5)
    Curve 2:
    (4.29504, 2.7071, 0.5)
    (3.8811, 3, 0.2071)
    (3.8811, 3, -0.2071)
    (4.29504, 2.7071, -0.5)
    Curve 3:
    (1.70496, 2.7071, 0.5)
    (2.1189, 3, 0.2071)
    (2.1189, 3, -0.2071)
    (1.70496, 2.7071, -0.5)
    Curve 4:
    (3, 6.37757, 0.5)
    (3, 5.49726, 0.2071)
    (3, 5.49726, -0.2071)
    (3, 6.37757, -0.5)
    Curve 5:
    (4.02242, 2.9, 0)
    (4.02242, 2.9, 0)
    (4.02242, 2.9, 0)
    (4.02242, 2.9, 0)

    mged> get fontTst-A1.Q.s

    ars NC 6 PPC 4 C0 { { 3 5.797810000000000130171429 0 } { 3 5.797810000000000130171429 0 } { 3 5.797810000000000130171429 0 } { 3 5.797810000000000130171429 0 } } C1 { { 3 6.377570000000000405293576 0.5 } { 3 5.497259999999999813269369 0.2071000000000000063060668 } { 3 5.497259999999999813269369 -0.2071000000000000063060668 } { 3 6.377570000000000405293576 -0.5 } } C2 { { 4.295040000000000190993887 2.707100000000000061817218 0.5 } { 3.881099999999999994315658 3 0.2071000000000000063060668 } { 3.881099999999999994315658 3 -0.2071000000000000063060668 } { 4.295040000000000190993887 2.707100000000000061817218 -0.5 } } C3 { { 1.704960000000000031050718 2.707100000000000061817218 0.5 } { 2.118900000000000005684342 3 0.2071000000000000063060668 } { 2.118900000000000005684342 3 -0.2071000000000000063060668 } { 1.704960000000000031050718 2.707100000000000061817218 -0.5 } } C4 { { 3 6.377570000000000405293576 0.5 } { 3 5.497259999999999813269369 0.2071000000000000063060668 } { 3 5.497259999999999813269369 -0.2071000000000000063060668 } { 3 6.377570000000000405293576 -0.5 } } C5 { { 4.022420000000000328554961 2.899999999999999911182158 0 } { 4.022420000000000328554961 2.899999999999999911182158 0 } { 4.022420000000000328554961 2.899999999999999911182158 0 } { 4.022420000000000328554961 2.899999999999999911182158 0 } }

    Gilligan
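
How large one binary LSB is for this value can be checked directly. The following is a minimal Python sketch, not BRL-CAD code, and it assumes the coordinate is held in an IEEE 754 double; `binary_lsb` is a helper name made up for this illustration.

```python
import math
from decimal import Decimal

def binary_lsb(x: float) -> float:
    """Spacing between x and the next larger double (one binary LSB)."""
    _, exponent = math.frexp(x)      # x == mantissa * 2**exponent, 0.5 <= |mantissa| < 1
    return 2.0 ** (exponent - 53)    # a double carries 53 significant bits

x = 5.79781
lsb = binary_lsb(x)
stored = Decimal(x)                        # exact decimal expansion of the stored double
error = abs(stored - Decimal("5.79781"))   # distance from the decimal that was typed

print(lsb)                        # 2**-50, roughly 8.9e-16
print(stored)                     # begins 5.797810000000000130171429
print(error < Decimal(lsb) / 2)   # True: within half a binary LSB
```

One LSB here is a binary step of about 8.9e-16, not a single trailing decimal digit, so a correctly rounded conversion can legitimately differ from the typed decimal from about the 16th decimal place onward.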

     
  • This still fails in BRL-CAD version 7.16.10. I found it in version 7.12 and was hoping it had been corrected.

    Gilligan

     
  • Sean Morrison
    2010-10-14

    This is definitely not a bug.  This shouldn't even be unexpected.

    The precision of the get command is higher than the default computation tolerance (0.0005 mm) as well as the list command's printing precision.  Moreover, the differences you're seeing are 15 decimal places out, which is at the machine epsilon of an IEEE double-precision floating-point representation.  Simply storing a value into a register and reading it back can induce changes that far out.

    It's worth noting that the get command reports what is saved to disk and is merely printing with excessive precision.  Plus, the differences you're noticing are on the femto scale (10^-15).  The measurement values, if significant, differ by far less than the width of a single atom.

    Most programs including other CAD software mask floating point representation details by simply not printing so many digits.

    It's not a problem. :)
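
The difference between the two printouts can be reproduced outside mged. This is a rough Python sketch under two assumptions that are not confirmed by the thread: that the coordinates are stored as C doubles, and that the list and get outputs behave like `%g` and `%.25g` formatting (inferred only from the digit counts in the output above).

```python
values = [5.79781, 0.2071, 2.9, 3.0, 0.5]   # coordinates taken from the ars above

for stored in values:
    brief = format(stored, "g")      # 6 significant digits, like the l(ist) output
    full = format(stored, ".25g")    # 25 significant digits, like the get output
    # Both strings parse back to the identical double, so nothing was altered.
    assert float(brief) == float(full) == stored
    print(brief, full)
```

Values such as 3 and 0.5 print identically in both formats because they are exactly representable in binary. The others show extra digits only because more of the same stored value is being printed.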

     
  • Sean said:
    >>>    "Simply storing a value into a register and reading it back can induce changes that far out. "

    I'm sorry, but any computer that does that is broken! Any software that does that has a bug! I designed computer hardware for many years, and I have written a fair bit of software too. You seem to imply that some parts of your software write less precision than other parts read. Well, that is the bug, right there!

    As to whether it's worth fixing or not, that's your call. I agree that it is very unlikely to cause any real-world problems, and you may well have many issues with higher priority. But it is still a bug!

    Gilligan

     
  • Sean Morrison
    2010-10-14

    I made no such implication regarding how BRL-CAD reads/writes with respect to precision.  We have extensive controls in place throughout the code to reduce or eliminate/avoid the accumulation of significant error due to the floating point unit, but there is only so much that can be done.  Moreover, as a computation system, there is no guarantee offered below the default computation tolerance.  By definition, numeric changes are OK if they are within that tolerance.

    If you can isolate a problem with tolerance management in our source code where there is a change occurring below what the hardware will guarantee per IEEE 754, I'll be surprised and *delighted* to accommodate a fix.

    As a corollary, there are plenty of numbers that simply cannot be faithfully represented using floating point arithmetic (e.g., 0.1) regardless of precision or tolerance.  Doing computations with numbers that cannot be exactly represented can cascade a tolerance change as well.  That's just a couple of several possible sources of tolerance injection that can occur automatically and unavoidably.  These are completely outside our control and a characteristic of floating point representation.

    Like I said, we go to extensive lengths to manage this, but it's unavoidable (particularly for x86 hardware) without using an alternative (and absurdly slower) numeric representation like fixed point arithmetic.

    If you find that there's something else going on, I'd be delighted to apply a fix. Otherwise, there's no constructive point in arguing over what is considered a bug versus hardware limitations versus undesirable compilation behavior.

    Cheers!
    Sean
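
The points about unrepresentable numbers and tolerance can be illustrated with a short Python sketch. It is only an illustration of IEEE 754 double behavior, not of BRL-CAD's tolerance code; `TOLERANCE_MM` is a name invented here for the 0.0005 mm default tolerance quoted earlier in the thread.

```python
import math

TOLERANCE_MM = 0.0005   # default computation tolerance quoted in the thread

total = 0.0
for _ in range(10):
    total += 0.1        # each addition is rounded to the nearest double

print(total)                   # 0.9999999999999999
print(total == 1.0)            # False: the rounding errors did not cancel
print(0.1 + 0.2 == 0.3)        # False for the same reason

# Comparing within a tolerance instead of exactly hides this noise.
print(math.isclose(total, 1.0, rel_tol=0.0, abs_tol=TOLERANCE_MM))   # True
```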

     
  • I well understand that computer math is not perfect and has limitations. I can't say whether I ever read IEEE 754; it may not even have been written 20 years ago, when I read some IEEE number specs. I will say I don't recall any statements like your assertion: "Simply storing a value into a register and reading it back can induce changes that far out." I would, in fact, be quite shocked if any computer spec said such a thing. I'm not talking about doing any math, just writing a register and reading it back!

    There is NO MATH happening in my test case, either. I just set the data points in an ars and read them back.

    I still claim that if a program can't do that, there is a problem.

    Gilligan

     
  • The program tries to convert your input into a binary representation.  But not every finite decimal representation can be converted into a finite binary representation.  E.g. the binary representation of 0.1 (= 1/2 * 1/5) is an infinite series of 0s and 1s (each digit standing for a power of 1/2).  Now, if you take the first 8 bytes of the binary representation and convert them back to a decimal number, you may see a value very similar to your original input with some "random numbers" at the end.  That's because your number was converted to a finite series of powers of 1/2.

    BTW, saying "series of powers of 2" would be true too.  On the right side of the point are the negative powers.  Very similar to the decimal numbers.
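
The infinite binary series for 0.1 can be made visible with exact rational arithmetic. A small Python sketch, with `binary_digits` being a helper written only for this illustration:

```python
from fractions import Fraction

def binary_digits(value: Fraction, count: int) -> str:
    """First `count` binary digits of a fraction in [0, 1), by repeated doubling."""
    digits = []
    for _ in range(count):
        value *= 2
        bit = int(value >= 1)
        digits.append(str(bit))
        value -= bit
    return "".join(digits)

print(binary_digits(Fraction(1, 10), 24))   # 000110011001100110011001 (the 0011 pattern never ends)
print(binary_digits(Fraction(1, 2), 24))    # 1 followed by zeros: 0.5 is exact in binary

# The stored double is the series cut off after 53 significant bits,
# so it is a nearby fraction whose denominator is a power of two.
stored = Fraction(0.1)
print(stored.denominator == 2 ** 55)        # True
print(stored == Fraction(1, 10))            # False
```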

    About the register thing:
    I would not say "Simply storing a value into a register and reading it back can induce changes that far out."  However, doing the same operation twice and comparing the results may give you a "not equal".  The reason is the size of the register: 10 bytes.  The memory representation of a double is only 8 bytes.  E.g.
    - doing an operation
    - writing the result to the memory
    - doing the operation again
    - keeping this result in the register
    - reading back the first result into another register
    - comparing the two registers with the results
    may give you a "not equal".

    I.e. I would say: "Simply storing a value from a register into memory and reading it back can induce changes that far out."

    Daniel
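
Python cannot reach the 10-byte x87 registers, so the following sketch is only an analogy for the register-versus-memory effect: it stores an 8-byte double into a 4-byte single and reads it back. The mechanism (a wider value rounded on the way into a narrower slot) is the same, but the sizes are stand-ins and the helper name is made up for this example.

```python
import struct

def store_and_reload_as_float32(x: float) -> float:
    """Round-trip a double through a 4-byte IEEE 754 single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

result = 1.0 / 3.0                              # kept at full double precision
reloaded = store_and_reload_as_float32(result)  # written to the narrower slot and read back

print(result == reloaded)            # False: the narrower format dropped bits
print(abs(result - reloaded))        # small but non-zero
print(store_and_reload_as_float32(0.5) == 0.5)   # True: 0.5 fits exactly in both
```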

     
  • As stated in my first post, this is not a decimal-to-binary-and-back conversion problem. That will give at most ONE decimal digit of error, not the NINE error digits I see.

    Your register to memory problems make sense. It's reassuring to know that we all agree that computer memories are extremely reliable. I was beginning to wonder if I had stumbled into "The Twilight Zone!" This is not my problem either, though; it seems to be the exact reverse of my problem.

    I set a value of "5.79781" with a "in" command. The l(ist) command reports that same value, but the get command reports: "5.797810000000000130171429"

    I wrote a simple script to test how tcl deals with number precision.
    ----- numTest.tcl --------------------------------
    #!/usr/brlcad/bin/tclsh

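    proc numTest {num} {

        # Print the value as given, then after a few simple expr operations.
        puts "      num ==$num=="
        set numx1 [expr {$num * 1}]
        puts "  num * 1 ==$numx1=="
        set numd3 [expr {$num / 3}]
        puts "  num / 3 ==$numd3=="
        set numd3x3 [expr {$numd3 * 3}]
        puts "(num/3)*3 ==$numd3x3=="
        set numd7 [expr {$num / 7}]
        puts "  num / 7 ==$numd7==\n"
    }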

    numTest 1.2
    numTest 5.797810000000000130171429


    Running it in BRL-CAD or from a bash shell gives the same result:

    mged> source numTest.tcl
          num ==1.2==
      num * 1 ==1.2==
      num / 3 ==0.39999999999999997==
    (num/3)*3 ==1.2==
      num / 7 ==0.17142857142857143==

          num ==5.797810000000000130171429==
      num * 1 ==5.79781==
      num / 3 ==1.9326033333333335==
    (num/3)*3 ==5.79781==
      num / 7 ==0.8282585714285714==


    I note that the default precision is 16 decimal digits. The Tcl manual says it uses C double floating point for the math. Your get command is printing out an extra NINE digits beyond this.

    I also note that the line "num ==5.797810000000000130171429==" and the one after it demonstrate that Tcl does NOT do the string-to-binary conversion until some math is required. Tcl does manage to lose the extraneous digits.

    If, as you say, register to memory transfers LOSE a couple of bytes, where does this extra data come from?

    Again, I point out, that an "in" command followed by a "get" command does NOT require any math (except, perhaps, decimal - binary - decimal conversion)

    Gilligan
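
The behavior seen in the script output can be cross-checked in Python, which (like recent Tcl versions, as far as the output above suggests) prints a double as the shortest decimal string that reads back to the same value. This is a sketch of double behavior in general, not of Tcl internals.

```python
typed = "5.797810000000000130171429"   # the string printed by get
value = float(typed)

print(value == 5.79781)    # True: both strings name the same double
print(repr(value))         # 5.79781, the shortest string that round-trips

third = 1.2 / 3
print(repr(third))         # 0.39999999999999997
print(repr(third * 3))     # 1.2
```

The long string carries no extra information: once parsed, it is indistinguishable from 5.79781, which is why the digits seem to vanish after `num * 1`.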

     
  • What are you trying to prove?  That 3 * 0.39999999999999997 is 1.2?   It is evident in your example that there is a difference between the output of puts and the internal representation of your variables.  How much is this difference?  Find it out!

    E.g. you may type in a tclsh:
    set num 1.2
    puts $num    ;# as expected
    format "%.16f" $num    ;# very nice, I can see the smile on your face: :-)
    format "%.17f" $num    ;# what's this? surely only a small perturbation
    format "%.50f" $num    ;# and this is what we are talking about

    q.e.d.
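
For readers without a tclsh at hand, the same experiment can be sketched in Python, whose `format` accepts the same `%f`-style precisions. The printed digits depend only on the double nearest 1.2, so they should match what Tcl shows.

```python
num = 1.2

print(str(num))               # 1.2
print(format(num, ".16f"))    # 1.2000000000000000
print(format(num, ".17f"))    # 1.19999999999999996
print(format(num, ".50f"))    # begins 1.19999999999999995559
```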

     
  • What I am trying to prove is that BRLCAD is saving garbage data at the end of the number.

    I tried the format "%.50f" $num.
    num ==1.39999999999999991118215802998747676610946655273438==
    Clearly, the data below the 16th decimal position is just nonsense! I do find it VERY strange that the garbage data is anything other than "000…"! Just for fun, I ran the same program in Perl. It gave the EXACT same garbage data down to the 50th decimal position! I don't know why the conversion routine behaves this way, but this is not BRLCAD's problem.

    This does suggest what BRLCAD is doing wrong, though. It would seem that they are writing data with a format similar to "%.24g", while it is clearly not justified to use any more than 16 digits below the decimal point.

    Gilligan
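
Why Tcl and Perl agree to the 50th place, and how many digits are needed to write a double out without loss, can both be checked with a short Python sketch. It assumes nothing about BRL-CAD; it only inspects IEEE 754 doubles.

```python
from decimal import Decimal

# Decimal(float) converts the stored binary value exactly, with no rounding.
exact = Decimal(1.4)
print(exact)    # 1.399999999999999911182158029987476766109466552734375

# 16 significant digits are not always enough to recover the same double...
x = 0.1 + 0.2
print(float(format(x, ".16g")) == x)   # False
# ...while 17 significant digits round-trip it.
print(float(format(x, ".17g")) == x)   # True
```

The trailing digits are the exact decimal expansion of the binary value rather than noise, so any correct conversion routine prints the same ones. Digits past the 17th add nothing on read-back, which fits the description of the get output as merely printing with excessive precision.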

     
  • Sean Morrison
    2010-10-20

    My first reply to your concern stated that the command "is merely printing with excessive precision".  Most programs mask floating point representation details by simply not printing so many digits.

    Regular user commands (such as the list command) truncate their printing precision to a reasonable level to avoid having this extended discussion and explanation.  It's understandably surprising if you've not come across it before.  The get command, however, is really intended to be a developer command, so it does things differently and that's intentional.  It doesn't, however, make it wrong.

    I think this horse has been beaten pretty dead.  If you are interested in further reading on the subject, there's a great paper entitled "What Every Computer Scientist Should Know About Floating-Point Arithmetic" that goes into depth about how the format is represented and stored, digits of precision, rounding errors, and much more.  The format in prevalent use is the IEEE 754 representation, so there's further useful reading there as well.

    http://www.validlab.com/goldberg/paper.ps
    http://en.wikipedia.org/wiki/IEEE_754-2008

    You've reported plenty of other valid issues and bugs that deserve attention.  I'd suggest the discussion be focused on those topics.