Menu

#56 Possible sqrt() bug when built for m68k

v1.0_(example)
open
nobody
None
5
2015-01-19
2015-01-17
Chris Young
No

I'm having some odd maths errors when using clib2 under 68k, so I ran the Paranoia maths test program (attached) to try and track them down. It shows some "serious defects" in the sqrt() function when built for 68k-clib2. Building the same program for OS4 against clib2 shows no errors at all.

The program says that the problem is the sqrt of 0.0, -0.0 or 1.0 is calculated incorrectly. I've written a much smaller test program just to test this, and it works, so I'm not entirely sure exactly where these calculation errors are coming from.

I built clib2 for 68k using gcc3.4.6.

1 Attachments

Discussion

  • Chris Young

    Chris Young - 2015-01-17

    On further investigation, it appears that the failure is with the test:
    One == SQRT(One)

    So even though sqrt(1.0) prints as 1.0 it isn't precisely equal to 1.0.

    I doubt this is what is causing my problems, and the other defects/serious defects might be more important.

     
  • Olaf Barthel

    Olaf Barthel - 2015-01-19

    I'll have a look at the tests. From what I know, testing for exact equality when comparing results of floating point operations is rarely possible and most of the time not even effective, floating point calculatings yielding approximate values.

    Which 68k target did you use, specifically? The 68k build can either use inline FPU code (68881/68882, 68040, 68060), or it can use mathieeedoubbas.library/mathieeedoubtrans.library. In the case of sqrt() the mathieeedoubtrans.library function IEEEDPSqrt() is called and the result is returned without any other operations preceding or following that function call.

    If there is a fault with the accuracy of the floating point functions, it may actually be with the operating system's floating point libraries.

     
  • Olaf Barthel

    Olaf Barthel - 2015-01-19

    The tests look good to me, and the warnings given for the accuracy of the sqrt() results make sense to me, too. The specific tests gave warnings because the calculated error between the expected and the actual result values exceed a certain threshold. The error is rather small but it ought to be much smaller, if not zero. This is why the sqrt() tests complain with "failure" and "serious defect" messages.

    Looking at the respective test program binaries for 68k (which you provided), it appears that they were built to use the operating system floating point libraries in mathieeedoubbas.library and mathieeedoubtrans.library, respectively. I had a look at how the mathieeedoubtrans.library version of sqrt() is implemented, and at a cursory glance it appears that it was optimized for speed and not necessarily for accuracy.

    The atan() and sqrt() functions are the only ones in the entire library which were rewritten in assembly language, with the sqrt() function implementing a different algorithm than the original 'C' language version.

    Long story short: you may not want to depend upon mathieeedoubbas.library and mathieeedoubtrans.library for accuracy.

    Incidentally, the AmigaOS4 build of the test program does better than the 68k build because the clib2 version uses fdlibm 5.3 for its floating point operations. The same is true for the newlib version.

     
  • Chris Young

    Chris Young - 2015-01-19

    OK, that seems reasonable. The reason I was looking into this is because I built Curl using clib2, and it has an odd bug which I suspected would be a maths error. When the Curl executable is run it immediately exits with a "connection timed out after xxxx milliseconds" (I can't remember the number and am not in a position to check it right now, but I think it may have been in the order of a couple of million ms - I guess a small negative value converted to an unsigned value). I can workaround this by setting the connection timeout to >5000 seconds.

    I have another program which displays the execution time since it was started. This appears to start off correctly, and then starts showing large negative values for the seconds (a couple of thousand or so) after a couple of seconds.

    They both appear to work in the same way: gettimeofday() is used to collect the time, and then the timevals are subtracted from one another. You can see the code curl uses here: https://github.com/bagder/curl/blob/master/lib/timeval.c

    I'll see if I can knock together a test case.

     
  • Olaf Barthel

    Olaf Barthel - 2015-01-19

    I think the reason why the interval differences comes out incorrectly is that the curlx_tvdiff() function in libcurl assumes that timeval.tv_usec is signed, which is not the case on AmigaOS (because of overlap with <devices time.h="">, which defines it as unsigned).

    Thus, the difference comes out correctly only if old_timeval.tv_sec < new_timeval.tv_sec && old_timeval.tv_usec < new_timeval.tv_usec.

    This should be rarely the case. Because timeval.tv_usec is unsigned and if old_timeval.tv_usec > new_timeval.tv_usec, old_timeval.tv_usec-new_timeval.tv_usec will underflow and wrap.

     
  • Chris Young

    Chris Young - 2015-01-19

    Excellent, casting that calculation to a signed long has fixed it. Thanks!

     

Log in to post a comment.