rounding errors in linear model from "stats"

A portable, multi-platform, command-line driven graphing utility

Brought to you by: broeker, cgaylord, lhecking, sfeam

#404 rounding errors in linear model from "stats"

Milestone: 5.0

Status: open

Owner: nobody

Labels: stats (1)

Priority: 5

Updated: 2014-08-18

Created: 2014-07-23

Creator: Karl Ratzsch

Private: No

This small, completely linear dataset gives NaN for the errors of slope and intercept in "stats" linear model:

stats "-"
1 2
2 3
4 5
e

I took a look at the code in "src/stats.c" (around line 320) and the one thing that sprang to mind was that line 318

double ss = (ssyy - res.slope * ssxy) / (n - 2);

could become negative by some rounding errors, so that the calculation of the errors (following lines) would include the root of a negative number.

Funnily, my self-compiled gnuplot5 (mingw/win7) gives a small (6e-10) positive error for the same dataset. Above is with Tatsuro´s gp5 binary from 2014-07-14. It´s the other way round with this one

stats "-"
1.1 2.2
2.2 3.3
4.4 5.5
e

Most other linear data triplets i tried just return errors zero.

I´m not sure how to tackle this, however, or if it should be fixed at all. In most cases, i guess, people know if their data is perfectly linear. ymmv

(errors in the range of 1e-7 can already trigger the described problem, e.g

stats "-"
1 2
2 3
4 5.0000001

. Who does measurements that are so good? ;-) )

Discussion

Ethan Merritt - 2014-07-23

What exactly is the "bug" here?

Are you saying that on some platform the rounding error can cause the program to fail due to floating underflow or sqrt of a negative number?

Have you tried adding a test something like the one below?

if (fabs(ss) < DBL_EPSILON)
ss = 0.0;

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Karl Ratzsch - 2014-07-24

Doesn´t make much difference, because ss contains ssyy - res.slope * ssxy, and ssyy and ssxy already are defined as such error-prone differences. So the uncertainity is somewhat bigger.

One could do

if (ss < 0) ss = 0.0; /* negative values can occur due to fp errors */

but that seems a bit half-hearted to me.

I think it´d be best to just throw a warning "Variance in linear model is below FP precision, errors might be invalid." if abs(ss) is below a certain value. But for this someone (with better knowledge of fp math than me) would have to derive that value.

Don´t know if that´d be worthwhile. ;-)
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
- Hans-Bernhard Broeker - 2014-07-24
  
  if abs(ss) is below a certain value.
  
  That criterion would be worse than useless. The result of abs(ss) changes in proportion if you multiply all input y values by a common factor. No fixed "certain value" could possibly be correct.
  
  If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Hans-Bernhard Broeker - 2014-07-24

This small, completely linear dataset gives NaN for the errors of slope and intercept in "stats" linear model:

And IMHO, that's a perfectly sensible outcome. Displaying anything else instead of the actual NaN that resulted would, to no small extent, be doing a disservice to the user.

Last edit: Hans-Bernhard Broeker 2014-07-24

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Karl Ratzsch - 2014-07-24

Acknowledged, but with other similar data you get error zero or a small positive number. And if the data is only very nearly linear (last example above), the reported error can be grossly wrong. Some measurements are precise to the tenth digit, it could happen with "real" data, i say.

It´d be good to warn the user if the reported error is unreliable. One only needs to come up with a safe criterion for detection. I´ll try to work this out sometime.

Can we move this to "Feature Requests" in the meantime? I agree it´s by no means a real "bug".

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Ethan Merritt - 2014-08-18

Ticket moved from /p/gnuplot/bugs/1447/

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Log in to post a comment.