Confusion between fit errors shorthands, possible bug also with range limiting

A portable, multi-platform, command-line driven graphing utility

Brought to you by: broeker, cgaylord, lhecking, sfeam

#2192 Confusion between fit errors shorthands, possible bug also with range limiting

Status: open

Owner: nobody

Labels: None

Priority:

Updated: 2019-08-20

Created: 2019-08-20

Creator: Peter Salavec

Private: No

Issue

Gnuplot 5.2 documentation on page 75 states the following about fit errors:

A few shorthands for the errors qualifier are available: yerrors (for fits with 1 column of independent variable), and zerrors (for the general case) are all equivalent to errors z, indicating that there is a single extra column with errors of the dependent variable.

However, the three forms do not behave identically, and the difference also seems to affect range limiting (although the latter might be an independent issue, or not a bug at all).

Technical parameters

gnuplot 5.2.5
ubuntu 16.04 LTS
wxt terminal, interactive mode

Reproducible

As per my tests, always.
However, it is possible that the data themselves (especially the error estimates, the weight calculation, or the weighted fitting on these data) are what produce the unexpectedly bad fit; this never happens when fitting without errors, even on the same data with the same initial values. I would mostly expect larger differences only in the asymptotic standard errors, and only in a few cases in the parameters themselves.

Details

My actual data

(This might not be of interest.)
A data file containing 64751 rows (important later) and 3 columns (x, z, delta_z). The x range is [0:1], and there is always a subrange [xmin:xmax] in which z is in [0:1] and x and z are linearly related. delta_z is mostly of the order of 0.1. The actual task is to fit a sigmoidal contrast function (see e.g. here) to those points for which the z data (i.e. $2 in the file) is in the range [0:1]. Filtering this way seems impossible (because the function to be fitted is always within [0:1] by definition) unless I know the [xmin:xmax] subrange.
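One way to express the "z in [0:1]" filter without knowing [xmin:xmax] is to mask the second column inside the using specification. This is only a sketch and was not tested in this report: the logistic form of s(x) and the helper name inunit are assumptions, and it relies on fit skipping points whose value evaluates to NaN.

```gnuplot
# Assumed logistic form of the sigmoid; the report does not state the exact function.
s(x) = 1.0 / (1.0 + exp(-b * (x - a)))

# Initial values as used in the report.
a = 0.32; b = 6

# Hypothetical helper: pass z through if it lies in [0:1], otherwise mark it undefined.
inunit(z) = (z >= 0 && z <= 1) ? z : NaN

# Weighted fit on the masked data; column 3 holds delta_z.
fit s(x) 'file' using 1:(inunit($2)):3 zerrors via a,b

# FIT_NDF shows how many points actually entered the fit (points minus parameters).
print "FIT_NDF = ", FIT_NDF
```

If the mask is honoured, FIT_NDF should drop to the number of rows with z in [0:1] minus two, independent of any set xrange or set yrange in effect.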

Behavior

After declaring the function s(x) with parameters a and b, the fit is run against the data in the [xmin:xmax] subrange, which is set beforehand with set xrange. As there are three columns in the file, using could be omitted (although, according to the manual, that is probably a bad idea).

xrange ignorance

I also tested xrange handling with set xrange [xmin:xmax] ; fit s(x) ... and with set xrange [0:1] ; fit [xmin:xmax] s(x) ... in all of the following cases, and both seem to ignore xrange every time: the maximal slope (related to parameter b) is severely overestimated, so the sigmoid's slope is much greater than the linear slope. When the data file contains only the points with x in [xmin:xmax], the slopes are nearly equal. The overestimate arises because the points with y outside [0:1] pull the lower and upper tails toward 0 and 1 respectively.
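The two range-handling variants can be written out as a small reproduction script. This is a sketch under assumptions: the logistic form of s(x) and the numeric values of xmin and xmax are placeholders, since the report gives neither.

```gnuplot
# Assumed logistic form of the sigmoid; the report does not state the exact function.
s(x) = 1.0 / (1.0 + exp(-b * (x - a)))

# Hypothetical subrange limits; the real values depend on the data.
xmin = 0.2
xmax = 0.4

# Variant 1: limit the x range globally, then fit.
a = 0.32; b = 6
set xrange [xmin:xmax]
fit s(x) 'file' using 1:2:3 zerrors via a,b
print "variant 1: a=", a, " b=", b, " FIT_NDF=", FIT_NDF

# Variant 2: keep the global range wide and pass the range to fit directly.
a = 0.32; b = 6
set xrange [0:1]
fit [xmin:xmax] s(x) 'file' using 1:2:3 zerrors via a,b
print "variant 2: a=", a, " b=", b, " FIT_NDF=", FIT_NDF
```

If both variants honour the range, the two printed lines should agree, and FIT_NDF should equal the number of rows with x in [xmin:xmax] minus two.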

When I leave out errors and use only using 1:2, the range is no longer ignored, but something is still strange: FIT_NDF is the same in both cases (with and without errors), and after set autoscale xy ; replot (which gives yrange [-0.2:2.6] in my case), fit seems to reset yrange to [0:1], which is actually the range of the function values. It therefore seems unpredictable whether the data are filtered against this yrange or not: FIT_NDF suggests correct filtering, but the results differ so much that it is hard to believe the errors alone are the cause.

`errors z`

fit s(x) 'file' using 1:2:3 errors z via a,b
This form works only with using specified. Without using, it stops with the error message Out of memory in fit: too many datapoints (4096)?.

`zerrors` and `yerrors`

fit s(x) 'file' using 1:2:3 zerrors via a,b
This form works with and without using, and the results are the same (and the same as those of the previous form with using). yerrors and zerrors produce the same results, so they really do seem to be equivalent.
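The equivalence of the three forms can be checked side by side by restarting each fit from the same initial values. A minimal sketch, assuming a logistic form for s(x); the variable names a_ez, a_ze, a_ye (and their b counterparts) are hypothetical and exist only to hold the results.

```gnuplot
# Assumed logistic form of the sigmoid; the report does not state the exact function.
s(x) = 1.0 / (1.0 + exp(-b * (x - a)))

# Form 1: explicit errors qualifier.
a = 0.32; b = 6
fit s(x) 'file' using 1:2:3 errors z via a,b
a_ez = a; b_ez = b

# Form 2: zerrors shorthand.
a = 0.32; b = 6
fit s(x) 'file' using 1:2:3 zerrors via a,b
a_ze = a; b_ze = b

# Form 3: yerrors shorthand.
a = 0.32; b = 6
fit s(x) 'file' using 1:2:3 yerrors via a,b
a_ye = a; b_ye = b

# According to the documentation quoted above, all three lines should match.
print "errors z: a=", a_ez, " b=", b_ez
print "zerrors:  a=", a_ze, " b=", b_ze
print "yerrors:  a=", a_ye, " b=", b_ye
```

Removing using 1:2:3 from each command in turn then isolates the case in which errors z fails while the shorthands still work.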

Some notes on results

Fitting without errors on the whole dataset gives

a=0.302249+-0.0001197(0.03962%)
b=8.81983+-0.009261(0.105%)
FIT_NDF=33247 !!!
FIT_STDFIT=0.026446

so it seems that filtering happens even with unspecified ranges, or after set au xy ; rep. FIT_NDF begins to decrease only if I specify an xrange smaller than [xmin:xmax].
Fitting with errors gives

a=0.256654+-0.0003717(0.1448%)
b=21.6521+-0.06377(0.2945%) !!!
FIT_NDF=33247
FIT_STDFIT=468.141

thus the fit is seemingly much worse, as can be seen in the attachments (the file names indicate the case). The initial values were the same, a=0.32 and b=6, but testing with widely varying initial values (a from 0.2 to 0.4, b from 3 to 20) returns the same results.
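The FIT_NDF and FIT_STDFIT figures above can be cross-checked against the file itself. The sketch below assumes a logistic form for s(x); n_total and n_used are hypothetical names. It relies on FIT_NDF being the number of points used minus the number of fitted parameters, and on FIT_STDFIT being sqrt(FIT_WSSR/FIT_NDF).

```gnuplot
# Assumed logistic form of the sigmoid; the report does not state the exact function.
s(x) = 1.0 / (1.0 + exp(-b * (x - a)))

# Count the rows gnuplot reads from the file, with no range restriction.
set autoscale xy
stats 'file' using 1 nooutput
n_total = STATS_records

# Unweighted fit from the initial values used in the report.
a = 0.32; b = 6
fit s(x) 'file' using 1:2 via a,b

# Two parameters are fitted, so the number of points used is FIT_NDF + 2.
n_used = FIT_NDF + 2
print "rows in file:        ", n_total
print "points used by fit:  ", n_used

# These two numbers should be identical.
print "FIT_STDFIT:          ", FIT_STDFIT
print "sqrt(WSSR/NDF):      ", sqrt(FIT_WSSR / FIT_NDF)
```

Applied to the figures reported above, FIT_NDF=33247 with two parameters corresponds to 33249 points, roughly half of the 64751 rows, which is consistent with some filtering taking place even when no range is given.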

2 Attachments

gpbugNoError.png

gpbugWithError.png
