
#2192 Confusion between fit errors shorthands, possible bug also with range limiting

open
nobody
None
2019-08-20
2019-08-20
No

Issue

The gnuplot 5.2 documentation (page 75) states the following about the errors qualifier of fit:

A few shorthands for the errors qualifier are available: yerrors (for fits with 1 column of independent variable), and zerrors (for the general case) are all equivalent to errors z, indicating that there is a single extra column with errors of the dependent variable.

However, the different forms do not behave the same, and the difference seems to affect range limiting as well (although the latter might be an independent issue, or not a bug at all).

Technical parameters

gnuplot 5.2.5
ubuntu 16.04 LTS
wxt terminal, interactive mode

Reproducible

As per my tests, always.
However, it is possible that something in the actual data (especially the error estimates, the weight calculation, or the weighted fit itself) leads to an unexpectedly bad fit when errors are used, whereas the fit without errors is never bad, even on the same data from the same initial values. I would mostly expect larger differences only in the asymptotic standard errors, and only in a few cases in the parameters themselves.

Details

My actual data

(This might be of no interest.)
A data file containing 64751 rows (important later) and 3 columns (x, z, delta_z). The x range is [0:1], and there is always a subrange [xmin:xmax] on which z is in [0:1] and x and z are linearly related. delta_z is mostly of the order of 0.1. The actual task is to fit a sigmoidal contrast function (see e.g. here) to those points whose z value (i.e. $2 in the file) is in the range [0:1]. Filtering this way seems impossible (because the function to be fitted is always within [0:1] by definition) unless I know the [xmin:xmax] subrange.
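The filtering described above (keep only the rows whose z lies in [0:1], then read off [xmin:xmax]) can be sketched outside gnuplot as a preprocessing step. This is a minimal Python sketch assuming the three-column (x, z, delta_z) layout; the helper names are hypothetical and not part of gnuplot.

```python
from typing import List, Tuple

# One data row as described in the report: (x, z, delta_z).
Row = Tuple[float, float, float]


def filter_unit_z(rows: List[Row]) -> List[Row]:
    """Hypothetical helper: keep only the rows whose z ($2) is in [0:1]."""
    return [row for row in rows if 0.0 <= row[1] <= 1.0]


def x_subrange(rows: List[Row]) -> Tuple[float, float]:
    """Hypothetical helper: return (xmin, xmax) spanned by the kept rows."""
    xs = [row[0] for row in filter_unit_z(rows)]
    if not xs:
        raise ValueError("no rows with z in [0:1]")
    return min(xs), max(xs)


# Synthetic example: z = 2x - 0.5 leaves [0:1] at both ends of x in [0:1].
example = [(i / 10, 2 * (i / 10) - 0.5, 0.1) for i in range(11)]
kept = filter_unit_z(example)
xmin, xmax = x_subrange(example)
```

The resulting xmin and xmax could then be passed to set xrange or fit [xmin:xmax], or the kept rows could be written to a separate data file, which corresponds to the case in which the slopes come out nearly equal.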

Behavior

After declaring the function s(x) with parameters a and b, the fit is done against the data in the [xmin:xmax] subrange, which is set beforehand with set xrange. As there are three columns in the file, using might be omitted (however, according to the manual, this is possibly a bad idea).
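The exact form of s(x) is not given here (the link to the contrast function is missing). Purely as an assumption, a common logistic form with midpoint a and steepness b is sketched below in Python. It illustrates why the function values always stay inside [0:1], and that for this form the maximal slope is b/4, reached at x = a.

```python
import math


def s(x: float, a: float, b: float) -> float:
    """Assumed logistic contrast function with midpoint a and steepness b.

    Mathematically 0 < s(x) < 1; in floating point the value may round
    to exactly 0.0 or 1.0 far away from a.
    """
    t = b * (x - a)
    # Two branches keep math.exp from overflowing for large |t|.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def max_slope(b: float) -> float:
    """Maximal slope of the assumed s(x), reached at x = a: ds/dx = b/4."""
    return b / 4.0
```

Under this assumed form, the two values of b reported in the notes (8.81983 without errors, 21.6521 with errors) would correspond to maximal slopes of about 2.20 and 5.41.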

xrange being ignored

I also tested xrange handling with both set xrange [xmin:xmax] ; fit s(x) ... and set xrange [0:1] ; fit [xmin:xmax] s(x) ... in all of the cases below, and both seem to ignore the xrange every time: the maximal slope (related to parameter b) is severely overestimated, i.e. the sigmoid's slope is much greater than the slope of the linear part. When the data file contains only the points with x in [xmin:xmax], the two slopes are nearly equal. The overestimation is caused by the points with z outside [0:1], which shift the lower and upper tails toward 0 and 1, respectively.

When I leave out errors and use only using 1:2, the xrange is no longer ignored, but something is still strange: FIT_NDF is the same in both cases (with and without errors), and after set autoscale xy ; replot (which makes the yrange [-0.2:2.6] in my case), fit seems to reset the yrange to [0:1], which is actually the range of the function values. Thus it is unpredictable whether the data is filtered against this yrange or not: FIT_NDF suggests correct filtering, but the results differ so much that it is hard to believe the difference is caused by the errors alone.

errors z

fit s(x) 'file' using 1:2:3 errors z via a,b
This form works only with using specified. Without using, it stops with the error message "Out of memory in fit: too many datapoints (4096)?".

zerrors and yerrors

fit s(x) 'file' using 1:2:3 zerrors via a,b
This form works both with and without using, and the results are the same (and also the same as those of the previous form with using). yerrors and zerrors produce the same results, so these two do seem to be equivalent.

Some notes on results

Fitting without errors on the whole dataset gives

a=0.302249+-0.0001197(0.03962%)
b=8.81983+-0.009261(0.105%)
FIT_NDF=33247 !!!
FIT_STDFIT=0.026446

so it seems that filtering happens even with unspecified ranges, and also after set au xy ; rep. FIT_NDF starts to decrease only if I specify an xrange narrower than [xmin:xmax].
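As a quick check on the filtering argument: FIT_NDF is the number of data points actually used minus the number of fitted parameters. Assuming every one of the 64751 rows is a data point, FIT_NDF=33247 with the two parameters a and b means that 33249 rows were used and 31502 were excluded, even though no range was specified. The minimal Python sketch below only spells out that arithmetic; the function name is hypothetical.

```python
def points_used(fit_ndf: int, n_params: int) -> int:
    """Hypothetical helper: data points used = degrees of freedom + parameters."""
    return fit_ndf + n_params


TOTAL_ROWS = 64751            # rows in the data file, from the report
used = points_used(33247, 2)  # FIT_NDF from the report; parameters a and b
excluded = TOTAL_ROWS - used  # rows that did not enter the fit
```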
Fitting with errors gives

a=0.256654+-0.0003717(0.1448%)
b=21.6521+-0.06377(0.2945%) !!!
FIT_NDF=33247
FIT_STDFIT=468.141

This is seemingly a much worse fit, as can be seen in the attachments (the file names indicate the case). The initial values were the same, a=0.32 and b=6, and testing with widely varying initial values (a from 0.2 to 0.4, b from 3 to 20) gives these same results.
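To compare the two FIT_STDFIT values, it may help to spell out how they are assumed to be computed: fit minimizes the weighted sum of squared residuals (WSSR), where each residual is divided by its error when errors are given and by 1 otherwise, and FIT_STDFIT = sqrt(WSSR/FIT_NDF). The minimal Python sketch below reproduces only that arithmetic on toy data; the function names and data are hypothetical. It shows that uniform errors of 0.1 would, for the same fitted curve, scale FIT_STDFIT by exactly 10; since delta_z is only mostly around 0.1, this is a rough comparison, but it puts the jump from 0.026446 to 468.141 in perspective.

```python
import math
from typing import Callable, Optional, Sequence


def wssr(xs: Sequence[float], zs: Sequence[float],
         f: Callable[[float], float],
         sigmas: Optional[Sequence[float]] = None) -> float:
    """Weighted sum of squared residuals; unit weights when sigmas is None."""
    total = 0.0
    for i, (x, z) in enumerate(zip(xs, zs)):
        sigma = 1.0 if sigmas is None else sigmas[i]
        r = (z - f(x)) / sigma
        total += r * r
    return total


def stdfit(wssr_value: float, ndf: int) -> float:
    """FIT_STDFIT-style value: square root of WSSR per degree of freedom."""
    return math.sqrt(wssr_value / ndf)


def model(x: float) -> float:
    """Toy fitted curve, treated as if it came from a 2-parameter fit."""
    return x


xs = [0.0, 0.25, 0.5, 0.75, 1.0]
zs = [0.02, 0.24, 0.51, 0.73, 1.01]
ndf = len(xs) - 2  # data points minus two parameters
plain = stdfit(wssr(xs, zs, model), ndf)
weighted = stdfit(wssr(xs, zs, model, [0.1] * len(xs)), ndf)
```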

2 Attachments
