Re: interpretation of fit errors

On 05.05.2011 10:52, Daniel J Sebald wrote:

> The case of "known data errors", what is that?  Does Paul mean known
> statistics for the data errors (i.e., the case where an extra column is
> supplied to the input)?

Not quite.  The question is not whether data errors are present.  To 
some extent they always are --- if none were supplied explicitly, 'fit' 
defaults to a constant 1.0 for all of them.

The question is what those values _mean_, and what to do if the 
assumption about their meaning breaks down.

The meaning alluded to by people complaining about gnuplot's behaviour in 
this regard is that the input errors are actual, precise standard 
deviations of (presumably Gaussian distributed) input variables.

Now if this meaning were strictly true, and the fit generally valid, the 
chisq should end up being about equal to the number of degrees of 
freedom.  I.e. STDFIT should end up so close to 1.0 that the division 
performed by gnuplot would not make any notable difference.
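
To make those quantities concrete, here is a minimal Python sketch, 
assuming the usual definitions: chisq is the sum of squared residuals 
divided by the squared data errors, and STDFIT is sqrt(chisq/ndf).  The 
function name and all the numbers are made up for illustration; this is 
not gnuplot's code.

```python
import math

def chisq_and_stdfit(residuals, sigmas, n_params):
    """Return (chisq, ndf, stdfit) for given residuals and data errors."""
    chisq = sum((r / s) ** 2 for r, s in zip(residuals, sigmas))
    ndf = len(residuals) - n_params
    return chisq, ndf, math.sqrt(chisq / ndf)

# Made-up residuals of a hypothetical 2-parameter fit to 6 points.
residuals = [0.8, -0.9, 0.7, -0.8, 0.9, -0.8]

# Case 1: the claimed data errors (1.0) match the actual scatter.
chisq, ndf, stdfit = chisq_and_stdfit(residuals, [1.0] * 6, n_params=2)

# Case 2: the claimed data errors (0.1) are ten times too small.
chisq_small, _, stdfit_small = chisq_and_stdfit(residuals, [0.1] * 6, n_params=2)

print(chisq, ndf, stdfit)
print(chisq_small, stdfit_small)
```

In the first case chisq comes out close to ndf and STDFIT close to 1.0; 
in the second, the same residuals give a STDFIT ten times larger.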

The problems start when this plan fails, i.e. you're facing a fit that 
yielded a STDFIT far away from 1.0.  In effect this means that either 
the input errors were wrong, or the model function doesn't actually 
describe the given data at all.  For lack of omniscience, gnuplot has no 
choice but to assume the former, i.e. it decides that those data errors 
are not as reliable as they're made out to be.

Let's say you end up with a STDFIT of about 10.  That means the actual 
deviations between the fitted model and the data are on average 10 times 
as big as the data errors said they should be.  That fit has, in other 
words, missed its goal by a factor of 10 --- you've not even come close 
to threading that function through those error bars.

So what gnuplot does to resolve this conflict is to re-scale the input 
errors by the same factor of 10 they're apparently wrong by.  That 
re-scaling shows up as a factor of 10 increase in the fitted parameters' 
errors.

In effect, this means gnuplot treats the data errors as _weights_, not 
as strictly reliable errors.

gnuplot has been working like that since effectively forever.

> As for the original bug report, unless this is something obvious,
> perhaps there is a way to illustrate the error with a test case, to
> ensure the fit is solved correctly.

The demo is dead simple.  Pick any fit from the demos or wherever, and 
repeat it with the data errors multiplied by a fixed factor, i.e. replace

	fit f(x) 'foo.dat' u 1:2:3 via ...

by

	fit f(x) 'foo.dat' u 1:2:($3*20) via ...

'fit' will report the same parameter errors, both in the printed output 
and in the saved *_err variables.  Only the chisq and STDFIT will have 
shrunk: STDFIT by a factor of 20, and chisq by a factor of 400 (20 
squared).

People thinking I made a bad decision here say that the errors on the 
parameters should become 20 times as large in the second case.
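
The same experiment can be sketched outside gnuplot.  The Python below 
is a plain weighted least-squares fit of a straight line, not gnuplot's 
actual algorithm; the function name and the data are made up for 
illustration.  It computes both the "raw" parameter errors (data errors 
taken at face value) and the errors re-scaled by STDFIT, once with the 
original data errors and once with them multiplied by 20.

```python
import math

def weighted_line_fit(xs, ys, sigmas):
    """Weighted least-squares fit of y = a + b*x.

    Returns the parameters, chisq, STDFIT, the raw parameter errors
    (sigmas taken at face value) and the errors re-scaled by STDFIT.
    """
    ws = [1.0 / s**2 for s in sigmas]
    S = sum(ws)
    Sx = sum(w * x for w, x in zip(ws, xs))
    Sy = sum(w * y for w, y in zip(ws, ys))
    Sxx = sum(w * x * x for w, x in zip(ws, xs))
    Sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    delta = S * Sxx - Sx**2
    a = (Sxx * Sy - Sx * Sxy) / delta
    b = (S * Sxy - Sx * Sy) / delta
    chisq = sum(w * (y - a - b * x) ** 2 for w, x, y in zip(ws, xs, ys))
    ndf = len(xs) - 2                      # two fitted parameters
    stdfit = math.sqrt(chisq / ndf)
    raw = (math.sqrt(Sxx / delta), math.sqrt(S / delta))
    return {"a": a, "b": b, "chisq": chisq, "stdfit": stdfit,
            "raw_err": raw, "scaled_err": tuple(e * stdfit for e in raw)}

# Made-up data, roughly y = 1 + 2*x, with a claimed error of 0.1 per point.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1, 11.0]
sig = [0.1] * len(xs)

fit1 = weighted_line_fit(xs, ys, sig)                      # like u 1:2:3
fit20 = weighted_line_fit(xs, ys, [s * 20 for s in sig])   # like u 1:2:($3*20)

for key in ("a", "b", "chisq", "stdfit", "raw_err", "scaled_err"):
    print(key, fit1[key], fit20[key])
```

The fitted parameters and the re-scaled errors agree between the two 
runs, while the raw errors differ by the factor of 20.  The re-scaled 
errors correspond to what is described above as gnuplot's behaviour; the 
raw ones are what the other camp expects to see.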
