From: Hans-Bernhard B. <HBB...@t-...> - 2011-05-05 23:30:59
On 05.05.2011 10:52, Daniel J Sebald wrote:

> The case of "known data errors", what is that? Does Paul mean known
> statistics for the data errors (i.e., the case where an extra column is
> supplied to the input)?

Not quite. The question is not whether data errors are present. To some extent they always are --- if none were supplied explicitly, 'fit' defaults to a constant 1.0 for all of them. The question is what those values _mean_, and what to do if the assumption about their meaning breaks down.

The meaning alluded to by people complaining about gnuplot's behaviour in this regard is that the input errors are actual, precise standard deviations of (presumably Gaussian distributed) input variables.

Now if this meaning were strictly true, and the fit generally valid, the chisq should end up being about equal to the number of degrees of freedom. I.e. STDFIT should end up so close to 1.0 that the division performed by gnuplot would not make any notable difference.

The problems start when this plan fails, i.e. you're facing a fit that yielded a STDFIT far away from 1.0. In effect this means that either the input errors were wrong, or the model function doesn't actually describe the given data at all. For lack of omniscience, gnuplot has no choice but to assume the former, i.e. it decides that those data errors are not as reliable as they're made out to be.

Let's say you end up with a STDFIT of about 10. That means the actual deviations between the fitted model and the data are on average 10 times as big as the data errors said they should be. That fit has, in other words, missed its goal by a factor of 10 --- you've not even come close to threading that function through those error bars.

So what gnuplot does to resolve this conflict is to re-scale the input errors by the same factor of 10 they're apparently wrong by. This re-scaling shows up as a factor of 10 increase in the fitted parameters' errors.
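To make the rescaling concrete, here is a minimal Python sketch. It is not gnuplot's code; it only mimics the arithmetic described above for the simplest possible model, a constant f(x) = a. The data values, the claimed error of 0.1, and the function name `fit_constant` are made up for illustration, chosen so that STDFIT comes out at about 10.

```python
import math

# Hypothetical measurements of a constant quantity. The claimed errors
# (0.1) are deliberately about 10 times smaller than the real scatter.
y = [11.0, 9.0, 11.0, 9.0, 10.0]
sigma = [0.1] * len(y)

def fit_constant(y, sigma):
    """Weighted least-squares fit of the model f(x) = a (illustrative only)."""
    w = [1.0 / s**2 for s in sigma]                # weights from claimed errors
    a = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    chisq = sum(((yi - a) / s) ** 2 for yi, s in zip(y, sigma))
    ndf = len(y) - 1                               # one fitted parameter
    stdfit = math.sqrt(chisq / ndf)                # ~1.0 if the errors were right
    a_err_raw = 1.0 / math.sqrt(sum(w))            # trusts sigma as true std devs
    a_err_scaled = a_err_raw * stdfit              # treats sigma as weights only
    return a, chisq, stdfit, a_err_raw, a_err_scaled

a, chisq, stdfit, a_err_raw, a_err_scaled = fit_constant(y, sigma)
print("a            =", a)
print("chisq        =", chisq)
print("STDFIT       =", stdfit)
print("a_err raw    =", a_err_raw)
print("a_err scaled =", a_err_scaled)
```

The "raw" error is what you would quote if the 0.1 were a trustworthy standard deviation; the "scaled" one is larger by the factor STDFIT, which is the behaviour described above.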
In effect, this means gnuplot treats the data errors as _weights_, not as strictly reliable errors. gnuplot has been working like that since effectively forever.

> As for the original bug report, unless this is something obvious,
> perhaps there is a way to illustrate the error with a test case, to
> ensure the fit is solved correctly.

The demo is dead simple. Pick any fit from the demos or wherever, and repeat it with the data errors multiplied by a fixed factor, i.e. replace

    fit f(x) 'foo.dat' u 1:2:3 via ...

by

    fit f(x) 'foo.dat' u 1:2:($3*20) via ...

'fit' will report the same parameter errors, both in the printed output and in the saved *_err variables. Only the chisq and STDFIT will have shrunk: STDFIT by a factor of 20, the chisq by a factor of 20^2 = 400.

People thinking I made a bad decision here say that the errors on the parameters should become 20 times as large in the second case.
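For anyone without a data file at hand, the same experiment can be imitated outside gnuplot. The following Python sketch is an assumption-laden stand-in, not gnuplot's implementation: it does a closed-form weighted straight-line fit on made-up data, multiplies the resulting parameter errors by STDFIT, and then repeats the fit with all data errors multiplied by 20. The helper `fit_line` and the numbers are hypothetical.

```python
import math

def fit_line(x, y, sigma):
    """Weighted least-squares fit of f(x) = a*x + b (illustrative only).

    Parameter errors are multiplied by stdfit, i.e. the data errors are
    treated as relative weights, as described in the post.
    """
    w = [1.0 / s**2 for s in sigma]
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx**2
    a = (S * Sxy - Sx * Sy) / delta                # slope
    b = (Sxx * Sy - Sx * Sxy) / delta              # intercept
    chisq = sum(((yi - (a * xi + b)) / si) ** 2
                for xi, yi, si in zip(x, y, sigma))
    ndf = len(x) - 2                               # two fitted parameters
    stdfit = math.sqrt(chisq / ndf)
    a_err = math.sqrt(S / delta) * stdfit          # rescaled slope error
    b_err = math.sqrt(Sxx / delta) * stdfit        # rescaled intercept error
    return {"a": a, "b": b, "chisq": chisq, "stdfit": stdfit,
            "a_err": a_err, "b_err": b_err}

# Made-up data with unequal claimed errors.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
sigma = [0.1, 0.2, 0.1, 0.2, 0.1, 0.2]

fit1 = fit_line(x, y, sigma)                       # like: u 1:2:3
fit20 = fit_line(x, y, [20.0 * s for s in sigma])  # like: u 1:2:($3*20)

for key in ("a", "b", "a_err", "b_err", "stdfit", "chisq"):
    print(key, fit1[key], fit20[key])
```

In this sketch the parameters and their rescaled errors agree between the two runs to rounding precision, while STDFIT differs by a factor of 20 and the chisq by a factor of 400. Dropping the `* stdfit` factors gives the other convention, in which the parameter errors grow 20-fold in the second run.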