From: Hans-Bernhard B. <br...@ph...> - 2005-12-01 12:37:37
Thomas Mattison wrote:

> The major annoyance is that the errors from fits are not done in what I
> consider to be the correct way. There exists a definition for the error
> of a fit that is independent of the goodness of the fit, although of
> course it depends on the function, the distribution of the data, and the
> error bars on the data. Gnuplot fits do not report this result.

I know. I made it that way, at least partly on purpose.

The basic problem is that once the chisq/ndf is far away from one,
discussing correctness of parameter errors is a waste of energy. In such
a case the whole fit is plainly and simply wrong, so no parameter error
value really has a right to call itself correct.

chisq/ndf far away from one means that either the data errors are
unrealistic, or the model is wrong (over-/underfitting), or both. In this
situation, it's a completely arbitrary choice whether to believe the
input errors or the residuals. gnuplot chooses to believe the residuals.

> If the function fits the data perfectly at every point, the error
> returned by gnuplot is zero. This is clearly nonsense.

No --- the fit itself is clearly nonsense. I have to insist that
outputting nonsense as the result is fully justified in such a case.

> My proposed solution would be to report both versions of the error: the
> conventional error that is independent of the goodness of fit, and the
> rescaled error that gnuplot presently reports. I would also explicitly
> report the factor by which the data errors have been effectively
> rescaled.

That factor already is reported. It's sqrt(WSSR/ndf). I don't really
think that there's much value in outputting two numbers as seemingly
independent results where in reality there's just a multiplication by a
common factor needed to go from one to the other.

> A feature of this fix is that it would change the format of the
> parameter and error report from the fits.
> I don't have as good an idea
> as you folks do about whether users would care about format changes.

Probably not as much now as they would have before version 4.0, when we
introduced 'set fit errorvariables', which lets users get at the parameter
errors directly from inside gnuplot, without parsing the fit.log or screen
output to extract them.

> I would add a feature: create a parameter-log file that would contain
> gnuplot-readable fit summary information: chisquare, parm-A,
> normal-errorA, rescaled-error-A, parm-B, regular-errorB,
> rescaled-error-B, ... Each fit would append to the end of the file,
> similar to the present fit log file. I would precede each line with a
> copy of the fit command that produced the fit (with a # in front so
> gnuplot would consider it to be a comment). Probably I would also have
> a commented line giving the time of the fit. It would also be possible
> to create a comment line containing headers to show which column means
> what for the fit, using the user's names for the fit variables.

I think it would make a lot more sense to add a couple of lines to
'fit.log' instead of creating what would be an almost complete copy of
all of its content.

> I also have some minor annoyances that I think are worth fixing. I
> would change the default FIT_START_LAMBDA to be 0.01 as recommended by
> Numerical Recipes rather than the method used now (I tell my students
> to do this and it frequently helps).

Caution there --- the actual algorithm is not the same as that in NR
(although it used to be), so not all recommendations issued by that book
may apply to gnuplot unmodified. I took the computation for the initial
lambda from a textbook on numerical maths (Schwarz, "Numerische
Mathematik", Teubner Verlag, in German). I.e. the default startup value
for lambda is not any particular fixed number. It's computed from the
problem.
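
To make the earlier remark about the common factor concrete, here is a minimal Python sketch (not gnuplot code, and not taken from its sources). It assumes, as stated above, that the reported parameter errors are the conventional ones multiplied by sqrt(WSSR/ndf). The function name `unscale_errors` and the numbers are hypothetical, for illustration only.

```python
import math


def unscale_errors(rescaled_errors, wssr, ndf):
    """Undo the sqrt(WSSR/ndf) rescaling of parameter errors.

    Assumption: rescaled_error = conventional_error * sqrt(wssr / ndf).
    Returns the factor and the list of conventional (unscaled) errors.
    """
    if ndf <= 0 or wssr <= 0:
        # A perfect fit (WSSR == 0) gives a factor of zero, which cannot be undone.
        raise ValueError("need positive WSSR and ndf to undo the rescaling")
    factor = math.sqrt(wssr / ndf)
    return factor, [err / factor for err in rescaled_errors]


# Hypothetical fit summary: WSSR = 40 with 10 degrees of freedom.
factor, unscaled = unscale_errors([0.30, 0.05], wssr=40.0, ndf=10)
print(factor, unscaled)
```

The guard for WSSR equal to zero mirrors the "perfect fit" case discussed above: the rescaled errors collapse to zero there, and no conventional error can be recovered from them.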
> I would make the default value for uninitialized variables in fits to
> be 0.01 rather than 1e-30 (the numerical derivatives algorithm tends
> to break with such a tiny default).

It's not 1e-30 right now --- it's 1.0.

> I would fix the numerical derivatives algorithm so it would
> not break if the initial value of a parameter is zero.

> There is a more major feature addition that would be nice. Frequently,
> data has errors not only on the y-variable, but also on the x-variable.
> Gnuplot (nicely) handles plotting data with both x and y errors, but it
> doesn't know how to fit such data.

Neither do Messrs. Marquardt and Levenberg, or anybody else I've heard
about, for generic non-linear fitting.

> The rigorously correct way involves
> fitting for adjustments to all the x-variable values, as well as the
> parameters.

Rigorously, fitting doesn't even know what x values are. It's just a
typical use-case simplification to treat the model as a function

    y[i] = f(x[i], parameters)

Internally, the algorithm only assumes

    y[i] = f[i](parameters)

> But the next two lines don't do analogous things
>
> plot 'file' using 1:2:3:4 with xyerr
> fit f(x,y) 'file' using 1:2:3:4 via a,b,c
>
> The plot command uses columns 3 and 4 for x and y errors, and the fit
> command uses columns 3 and 4 for z and z errors.

That's because that fit has no useful relation with the 'plot' command
you're comparing it to. The type of command to compare it with would be

    splot 'file' using 1:2:3:4 with errorbars

(which unfortunately still doesn't exist).

> One solution would be to make the interpretation of the columns in a fit
> command depend on the number of variables in the function.

It won't work as easily as that --- parameters can also be passed as
function arguments, i.e. you can

    fit f(x,a,b,c) 'file' u 1:2:3 via a,b,c

instead of

    fit f(x) 'file' u 1:2:3 via a,b,c

so the argument count of the function is strictly a red herring.
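
The two points above (the fitter only sees y[i] = f[i](parameters), and the model's argument count says nothing about the data layout) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not gnuplot's implementation: the helper names (`make_residuals`, `wssr`, `f_explicit`, `f_implicit`, `implicit_adapter`) and the data are all hypothetical.

```python
def make_residuals(model, xs, ys, sigmas):
    """Bind the data so the result is a function of the parameters only."""
    def residuals(params):
        # Each entry plays the role of (y[i] - f[i](params)) / sigma[i].
        return [(y - model(x, *params)) / s for x, y, s in zip(xs, ys, sigmas)]
    return residuals


def wssr(residuals, params):
    """Weighted sum of squared residuals for a given parameter vector."""
    return sum(r * r for r in residuals(params))


# Form 1: parameters passed as explicit arguments, like f(x,a,b).
def f_explicit(x, a, b):
    return a * x + b


# Form 2: the model takes x only and reads its parameters from shared state,
# loosely analogous to f(x) using gnuplot variables a and b.
state = {"a": 0.0, "b": 0.0}


def f_implicit(x):
    return state["a"] * x + state["b"]


def implicit_adapter(x, a, b):
    # Set the shared parameters, then call the one-argument model.
    state["a"], state["b"] = a, b
    return f_implicit(x)


# Hypothetical data lying exactly on y = 2x + 1, with equal error bars.
xs, ys, sigmas = [0.0, 1.0, 2.0], [1.0, 3.0, 5.0], [0.5, 0.5, 0.5]

r1 = make_residuals(f_explicit, xs, ys, sigmas)
r2 = make_residuals(implicit_adapter, xs, ys, sigmas)
print(wssr(r1, (1.0, 1.0)), wssr(r2, (1.0, 1.0)))
```

Both forms reduce to the same residual function of the parameters, so a minimizer working on `residuals(params)` cannot tell them apart. The x values never reach it at all.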