Re: interpretation of fit errors


On 05/06/2011 04:10 PM, Thomas Mattison wrote:
> I stand corrected about two things:
>
> 1.  Gnuplot fits have always done the "right" thing and divided
> chisquare by degrees of freedom rather than measurements.  Sorry, I
> haven't looked at what gets printed out in a while; I mostly use my
> own fitting programs these days.
>
> 2.  Gnuplot will indeed refuse to fit a constant function to a single
> data point, or a line to two points.  Yes, I hadn't done the
> experiment.  Sorry.
>
> However I regard item 2 as a bug not a feature.  It is perfectly well
> defined to ask the question, what is the chisquare for agreement
> between data and model, as a function of the parameter values, for
> these cases.  There is a perfectly well defined answer to that
> question.  There is a perfectly well defined point of minimum
> chisquare, which gives the best fit parameters.  And the curvature of
> the chisquare curve or surface gives the fit errors.  Nothing
> anomalous happens in the case where the number of data points equals
> the number of parameters.

It seems like it should.  From H.B.B.'s last post, if there are only two 
points and the curve to fit is a line (two parameters, first order 
equation**) then the fit is exact in the sense that the residuals are 
zero...if no other information is given about the data.  But that 
doesn't say anything about the measurement "error" in the fit.  (I sort 
of prefer the word "uncertainty", but that's just me.)  We can only 
glean information about the uncertainty of the fit (i.e., the statistics 
of the original data) if there are additional data points beyond the 
number of parameters.

** Polynomial order of the fitting curve is important.  For example, 
exp() has infinite polynomial order, which has ramifications for 
uniqueness.
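
To make the two-point case concrete, here is a minimal sketch in plain
Python.  The data values are made up for illustration and no error bars
are assumed.  It only shows that the residuals vanish and that no
degrees of freedom are left over from which to estimate the scatter.

```python
# Two made-up data points; with no error bars given, the line through
# them is determined exactly.
x1, y1 = 1.0, 2.0
x2, y2 = 3.0, 8.0

# Two parameters, two points: solve for the line directly.
slope = (y2 - y1) / (x2 - x1)
intercept = y1 - slope * x1

# Residuals of the "fit" at each data point.
residuals = [y1 - (slope * x1 + intercept), y2 - (slope * x2 + intercept)]
sum_sq = sum(r * r for r in residuals)

# Degrees of freedom = number of points minus number of parameters.
n_points, n_params = 2, 2
dof = n_points - n_params

print("slope:", slope, "intercept:", intercept)
print("sum of squared residuals:", sum_sq, "degrees of freedom:", dof)
```

With dof equal to zero, a quantity like chisquare/dof cannot be formed,
so the residuals alone say nothing about how noisy the data are.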

> I agree that in those two particular cases, it's not necessary to "do
> a fit" to find the parameters; it can easily be done by hand.  But
> mathematically the fit is still well-defined.

OK, so I'm beginning to see there is this vital piece of information 
about the quality of fit, reflecting the underlying statistics or the 
uncertainty in the data (i.e., how noisy it is).  That's what is meant 
by error, right?

In this discussion

http://www.erikburd.org/projects/pitfall/evaluate.html

there is a column for "standard error".  Is that what is being referred to, 
the standard error typically used in statistical analysis?  (There are 
the underlying actual measurement errors, but we can't know what those 
are because there is uncertainty in the fit.)

> I hope we can all agree that when fitting a line to two data points
> that have errors, the slope and intercept DO have errors.

Yes, but I don't see how one can estimate the size of that error without 
additional data beyond two data points, unless some information is known 
a priori.
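
As a sketch of the "known a priori" case: if each of the two points
comes with its own independent 1-sigma error on y (an assumption, with
made-up numbers below), ordinary error propagation through the formulas
for the slope and intercept gives parameter errors without any extra
data points.  The function name two_point_line_errors is hypothetical.

```python
import math

def two_point_line_errors(x1, y1, s1, x2, y2, s2):
    """Line through two points whose y values carry known, independent
    1-sigma errors s1 and s2 (assumed, not estimated from the fit)."""
    dx = x2 - x1
    slope = (y2 - y1) / dx
    intercept = (x2 * y1 - x1 * y2) / dx
    # Propagate s1 and s2 through the linear formulas above.
    slope_err = math.sqrt(s1 ** 2 + s2 ** 2) / abs(dx)
    intercept_err = math.sqrt((x2 * s1) ** 2 + (x1 * s2) ** 2) / abs(dx)
    return slope, intercept, slope_err, intercept_err

# Made-up points and error bars, for illustration only.
slope, intercept, slope_err, intercept_err = two_point_line_errors(
    1.0, 2.0, 0.3, 3.0, 8.0, 0.4)
print("slope = %g +/- %g" % (slope, slope_err))
print("intercept = %g +/- %g" % (intercept, intercept_err))
```

These are "raw" errors in the sense of this thread: they depend only on
the supplied error bars, not on how well the line matches the data.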

> Is it really so much to ask to have both raw and rescaled errors
> printed?

I'm fine with that idea.  But I'm surprised that the numerical analysis 
community has not developed well-defined, commonly understood 
terminology for what seems like an important idea.  Bastian points out 
that:

* Minuit does not scale
* Origin offers both options, the default being version dependent
* SAS and Mathematica default to scaling of errors

That inconsistency is the problem.
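
For reference, a minimal sketch of the two conventions for a weighted
straight-line fit, in plain Python.  It assumes independent errors on y
only, and takes "rescaled" to mean the raw errors multiplied by
sqrt(chisquare/dof), which is my reading of this thread.  The function
name fit_line and all data values are made up; this is not a
description of what any particular package does internally.

```python
import math

def fit_line(x, y, sigma):
    """Weighted least-squares fit of y = slope*x + intercept."""
    w = [1.0 / s ** 2 for s in sigma]
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx ** 2

    slope = (S * Sxy - Sx * Sy) / delta
    intercept = (Sxx * Sy - Sx * Sxy) / delta

    # "Raw" errors: from the curvature of chisquare, trusting sigma as given.
    raw = (math.sqrt(S / delta), math.sqrt(Sxx / delta))

    chisq = sum(((yi - (slope * xi + intercept)) / si) ** 2
                for xi, yi, si in zip(x, y, sigma))
    dof = len(x) - 2

    # "Rescaled" errors: raw errors times sqrt(chisq/dof); undefined if dof == 0.
    if dof > 0:
        scale = math.sqrt(chisq / dof)
        rescaled = (raw[0] * scale, raw[1] * scale)
    else:
        rescaled = None
    return slope, intercept, raw, rescaled, chisq, dof

# Made-up data: four points, then a two-point case.
four = fit_line([0.0, 1.0, 2.0, 3.0], [0.1, 0.9, 2.2, 2.9], [0.1] * 4)
two = fit_line([1.0, 3.0], [2.0, 8.0], [0.3, 0.4])
print("four points:", four)
print("two points: ", two)
```

In the two-point case the raw errors are still available while the
rescaled ones are not, which is one argument for printing both.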

Dan
