Re: interpretation of fit errors


On 06.05.2011 19:48, Thomas Mattison wrote:

> Consider the case where the "function" to be fit is just a constant:
> we think the y-value of all data points should be the same,
> independent of x-value.  Let's call this constant C.  Surely we want
> gnuplot to be able to do this "fit" and give a meaningful error on
> C.

Yes.

> Then consider the special case where there is only a single data
> point: an x-value, a y-value, and a y-error.  The fit result should
> be C=y, and the error should be sigma-C = sigma-y.

It shouldn't.  The problem here is that by going down to a single data 
point, you've constructed a problem with zero degrees of
freedom.  That's not a fit --- it's just an equation to be evaluated.

Using fit for that is tool abuse.

> The fit should be "perfect" with a chisquare of zero.  The chisquare
> divided by the number of measurements is still zero.

But chisq is never divided by the number of measurements.  It's divided 
by the number of degrees of freedom, i.e. (number of measurements) - 
(number of parameters).  In the case at hand that's zero, so STDFIT would 
be zero divided by zero.
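To make the degrees-of-freedom arithmetic concrete, here is a minimal Python sketch of fitting the constant model y = C by weighted least squares and then computing chisq/ndf.  The function names are hypothetical illustrations, not anything from gnuplot.  As far as I know, gnuplot's fit summary reports FIT_STDFIT as sqrt(WSSR/ndf), i.e. the square root of the quantity computed here.

```python
def weighted_mean(ys, sigmas):
    """Least-squares estimate of C for the model y = C, with weights 1/sigma^2."""
    weights = [1.0 / s ** 2 for s in sigmas]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def reduced_chisq(ys, fitted, sigmas, n_params):
    """Return chisq / ndf, where ndf = (number of measurements) - (number of parameters)."""
    chisq = sum(((y - f) / s) ** 2 for y, f, s in zip(ys, fitted, sigmas))
    ndf = len(ys) - n_params
    if ndf <= 0:
        # With one point and one parameter this is 0/0: there is nothing to compute.
        raise ValueError(f"ndf = {ndf}: chisq/ndf is undefined")
    return chisq / ndf


# Three points, one parameter: ndf = 2, so the ratio is well defined.
ys, sigmas = [9.0, 10.0, 11.0], [1.0, 1.0, 1.0]
c = weighted_mean(ys, sigmas)
print(c, reduced_chisq(ys, [c] * len(ys), sigmas, n_params=1))
```

Calling reduced_chisq([10.0], [10.0], [2.0], 1) for the single-point case raises ValueError, because ndf is zero before any question of how good the fit is even comes up.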

> Internally, gnuplot's fit algorithm knows a number equivalent to
> sigma-C = sigma-y,

No, it doesn't, because it never even starts to work in that case:

gnuplot> fit a '-' u 1:2:3 via a
input data ('e' ends) > 0 10 2
input data ('e' ends) > e
          Read 1 points
          No data to fit

> but the reported error is that multiplied by chisquare/measurement,
> so the result is zero.

No it's not.  See above.

> It is totally unreasonable to claim that "fitting" the single data
> point gives C with infinite precision, when the data point has a
> finite error.

It's totally unreasonable to use fit on a single data point, period.

> One can make a similar argument about the case of fitting a line (two
> parameters, slope and intercept) to two x-y-yerror data points.

Same problem.  Zero degrees of freedom means 'fit' is the wrong tool for 
the job.

> Gnuplot's fit will go exactly through both data points, with zero
> chisquare per measurement, so the reported errors on the slope and
> intercept will both be zero.

No they won't.  Because gnuplot won't report _any_ errors in that case.

> But clearly there is a range of slopes
> and range of intercepts that are "within one sigma" of both data
> points.  And internally gnuplot's fit algorithm knows those errors on
> slope and intercept, before it multiplies them by zero.

No, it doesn't.

> I would make one small additional change: chisquare should be divided
> by "degrees of freedom" == (measurements minus parameters) not just
> measurements.  Do this both for reporting STDFIT and rescaling of
> errors.

What made you believe that's not already what's happening?
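For the straight-line case discussed above, a minimal Python sketch of the rescaling being described might look like the following: a weighted least-squares fit of y = m*x + b, with the covariance-matrix errors multiplied by sqrt(chisq/ndf).  This is an illustration of the textbook formulas under that assumption, not a transcription of gnuplot's implementation (which uses Marquardt-Levenberg); fit_line is a hypothetical name.

```python
import math


def fit_line(xs, ys, sigmas):
    """Weighted least squares for y = m*x + b.

    Returns (m, b, err_m, err_b, chisq, ndf); the errors are the
    covariance-matrix errors rescaled by sqrt(chisq / ndf).
    """
    n_params = 2
    ndf = len(xs) - n_params
    if ndf <= 0:
        # Two points and two parameters: an equation to solve, not a fit.
        raise ValueError(f"ndf = {ndf}: too few points for a 2-parameter fit")
    w = [1.0 / s ** 2 for s in sigmas]
    S = sum(w)
    Sx = sum(wi * x for wi, x in zip(w, xs))
    Sy = sum(wi * y for wi, y in zip(w, ys))
    Sxx = sum(wi * x * x for wi, x in zip(w, xs))
    Sxy = sum(wi * x * y for wi, x, y in zip(w, xs, ys))
    delta = S * Sxx - Sx * Sx
    m = (S * Sxy - Sx * Sy) / delta
    b = (Sxx * Sy - Sx * Sxy) / delta
    chisq = sum(wi * (y - (m * x + b)) ** 2 for wi, x, y in zip(w, xs, ys))
    scale = math.sqrt(chisq / ndf)          # divide by ndf, not by len(xs)
    err_m = math.sqrt(S / delta) * scale    # unscaled variance of slope: S/delta
    err_b = math.sqrt(Sxx / delta) * scale  # unscaled variance of intercept: Sxx/delta
    return m, b, err_m, err_b, chisq, ndf


print(fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 4.0, 7.0], [1.0] * 4))
```

With only two data points the sketch refuses to run at all, which is the point made above: there is no chisq/ndf to rescale by.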

> user that the rescaled errors are meaningless.  This is much more
> sensible than what gnuplot does now, which is report the errors are
> zero.

Your view of the world might profit from an actual experiment.
