From: <hartl@on...>  2007-03-03 08:21:38
Attachments:
fit_ASE.diff
gnuplot_doc_fit.diff

Apparently a formal bug report was not filed for the issues addressed by
Thomas Mattison in patch 1445064, "Gnuplot fitting improvements", so the
bugs he found were not considered after his patch was determined to be
not ready to drop into the 4.2 release.

One issue relative to the 4.2 release is that the estimates of
parameter errors produced by fit are incorrectly identified.

The fit discussion in gnuplot.doc (fit error statistical overview) and
the fit output labels clearly identify the quantities as "asymptotic
standard errors" (ASE), which are precisely defined and are the "errors"
conventionally reported by other NLLS fitting programs.

However, the fit code scales the ASE values (I'll denote the result
as SASE) and outputs the SASE values as ASE without indicating that
they have been scaled. Nor does gnuplot.doc mention the scaling or
give any justification for the selected scaling factor.

Possible fixes are
1) omit the scaling (comment out 2 lines) so that the values are
   as defined, and indicate in the documentation the former reporting
   of scaled ASE.
2) change the output labels to SASE and document the basis for
   applying the scaling factor.
3) add a FIT control to let the user select ASE or SASE and
   appropriately label the output (as did Mattison), and extend the
   discussion as for (2).

My preference would be (1), leaving it to the user to determine the
significance of the ASE for their application, rather than (2) or (3),
for which one would want documented
- a definition of what gnuplot is attempting to estimate by scaling
  the ASE,
- the basis for the selected scaling factor,
- the validity of the scale factor when extending its application
  beyond the one case in which the High Energy Physics Particle Data
  Group uses such a scaling factor (it is their convention for
  reporting an estimate of the error in an average of values from
  different experiments: a single-parameter, linear least squares
  problem.)

I have attached patches to fit.c and to gnuplot.doc for (1).
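For readers following the ASE versus SASE distinction, here is a minimal
sketch in Python. It is not taken from fit.c. It assumes the scaling
factor is sqrt(chisq/ndf), the reduced chi-square factor of the PDG
convention; the message above does not state the factor, so treat that
as an assumption. The function names weighted_line_fit and
scaled_errors, and the data, are hypothetical.

```python
import math

def weighted_line_fit(x, y, sigma):
    """Weighted least-squares fit of y = a + b*x with weights 1/sigma^2.

    Returns (a, b, ase_a, ase_b, chisq, ndf), where ase_a and ase_b are
    the unscaled asymptotic standard errors (square roots of the
    diagonal of the parameter covariance matrix).
    """
    w = [1.0 / s ** 2 for s in sigma]
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx * Sx

    a = (Sxx * Sy - Sx * Sxy) / delta
    b = (S * Sxy - Sx * Sy) / delta

    # Unscaled ASE: these depend only on x and sigma, not on the residuals.
    ase_a = math.sqrt(Sxx / delta)
    ase_b = math.sqrt(S / delta)

    # Weighted sum of squared residuals and degrees of freedom.
    chisq = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    ndf = len(x) - 2
    return a, b, ase_a, ase_b, chisq, ndf

def scaled_errors(ase, chisq, ndf):
    """Return the ASE values multiplied by sqrt(chisq/ndf) (assumed factor)."""
    scale = math.sqrt(chisq / ndf)
    return [e * scale for e in ase]

# Hypothetical data: five points with equal uncertainties.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 0.9, 2.2, 2.8, 4.1]
sigma = [0.1] * 5

a, b, ase_a, ase_b, chisq, ndf = weighted_line_fit(x, y, sigma)
sase_a, sase_b = scaled_errors([ase_a, ase_b], chisq, ndf)

print("a =", a, " ASE =", ase_a, " SASE =", sase_a)
print("b =", b, " ASE =", ase_b, " SASE =", sase_b)
```

Under these assumptions, doubling every sigma doubles the ASE but leaves
the SASE unchanged, because chisq falls by a factor of four. The scaled
values therefore no longer reflect the absolute size of the input
uncertainties, which is one reason the two quantities should be labeled
differently.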
(I did not include a reference for the PDG practice, described in
the Introduction to the 'Review of Particle Physics', available online
http://pdg.lbl.gov/2006/reviews/textrpp.pdf
The section on confidence limits in the Statistics chapter
http://pdg.lbl.gov/2006/reviews/statrpp.pdf
is more relevant for the general case.)

The fit.c patch also changes a label in the fit output,
  'final sum of squares of residuals :'
which Mattison reported some found confusing, to
  'final sum of squares of (wtd) residuals :'

(The sum is of weighted residuals, with unit weights in the case
of unweighted analyses.)

Also, the gih indexing of fit subtopics is flawed in both 4.1.0 and 4.2rc4.

Subtopics available for fit:
    adjustable_parameters  beginners_guide  control
    error                  error_estimates  errors           guide
    multibranch            parameters       starting_values  tips

where error, error_estimates, and errors are synonyms. However, selecting
'Subtopic of fit: error' gives the text for the level 4 subtopic
'practical guidelines', while 'error_estimates' or 'errors' gives the text
for the level 3 'error estimates'.

The level 3 subtopic does not list the level 4 subtopics, 'practical
guidelines' and 'statistical overview'.

The gnuplot.doc fit patch also deletes the 'errors' subtopic, retains
'error' and 'error_estimates' as synonyms, and causes the two level 4
subtopics to be listed when either is selected, but someone may have
some other preference for subtopic names.

Lucas Hart
Oregon State University
From: Ethan A Merritt <merritt@u.washington.edu>  2007-03-03 18:24:09

On Saturday 03 March 2007 00:21, Lucas Hart wrote:
>
> Apparently a formal bug report was not filed for the issues addressed by
> Thomas Mattison in patch 1445064, "Gnuplot fitting improvements" so the
> bugs he found were not considered after his patch was determined to be
> not ready to drop into the 4.2 release.

OK. Then please file this on SourceForge as a bug report, rather
than risk its getting lost again.

As to the help message problems, I wonder if these aren't additional
examples of a problem in the help processing code. Several other
examples turned up recently, and there are a couple of patchsets
outstanding that rework some of the help message code paths.

	Ethan

> One issue relative to the 4.2 release is that the estimates of
> parameter errors produced by fit are incorrectly identified.
>
> The fit discussion in gnuplot.doc (fit error statistical overview) and
> the fit output labels clearly identify the quantities as "asymptotic
> standard errors" (ASE) which are precisely defined and are the "errors"
> conventionally reported by other NLLS fitting programs.
>
> However the fit code scales the ASE values, I'll denote the result
> as SASE, and outputs the SASE values as ASE without indicating that
> they have been scaled, nor does gnuplot.doc mention the scaling or
> give any justification for the selected scaling factor.
>
>
> Possible fixes are
> 1) omit the scaling (comment out 2 lines) so that the values are
> as defined and indicate in the documentation the former reporting
> of scaled ASE.
> 2) change the output labels to SASE and document the basis for
> applying the scaling factor.
> 3) add a FIT control to let the user select ASE or SASE and
> appropriately label the output (as did Mattison) and extend the
> discussion as for (2).
>
>
> My preference would be (1), and leave it to the user to determine the
> significance of the ASE for their application, rather than (2) or (3)
> for which one would want documented
> - a definition of what gnuplot is attempting to estimate by scaling
> the ASE,
> - the basis for the selected scaling factor,
> - the validity of the scale factor when extending its application
> beyond the one case in which the High Energy Physics Particle Data
> Group uses such a scaling factor (it is their convention for
> reporting an estimate of the error in an average of values from
> different experiments: a single parameter, linear least squares
> problem.)
>
>
> I have attached patches to fit.c and to gnuplot.doc for (1).
>
> (I did not include a reference for the PDG practice, described in
> the Introduction to the 'Review of Particle Physics', available online
> http://pdg.lbl.gov/2006/reviews/textrpp.pdf
> The section on confidence limits in the Statistics chapter
> http://pdg.lbl.gov/2006/reviews/statrpp.pdf
> is more relevant for the general case.)
>
> The fit.c patch also changes a label in the fit output
> 'final sum of squares of residuals :'
> which Mattison reported some found confusing to
> 'final sum of squares of (wtd) residuals :'
>
> (The sum is of weighted residuals, with unit weights in the case
> of unweighted analyses.)
>
> Also the gih indexing of fit subtopics is flawed in both 4.1.0 and 4.2rc4.
>
> Subtopics available for fit:
> adjustable_parameters beginners_guide control
> error error_estimates errors guide
> multibranch parameters starting_values tips
>
> where error, error_estimates, and errors are synonyms. However selecting
> 'Subtopic of fit: error' gives the text for the level 4 subtopic
> 'practical guidelines' while 'error_estimates' or 'errors' give text
> for the level 3 'error estimates'
>
> The level 3 subtopic does not list the level 4 subtopics, 'practical
> guidelines' and 'statistical overview'.
>
> The gnuplot.doc fit patch also deletes the 'errors' subtopic, retains
> 'error' and 'error_estimates' as synonyms, and causes the two level 4
> subtopics to be listed when either is selected, but someone may have
> some other preference for subtopic names.
>
> Lucas Hart
> Oregon State University
>

Ethan A Merritt
Biomolecular Structure Center
University of Washington, Seattle 98195-7742