#654 Add option covariancevariables to set fit and minor fit changes

None
pending-accepted
fit (9)
5
2014-03-10
2014-03-04
No

The attached patch adds the option covariancevariables to set fit. If enabled, it
generates user-defined variables that contain the entries of the covariance matrix.
For a fit with the parameters "a" and "b", this yields four variables:
"cov_a_a" = "a_err"^2, "cov_a_b" = "cov_b_a", and "cov_b_b" = "b_err"^2.

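A minimal Python sketch of the naming scheme described above (not the patch's C code). The function name covariance_variables and the numbers in the 2x2 matrix are hypothetical. It shows that each mixed entry appears twice and that the square root of a diagonal entry gives back the parameter error.

```python
import math

def covariance_variables(params, cov):
    """Map each ordered pair of fit parameters to a variable cov_<p>_<q>.

    params: parameter names, e.g. ["a", "b"]
    cov:    square covariance matrix (list of lists) in the same order
    """
    variables = {}
    for i, p in enumerate(params):
        for j, q in enumerate(params):
            variables[f"cov_{p}_{q}"] = cov[i][j]
    return variables

# Hypothetical covariance matrix for the parameters a and b.
cov = [[0.04, -0.012],
       [-0.012, 0.09]]
v = covariance_variables(["a", "b"], cov)
a_err = math.sqrt(v["cov_a_a"])   # diagonal entries are the squared parameter errors
print(sorted(v))                  # ['cov_a_a', 'cov_a_b', 'cov_b_a', 'cov_b_b']
```
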
Furthermore, this patch changes the calculation of the FIT_STDFIT variable in case
errorscaling is switched off and z-errors (sigma_i) are provided. In the help file this
variable is defined as the "calculated standard deviation of the fit", but this is only
equal to sqrt(chisq/ndf) if all weights are one. Otherwise it is given by
sqrt(chisq/ndf * <sigma_i^2>), where <sigma_i^2> is the average of the individual
variances weighted by 1/sigma_i^2: <sigma_i^2> = N / \sum_i (1/sigma_i^2).
This formula is now implemented.

The chi-square and reduced chi-square values are now provided as user-defined variables
FIT_CHISQ and FIT_REDCHISQ.
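
The quantities above can be sketched in Python as follows, assuming the usual definitions chisq = \sum_i (r_i/sigma_i)^2 for residuals r_i and ndf = N - npar. The function name fit_statistics and the sample residuals are hypothetical; this is an illustration of the formulas, not gnuplot's actual implementation.

```python
import math

def fit_statistics(residuals, sigmas, npar):
    """Sketch of FIT_CHISQ, FIT_REDCHISQ and FIT_STDFIT for a fit with
    z-errors given and errorscaling switched off.

    residuals: y_i - f(x_i) for each data point
    sigmas:    z-errors sigma_i (all assumed > 0)
    npar:      number of fitted parameters
    """
    n = len(residuals)
    ndf = n - npar
    chisq = sum((r / s) ** 2 for r, s in zip(residuals, sigmas))
    redchisq = chisq / ndf
    # Average of sigma_i^2 weighted by 1/sigma_i^2, i.e. N / sum(1/sigma_i^2)
    mean_var = n / sum(1.0 / s ** 2 for s in sigmas)
    return {
        "FIT_CHISQ": chisq,
        "FIT_REDCHISQ": redchisq,
        "FIT_STDFIT": math.sqrt(redchisq * mean_var),
    }

res = [1.0, -1.0, 2.0, -2.0]
unit = fit_statistics(res, [1.0] * 4, npar=2)    # all weights one
scaled = fit_statistics(res, [3.0] * 4, npar=2)  # every sigma_i multiplied by 3
```

Scaling every sigma_i by the same factor changes FIT_CHISQ and FIT_REDCHISQ but leaves FIT_STDFIT unchanged (both calls give sqrt(5)); with unit weights the formula reduces to sqrt(chisq/ndf).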

1 Attachments

Discussion

    • labels: --> fit
    • Group: -->
     
  • The print_function_definitions addition is in CVS. Since recursion is actually allowed, I changed your code so that it simply stops and prints a message after a certain number of recursion levels instead of bailing out. I also removed duplicates from the output and limited the number of printed function definitions to a reasonable number (32).

    Please find attached a revised version of your covar patch, which applies to current CVS. Instead of cluttering the user variable namespace, the results are now prefixed with "FIT_COVAR_". All variables named "FIT_COVAR_*" are removed at the start of each new fit command.

    Could you please detail your envisaged usage? Since we already print the correlation matrix, wouldn't it be better to save this instead of the covariance matrix?

    Btw., please note that most of gnuplot's source uses a tab size of 8 and an indentation of 4.

     
    • status: open --> pending
    • assigned_to: Bastian Märkisch
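
A rough sketch of the prefix-and-clear behaviour described in this comment, with a Python dict standing in for gnuplot's user variable table. The helper names, and the assumption that the prefix is simply prepended to "<p>_<q>", are illustrative only. It shows why the clearing step matters: a second fit with fewer parameters would otherwise leave stale entries from the first.

```python
PREFIX = "FIT_COVAR_"

def clear_covar_variables(user_vars):
    """Delete every user variable whose name starts with FIT_COVAR_."""
    for name in [n for n in user_vars if n.startswith(PREFIX)]:
        del user_vars[name]

def store_covar_variables(user_vars, params, cov):
    """Store each covariance entry as FIT_COVAR_<p>_<q> (naming assumed)."""
    for i, p in enumerate(params):
        for j, q in enumerate(params):
            user_vars[f"{PREFIX}{p}_{q}"] = cov[i][j]

user_vars = {"x": 1.0}  # an unrelated user variable that must survive

# First fit with three parameters a, b, c.
clear_covar_variables(user_vars)
store_covar_variables(user_vars, ["a", "b", "c"],
                      [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])

# Second fit with only a and b: clearing first removes the stale FIT_COVAR_c_* entries.
clear_covar_variables(user_vars)
store_covar_variables(user_vars, ["a", "b"], [[0.04, -0.012], [-0.012, 0.09]])
```
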
     
  • > Could you please detail your envisaged usage? Since we already print the correlation
    > matrix, wouldn't it be better to save this instead of the covariance matrix?

    I use the code to calculate error propagation in cases where I could not find a fit
    function whose parameters have small correlations. In this usage the covariances are
    needed, and I found it easier to provide them directly than to look up the formula for
    calculating them from the correlation matrix.

     
    • Ethan Merritt
      2014-03-09

      I agree with this. I don't often use gnuplot's fitting code because I have field-specific optimization programs that normally serve my needs. But in general what I want out of the back end of optimization is the covariance matrix rather than the correlations.
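
For the error-propagation use case, the two routes mentioned in this exchange can be compared in a short Python sketch: converting correlations back to covariances via C_ij = rho_ij * s_i * s_j (s_i being the parameter errors), then applying first-order propagation var(g) = \sum_ij (dg/dp_i) C_ij (dg/dp_j). The function names and the example g(a, b) = a + b are hypothetical.

```python
import math

def covariance_from_correlation(corr, errors):
    """C_ij = rho_ij * s_i * s_j, with s_i the parameter errors (e.g. a_err, b_err)."""
    n = len(errors)
    return [[corr[i][j] * errors[i] * errors[j] for j in range(n)] for i in range(n)]

def propagate_variance(gradient, cov):
    """First-order error propagation: var(g) = sum_ij (dg/dp_i) C_ij (dg/dp_j)."""
    n = len(gradient)
    return sum(gradient[i] * cov[i][j] * gradient[j]
               for i in range(n) for j in range(n))

# Example: g(a, b) = a + b, so the gradient is (1, 1).
corr = [[1.0, -0.5],
        [-0.5, 1.0]]
errors = [0.2, 0.3]
cov = covariance_from_correlation(corr, errors)
sigma_g = math.sqrt(propagate_variance([1.0, 1.0], cov))
```

Here the negative correlation lowers the uncertainty of a + b to sqrt(0.07) ≈ 0.265, compared with sqrt(0.13) ≈ 0.361 if the correlation were ignored. With the covariance entries available directly as variables, the conversion step is skipped.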

       
  • Thanks for the feedback. The covar code is in CVS now. Your remaining changes are contained in the attached patch, to which I have a few comments:

    • removal of the backup file in update: I might be wrong, but I think the idea was to preserve the original file in any case, not just in case of error.

    • ensure errors in data are larger than zero: There is already a code block further down that not only prints an error message but also prints useful information about the current data point. It is marked by "/* tsm patchset 230: check for zero error values */". Right now it only checks that z-errors are nonzero, but that could easily be extended to "larger than zero". Now that gnuplot supports errors in the independent variables, we have to take into account that some errors are actually allowed to be zero if more than one error column is given.

    • "data read from" line in fit.log: I think the inclusion of the using statement there was intentional and is actually useful.

     
    Attachments
    • status: pending --> pending-accepted
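
The extended check suggested above could look roughly like this Python sketch. The rule used here, that no error may be negative and that at least one error column of a data point must be positive, is an assumption about the intended behaviour and not gnuplot's actual code; check_errors is a hypothetical name.

```python
def check_errors(point_index, errors):
    """Validate the error columns of one data point.

    errors: error values for this point, e.g. [z_err] or [x_err, z_err].
    Assumed rule: no error may be negative and at least one must be positive,
    so a single error column must be > 0, while x_err = 0 is fine if z_err > 0.
    """
    if any(e < 0 for e in errors):
        raise ValueError(f"data point {point_index}: negative error in {errors}")
    if all(e == 0 for e in errors):
        raise ValueError(f"data point {point_index}: all errors are zero")

check_errors(0, [0.1])        # single z-error column, positive: accepted
check_errors(1, [0.0, 0.2])   # x_err = 0 but z_err > 0: accepted
```

A stricter variant might test the combined variance actually used as the weight of each data point rather than the individual columns.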
     
  • > removal of the backup file in update: I might be wrong, but I think the idea was to
    > preserve the original file in any case, not just in case of error.

    The problem with the old approach is that a script which fits data and updates the
    parameter file cannot be run multiple times: on the third run, the backup file already
    exists and the code errors out. One solution would be to generate a new file name in
    this case; the other, which I implemented, is to simply delete the backup file.

     
    • In the attached patch I added a new overwrite option to the update command, so that

          update 'test.par' overwrite

      will overwrite the backup file 'test.par.old' if it already exists.
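
The backup logic discussed here can be sketched in Python. The function update_parameter_file and its overwrite flag only mimic the behaviour described for update 'test.par' overwrite; they are hypothetical and not gnuplot's implementation. Starting without a parameter file, the third call without overwrite fails because test.par.old already exists, while overwrite=True replaces the backup.

```python
import os

def update_parameter_file(path, new_contents, overwrite=False):
    """Write new_contents to path, keeping the previous file as path + '.old'.

    Without overwrite, an existing backup is an error (the old behaviour);
    with overwrite, the existing backup is replaced.
    """
    backup = path + ".old"
    if os.path.exists(path):
        if os.path.exists(backup) and not overwrite:
            raise FileExistsError(f"backup file {backup} already exists")
        os.replace(path, backup)  # replaces an existing backup file
    with open(path, "w") as f:
        f.write(new_contents)
```
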