Hi
I have made a bunch of changes to gnuplot fitting in fit.c.
Generally, they are flagged by comments containing "tsm feb06".
In many cases, old code is still there for reference but commented out.
This should probably be cleaned up before a release.
Some of the changes have user-variables to control them.
I have not yet figured out how to commit the changes,
so they aren't out there yet.
1. Error message for undefined function evaluation
The old message in call_gnuplot() just said "Undefined value during
function
evaluation." I added the data point number, x, y, z, and parameter
values.
2. Check for error value equal to zero
If the user supplies errors but any of them are zero, there will be a
divide by
zero, trashing the fit. I added a check in fit_command(), which prints
the data
point number, x, y, z, and error, then aborts back to command line.
3. Made new message when fit stops because lambda is too large
Previously, the message in regress() said that the fit had converged in
this case
which wasn't always true. Other output unchanged.
4. Zero-change in chisquare is now "BETTER" rather than "WORSE"
Previously, if the exact minimum of chisquare was found, so the
chisquare change
was zero, this was called "WORSE" in marquardt(). The loop in
regress() didn't exit,
and further iterations were done, which were also "WORSE" and increased
lambda
until the upper limit on lambda or iterations was reached. This was a
waste of
time, and potentially confusing, for no benefit.
5. Final parameter cleanup fixes
The gnuplot internal variables for the fit parameters may contain values
from an iteration that made the chisquare worse rather than better, if
the loop in regress() terminates due to maximum iterations or maximum
lambda
rather than convergence. There was code at the end of regress() that
restores
the internal variable for only the last parameter, for a different
reason
(the last variable is altered in call_gnuplot() to calculate
derivatives,
but not restored there). I changed the code at the end of regress() to
restore
_all_ internal variables, from the array that contains the parameters
that gave
the best chisquare so far. I also put a copy of the last-parameter-fix
code
into call_gnuplot(), where it logically belonged anyway. Without this,
if the
user interrupted the fit and tried to plot the current function on top
of the data,
the last internal parameter was not correct.
6. Changed convergence criterion, with new user-variable FIT_LIMIT_ABS.
The new simpler criterion in regress() is absolute reduction in
chisquare
for an iteration of less than epsilon*chisquare plus epsilonAbs.
Default
for new internal variable epsilonAbs is zero, user-settable by
FIT_LIMIT_ABS.
Default for the existing internal variable epsilon is 1.0e-4,
user-settable
by FIT_LIMIT (unchanged). In most cases, the old convergence criterion
was
_relative_ change in chisquare of less than epsilon. But if the
chisquare
was less than NEARLY_ZERO = 1.0e-30, (which could happen if fitting data
with magnitude less than 1.0e-15 with no errors), the old criterion was
_absolute_ change in chisquare less than epsilon. Any change in
chisquare
would probably be less than epsilon in these cases, so the fitter would
announce convergence immediately rather than finding the minimum. Users
would probably prefer the default relative convergence criterion in
these cases,
but there was no way for them to impose it. All they could do was to
adjust
FIT_LIMIT, which would only let them adjust the _absolute_ convergence
criterion,
which would require them to guess what their minimum really was. With
the
new criterion, the default convergence criterion is always relative no
matter
what the chisquare is, but users now have the flexibility of adding an
absolute
convergence criterion through FIT_LIMIT_ABS. The new scheme defaults to
doing the same thing as the old scheme in almost all cases, is more
useful
in some (rare) cases, is easier to understand, and is more flexible.
The load-file newfit0.dem tries to fit a line to the file newfit0.dat.
With old gnuplot, the fit gives up when the parameters are still pretty
bad,
with the new convergence criteria the fit continues until it is right.
7. New one-line progress-report, revert by FIT_CLASSIC_PROGRESS = 1
The old report was many lines per iteration, so you could not hold many
iterations in the scroll buffer, and it was hard to see how chisquare,
parameters, and lambda were changing. It also said nothing about
iterations
where the chisquare increased (except to print an asterisk). It printed
"WSSR" with no explanation that this means weighted sum of squared
residuals,
rather than the more common term "chisquare". The new report is one
line
per iteration, with everything in neat columns so it's easy to track
the progress.
The first column is the chisquare, the second is the change in
chisquare divided
by the convergence limit, so convergence means a value between -1 and
0, the
lambda value, and one column per parameter. It also shows iterations
where
the chisquare increases. Implementation is through new function
show_fit1(),
called by regress() [and now also by marquardt()]. Both console output
and
fit.log file are affected by the change. The user can revert to the old
progress report show_fit() by setting FIT_CLASSIC_PROGRESS = 1.
8. New final fit parameter report format, revert by FIT_CLASSIC_RESULT
= 1
The old default final report label "final sum of squares of residuals"
was misleading because if the user supplied errors, the number printed
was the chisquare (_weighted_ sum of squared residuals or WSSR). The new
report labels are "chi-squared, "degrees of freedom," and "chisq/ndf"
when the user supplies errors. If the user did not supply errors, it
prints
"Sum of squared residuals" and "degrees of freedom", and not SSR/ndf
because
it is not particularly meaningful in that case. In both cases, it
prints
an error-rescaling factor (square root of internal chisquare per degree
of freedom,
calculated with error=1 if errors are not supplied). A new internal
variable
errColumn is set to 1 in fit_command() if the user supplied errors,
otherwise
it is zero. If the user supplied errors, both the raw and rescaled
parameter
errors are printed. If the user did not supply errors, only the
rescaled
parameter errors are printed. In both cases, the same thing goes to
the fit.log
file. The old printing code was moved into new function
PrintResults(), and
the new printing code is in PrintResults2(). A new internal variable
classicResults
controls which function is called by regress(). It defaults to 0 (new
results)
but setting FIT_CLASSIC_RESULT = 1 restores the old results report.
The load-file newfit1.dem fits lines to the files newfit1.dat and
newfit1a.dat,
switching back and forth between the old and new progress and results
reports.
9. Error-rescaling control
The old default was to always rescale parameter errors by the square
root
of the "chisquare" per degree of freedom (unless there were zero degrees
of freedom, in which case errors were considered undefined). While
this is
the only sensible thing to do if the user does not supply errors to the
fit,
it is not the only sensible thing to do if errors are supplied. The
parameter
errors are calculated in regress() as before, but now they are only
rescaled
by the square root of chisquare per degree of freedom if new internal
variable
rescaleErrs is true (non-zero). The rescaleErrs variable is set in
fit_command()
along with errColumn, but at the moment it is always set to 1 whether
the user
supplied errors or not. This means that the internal variables for
parameter errors
(not default, but available through recompiling with a preprocessor
option) and
the (non-default) old-style result report always give only rescaled
errors, as before,
unless someone changes the source and recompiles. The new default
results report
from FitResults2() checks rescaleErrs and errColumn so it can print the
right values
(both raw and rescaled if the user supplied errors, only rescaled if
the user did
not supply errors) no matter how rescaleErrs is set.
10. Derivative-step algorithm improvement, revert by
FIT_CLASSIC_DRV_STEP = 1
The fit needs derivatives with respect to parameters of the prediction
at each
data point. These are calculated numerically by changing the parameters
by a small
amount. The old step size was DELTA=0.001 times the parameter value
(or 1e-33
if the parameter is less than 1e-30). In some cases, this step is so
large that
non-linearities in the function make the derivative inaccurate. This
slows or
prevents convergence, and may make the location of the minimum wrong.
In other
cases, the step is so small that the predicted function value does not
change
to machine precision, so the calculated derivative is zero. This
usually causes
a singular matrix error. There is an optimal step size that balances
roundoff error
and non-linearities (see the discussion of numerical derivatives in
Numerical
Recipes). It requires an estimate of the roundoff error of the
function, and an
estimate of the second derivative. The function that we ultimately
care about
is the chisquare. We can estimate the roundoff error from the data
values, and
the matrices that the fit accumulates anyway can be used to estimate
the second
derivative. The calculation adapts to the data, function, and
parameter values.
In most cases, the step size is much less than 0.001 times the
parameter value,
although it is larger when it needs to be to avoid roundoff errors.
The new
algorithm is implemented by allocating (and freeing) in marquardt() a
new array
drvStepDA, initialized by calling new InitDrvSteps(), updated during
convergence
by calling new CalcDrvSteps()and changing calculate() to use the array.
The new algorithm is the default, but the old algorithm (re-implemented
in the
new functions) can be restored by setting FIT_CLASSIC_DRV_STEP = 1.
The load file newfit2.dem shows a case where the new derivative step
algorithm
works but the old algorithm step is too small, resulting in a singular
matrix error.
The load file newfit3.dem shows a case where the old derivative step
is too large so the fit does not really converge, while the new
algorithm is OK.
11. Improved treatment of parameters near zero
Previously, an initial parameter value less than NEARLY_ZERO = 1.0e-30
was silently changed to 1.0E-30. But in some cases it is natural for
parameters to be this small or smaller. Rather than silently overriding
the user's input when near zero, the new version uses any initial value
with no change. The exception is that an initial value of EXACTLY zero
is disallowed. The reason is that the numerical derivative algorithm
needs at least the order of magnitude of the parameter for
initialization.
There is a message telling the user to provide a non-zero
value of the right magnitude, or the magnitude of the expected error
if the expected value is zero. The old code was removed from
fit_command(),
the new code is CheckParms(), called at the start of regress().
The old behavior of creating a new variable with initial value 1.0
when a fit parameter doesn't yet exist is retained, but a print
statement
was added to createdvar() advising the user that it is better to provide
a non-zero explicit value.
12. Parameter step size limit, controlled by FIT_MAX_PAR_STEP
For non-linear functions, particularly when the initial parameter
values are far
from optimum, the fit may try to change the parameters to values where
the function
is undefined, which terminates the fit with an error (at least in the
new code,
the user gets more information about the parameter values and data
values that
caused the undefined result!). The only thing the user could do was to
change the starting point and pray. A new function LimitParSteps()
called by
marquardt() to scale down the steps in all parameters so the ratio of
any parameter
change to its value is no larger than internal variable maxParStep.
The new user
variable FIT_MAX_PAR_STEP controls maxParStep. The default value of
maxParStep
is 1.5, which allows a factor of 2.5 increase per iteration, and also
for the sign
to change. Note that maxParStep values of less than 1.0 would not
allow the
parameter to change sign, which could either be useful, or dangerous.
13. Scale-independence through multiplicative lambda, revert by
FIT_CLASSIC_LAMBDA
The lambda parameter in the Levenberg-Marquardt algorithm reduces the
step size
if the calculated step actually increases the chisquare. The amount of
reduction
for the different parameters depends on details of the implementation.
One common
implementation is "multiplicative", which is dimensionless and is
insensitive to
parameter scale differences. The implementation in gnuplot is
"additive" which
makes the performance sensitive to the relative scale of parameters and
errors.
Additive lambda requires a somewhat arbitrary initial value
calculation, while
multiplicative lambda can be simply initialized to 0.01. The
interpretation of
multiplicative lambda is easy (much larger than 1 means the step sizes
are reduced,
much less than 1 means the step sizes are "ideal"), while it is hard to
interpret
the value of additive lambda. My preference is for therefore for
multiplicative
lambda, although there are certainly combinations of data, function,
and starting
point where additive lambda will work and multiplicative lambda will
fail. The
implementation is through changes to marquardt() in both initialization
and usage
of lambda. The default is now multiplicative lambda initialized to
0.01, but
additive lambda is still available by setting FIT_CLASSIC_LAMBDA = 1.
Users can
set FIT_START_LAMBDA to change the initial lambda value in either case.
But for
additive lambda, supplying a FIT_START_LAMBDA value overrides the
initial value
calculation, and there is no way to turn it back on without restarting
gnuplot
(this has always been true).
The load file newfit4.dem shows a case where a simple line fit gives
the wrong
answer with additive lambda (because it thinks it is converged when it
has not),
and multiplicative lambda gives the right answer.
14. "Skeptical" chisquare for nonlinear fits, controlled by
FIT_SKEPTICAL
In a nonlinear fit, the influence of a given data point on the fit
parameters
(which depends on the parameter derivatives at the data point) can be
highly
sensitive to the values of the parameters. This can result in slow
convergence,
unless the initial parameters are chosen very close to the (unknown)
desired minimum.
An example is a function like a*exp(b*x), where the derivatives at
large x can be
orders of magnitude larger than at small x. This interferes with
convergence,
because Levenberg-Marquardt refuses to take a step if the chisquare
goes up,
so if it gets close to the data at high x, even tiny parameter changes
make the
chisquare worse (unless they are balanced in a nonlinear way). These
problems
can be reduced in some cases by adjusting the weights of data points to
cancel
out the dependence of the derivatives on the parameters. I call this
"skeptical chisquare" because it doesn't take the initial parameter
values
as seriously. In some cases, the convergence time goes from thousands
of
iterations to only a few. However, the location of the minimum of the
skeptical
chisquare will be close to the minimum of the normal chisquare if the
functional
form can reproduce the data well, but it will not necessarily be very
close if
the function does not reproduce the data, at least over the region
being fit.
Also, the errors calculated with the skeptical chisquare are not very
meaningful.
For these reasons, a skeptical chisquare fit should only be used to
determine better
initial values for a normal chisquare fit. There are if-statements in
marquardt()
that test new variable skepticalChisq, and re-scale the internal errors.
The default is 0 for normal chisquare. The user can turn it on by
FIT_SKEPTICAL = 1,
and off by FIT_SKEPTICAL = 0.
The load file newfit5.dem gives an example with a*exp(b*x) where
"convergence"
takes thousands of iterations with normal chisquare, even starting with
both
parameters close to right, but only a few with skeptical chisquare, and
the
final refit with normal chisquare takes only a few iterations.
15. Monte Carlo search for initial fit parameters
There are many problems where fits only converge when the initial
parameters are
rather close to the solution, and it can be tedious finding such a
starting point.
To make this easier, I have implemented a Monte Carlo search for the
minimum chisquare
over a user-defined range, which is used as the initial parameters for
the normal
Levenberg-Marquardt fit. The range information is input using the "via
file"
mechanism. The "via file" parser in fit_command() was modified to
allow (but not
require) two values per line. If any lines have two values, a Monte
Carlo search
for starting parameter values is done. Parameters with only a single
value are held
constant for the Monte Carlo search, but still vary in the final fit.
The existing
"# FIXED" syntax for non-parameter constants was not changed. The
default Monte Carlo
search is 1000 iterations. The user can change this by setting
FIT_MONTE_CARLO_TRIALS.
The default behavior is to continue with a regular fit from the best
chisquare point.
But if FIT_MONTE_CARLO_TRIALS is negative, after (minus) that many
trials, control goes
back to the user without a fit, but with the internal parameter values
set to the
best trial. This lets the user plot the function with these values as
a diagnostic.
The random number generator is not re-seeded, so repeating the search
with the same number
of iterations on the same function and data always gives the same
result.
The load file newfit6.dem shows sine-wave data where fits fail unless
the frequency
parameter is quite close to the right value. The Monte Carlo search
does quite a good
job in 1000 iterations, which takes only seconds, and the fit converges
well from this
starting point.
I compared the new fitting code results with the old version on fit.dem.
The first fit starts with both parameters initialized to zero, which is
now
illegal, so I changed fit.dem to initialize them to 0.001. When the
switches
are set to revert to the old algorithm, the results are nearly
identical.
The first fit takes a few extra iterations because of maxParStep.
With the default changes to the derivative and lambda algorithms, the
results are noticably different on the hemisphere fit. I traced this to
the fact that the function has a square root, and for some of the data
points, the argument is negative. The fit only uses the real part of
the function, so it is zero for those data points. So the chisquare
actually has multiple minima, depending on how many data points give
negative square roots. The modified fit.dem now makes a plot of the
argument of the square root, and also defines a new fit function with
a Taylor expansion of the square root that is defined for all arguments.
With the modified fit function, the new and old versions give the same
result (and both converge in far fewer iterations, implying that most
of the iterations were being confused by the multiple minima).
The new version generally tends to converge in somewhat fewer
iterations,
although for some of the cases in fit.dem, the new version takes more
iterations.
Cheers
Prof. Thomas Mattison, Dept. of Physics & Astronomy, Univ. of British
Columbia
Present Address: Stanford Linear Accelerator Center
2575 Sand Hill Road, Menlo Park, CA, 94025
Building 48 (Research Office Building), Mail Station MS35
Office: ROB-231 Phone: 650-926-5342 Fax: 650-926-8522
|