From: Thomas M. <mat...@ph...> - 2006-03-10 01:23:31
On 8-Mar-06, at 10:34 AM, Hans-Bernhard Bröker wrote:
> drand48() cannot be used like that. It's quite a non-portable
> function. Of the compilers I tried so far, only the most dedicated
> impersonators of UNIX (DJGPP and real Cygwin) have it.

On 9-Mar-06, at 5:56 AM, Hans-Bernhard Bröker wrote:
> Thomas Mattison wrote:
>> Yeah, I wondered whether that would be sufficiently portable.
>> Is there a random generator internal to gnuplot that I should use?
>
> Hmm, we do have specfun.c:ranf(). That should fit the bill.

I looked at specfun.c. There's something declared

    static double ranf(struct value *init)

but it's not visible outside specfun.c, and I didn't know what to do
with the struct value*. There's also in specfun.h something declared

    void f_rand __PROTO((union argument *x))

but returning void doesn't help me much.

> ["TM" is Thomas Mattison, HBB are my, Hans-Bernhard Broeker's,
> comments]
>
> 4. Zero-change in chisquare is now "BETTER" rather than "WORSE"
>
> TM: Previously, if the exact minimum of chisquare was found, so the
> chisquare change was zero, this was called "WORSE" in
> marquardt(). The loop in regress() didn't exit, and further
> iterations were done, which were also "WORSE" and increased
> lambda until the upper limit on lambda or iterations was
> reached. This was a waste of time, and potentially confusing,
> for no benefit.
>
> HBB: I'm not entirely sure about this one. There could be other
> reasons for a failure to find an improvement besides having found a
> well-defined optimum. An extended flat minimum, or a pair of local
> minima, say. Giving up the Marquardt process immediately when the
> Newton iteration admits defeat seems to defeat the purpose of doing
> the Marquardt.

I think it would be hard to construct a test case where an iteration
produces EXACTLY zero change in chisquare but the following iterations
succeed in improving the chisquare.
The only cases I can think of where the chisquare change would be
EXACTLY zero are that the parameters are already at the minimum to
machine precision, or that lambda is so large that the parameter steps
are so small that the chisquare doesn't change. In the first case,
it's OK if we stop iterating. In the second case, a result of WORSE
will make lambda even bigger, so the next step will be even smaller,
and it will also produce zero change in chisquare.

But it is fairly easy to construct a case where the present code does
something that would likely confuse a user. As the iteration
approaches the minimum, the calculated parameter changes _should_ get
smaller and smaller, and eventually get so small that the chisquare
does not change to machine precision. Gnuplot's default convergence
limit of 1e-5 relative change in chisquare is pretty loose, so
convergence is typically declared before this happens. But if you
reduce FIT_LIMIT to much smaller values, you will find that in most
cases the fit gets to a point where further iterations no longer
change the chisquare. If this is defined as WORSE, the iteration would
continue forever.

Obviously there would be many complaints if this actually happened.
The reason it doesn't is that there is _another_ way to exit that
loop. When the chisquare is WORSE, lambda is increased, and the loop
exits on the maximum lambda value. The existing code does NOT print a
message that the maximum lambda has been exceeded. What it prints is
that the fit converged.

I know from experience teaching with gnuplot that lots of users make
mistakes that cause "convergence" to nonsense. Ideally, if gnuplot
knows that the fit hasn't converged properly, it should tell the user
that. Having lambda hit the maximum should be a sign that the fit has
not converged. I have added a print statement that tells the user the
truth when the fit has terminated because lambda hit the maximum.
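As a minimal sketch (hypothetical names, not the actual fit.c code),
the classification change being proposed amounts to one relaxed
comparison:

```c
#include <assert.h>   /* only needed for the self-checks */

typedef enum { BETTER, WORSE } marq_res_t;

/* Hypothetical sketch of the step classification discussed above.
 * Treating an exactly zero change in chisquare as BETTER lets the
 * outer loop apply its normal convergence test and stop at the true
 * minimum, instead of inflating lambda until the lambda ceiling is
 * reached and then (misleadingly) reporting convergence. */
static marq_res_t classify_step(double chisq_old, double chisq_new)
{
    /* old behavior: only a strict decrease counted as BETTER,
     *   return (chisq_new < chisq_old) ? BETTER : WORSE;       */
    return (chisq_new <= chisq_old) ? BETTER : WORSE;
}
```

With the old strict inequality, a step that lands exactly on the
minimum is WORSE, lambda grows, and every later (even smaller) step is
also WORSE, so the loop can only exit on the lambda limit.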
But for this to be useful, we need to make sure that this message
_isn't_ printed for a fit that has actually found the minimum. The way
to do that is to make an EXACTLY zero change in chisquare be BETTER
instead of WORSE, so the loop exits then, rather than waiting for
lambda to hit the limit.

> 5. Final parameter cleanup fixes
>
> TM: The gnuplot internal variables for the fit parameters may contain
> values from an iteration that made the chisquare worse rather
> than better, if the loop in regress() terminates due to maximum
> iterations or maximum lambda rather than convergence. There was
> code at the end of regress() that restores the internal variable
> for only the last parameter (for a different reason: the last
> variable is altered in call_gnuplot() to calculate derivatives,
> but not restored there). I changed the code at the end of
> regress() to restore _all_ internal variables, from the array
> that contains the parameters that gave the best chisquare so
> far.
>
> HBB: this feels risky, at first sight. regress() keeps more state than
> just the parameters --- restoring the parameters could leave them out
> of synch with the rest, e.g. the correlation matrix and chisquare.

I looked again and I think it's OK. The parameter array a[] and the
derivatives matrix C[] that are used for the error and correlation
calculation in regress() are calculated inside marquardt(). They are
always in synch with each other, what regress() has is always from the
best chisquare that marquardt() has seen, and the chisq variable is
that value.

> TM: I also put a copy of the last-parameter-only fix code
> into call_gnuplot(), where it logically belonged anyway.
>
> HBB: I don't think that's correct. call_gnuplot is not supposed to
> know how fit's numeric derivation method works. It should just plug
> in the parameter values and evaluate.

You're right, but only because I made a mistake in my note, not in the
actual code.
I actually put the last-parameter-only fix code into calculate(), not
call_gnuplot(). calculate() is the function that perturbs the internal
variables for the derivative calculation, not call_gnuplot(). The
change just undoes the perturbation for the last parameter, exactly as
is already done for all the other parameters.

> TM: Without this, if the user interrupted the fit and tried to
> plot the current function on top of the data, the last
> internal parameter was not correct.
>
> HBB: in that case, the fit interruption handler is what needs fixing,
> not call_gnuplot

That would be another way of fixing things, but is there any
significant reason why calculate() should not undo the perturbation of
the last parameter that it had just done?

> 6. Changed convergence criterion, with new user-variable
> FIT_LIMIT_ABS.
>
> TM:
> [...]
> less than epsilon. But if the chisquare was less than
> NEARLY_ZERO = 1.0e-30 (which could happen if fitting data with
> magnitude less than 1.0e-15 with no errors), the old criterion
> was _absolute_ change in chisquare less than epsilon. Any
> change in chisquare would probably be less than epsilon in
> these cases, so the fitter would announce convergence
> immediately rather than finding the minimum. Users would
> probably prefer the default relative convergence criterion in
> these cases, but there was no way for them to impose it.
>
> HBB: I have to disagree here. Of course users could impose a
> change: they could supply the missing error values. Fake them by a
> constant, if they must. I know I tend to be a bit harsh on users
> in this regard: I think there is a threshold of how easy it should
> be made to abuse a tool. The above change is still OK, but IMHO
> it's getting a bit close to that threshold.

You're right that there is _something_ they could do. They could
supply sensible errors so they have a real chisquare, which would not
be 1e-30.
Or they could re-scale their data, even without errors, so the
"chisquare" came out in a range that works. But this looks like a case
where it's easier to fix the code so it works even when somewhat
abused, rather than explaining what constitutes abuse, why it causes
the problem, and how to get the results they want. We advertise
gnuplot fits as working even if errors are not supplied, and this is a
case where they don't work properly. It's easily fixed, and the change
simplifies both the code and the explanation of what the code does. So
why not?

> 7. New one-line progress-report, revert by FIT_CLASSIC_PROGRESS = 1
>
> TM: (except to print an asterisk). It printed "WSSR" with no
> explanation that this means weighted sum of squared residuals,
> rather than the more common term "chisquare".
>
> HBB: the meaning of all abbreviations is explained in the
> documentation. There's no excuse for users not reading the
> documentation of a tool as complex as this. Considerable work has
> gone into that documentation in the past. One result of much
> deliberation during that activity was to call that quantity WSSR,
> instead of chisquare. I'd rather not go through all that again.

Ah, if they would only read documentation.... Is the discussion you
mention recorded anyplace? I'd like to see the arguments. In my
experience (undergrad physics as student and professor, grad school
and beyond in experimental particle physics, which sounds a lot like
you, actually), chisquare is by far the more common name. But it's
easy to change my column label to WSSR if that's really more
universally understood, or even to SSR for no-errors fits and WSSR for
fits with errors.

> TM: The new report
> is one line per iteration, with everything in neat columns so
> it's easy to track the progress.
>
> HBB: ... but only for fits with sufficiently few parameters to
> all fit in one line. Please try to keep in mind that not everybody
> uses text consoles 200 or more characters wide...
> TM: The user can
> revert to the old progress report show_fit() by setting
> FIT_CLASSIC_PROGRESS = 1.
>
> HBB: Good --- because otherwise, all those users out there who have
> scripts parsing fit.log and/or console output of gnuplot would cause
> a mighty ruckus.

I considered adding line breaks, but decided against it. They don't
solve the readability problem for narrow consoles, compared to letting
the text wrap. And they make the output less useful for people who do
use wide consoles and/or capture the output for display in some other
program that can scroll horizontally. And remember that the old
multi-line output has a different problem: there is a limit to how
many iterations will be visible on a terminal or scroll-buffer. To me,
the old progress report was very hard to use as a progress report, and
I think most people only looked at the final results. The new progress
report is more likely to be useful as a progress report, even if it's
not perfect for all users, which nothing ever is.

> 8. New final fit parameter report format, revert by
> FIT_CLASSIC_RESULT = 1
>
> TM: The old default final report label "final sum of squares of
> residuals" was misleading because if the user supplied errors,
> the number printed was the chisquare (_weighted_ sum of
> squared residuals or WSSR). The new report labels are
> "chi-squared," "degrees of freedom," and "chisq/ndf".
>
> HBB: see above note about nomenclature and documentation.

I'm open to debate about what to call things. But part of my point was
that what we print means different things if the user supplies errors
vs. not supplying errors. So we should have different labels in the
different cases.

> 9. Error-rescaling control
>
> TM: user supplied errors or not.
> This means that the internal
> variables for parameter errors (not default, but available
> through recompiling with a preprocessor option)
>
> HBB: Assuming you're referring to GP_FIT_ERRVARS there:
> that build option is supposed to be turned on by default, these days.
>
> TM: and the
> (non-default) old-style result report always give only rescaled
> errors, as before, unless someone changes the source and
> recompiles.
>
> HBB: this could be quite bad. Changing the meaning of a
> machine-readable entity like the parameter error variables is a
> recipe for utter confusion. Please consider introducing a different
> set of error variables for the new-style errors. There has to be a
> way for users' scripts to get at the exact same values they used to
> get, or at least to detect that the numbers now stored in the same
> places now mean something different.

I guess I wasn't clear. My code does _not_ (presently) change the
values calculated for the user-accessible variables created when
GP_FIT_ERRVARS is turned on. The calculated errors are re-scaled using
the actual fit "chisquare" like they were before. And if the user
reverts to the old-style final result report, they will get the same
rescaled error values as before. I only included a comment to make it
obvious how to change the source code so the GP_FIT_ERRVARS error
variables would only be rescaled by the chisquare when the fit is done
without user errors, which I think is the right thing. Another
solution would be to create another conditional-compile flag, but I
didn't do it that way.

I don't advocate doubling the number of fit-error variables created; I
think that would be much more confusion than it's worth. I do advocate
changing the default behavior. It's statistically wrong to rescale the
errors according to the chisquare if the user provided valid errors.
If we repeat the same experiment and fit many times, the data will
have statistical fluctuations, the fit parameters will have
statistical fluctuations, and the chisquare will have statistical
fluctuations. But it's not true that the fits with lower chisquare are
more closely clustered around the true parameter values, so they
should not be assigned smaller errors. Apparently some commercial
fitting packages make this mistake also. For instance, see
http://physics.gac.edu/~huber/fitting/aapt2001.ppt
But it's still a mistake.

For the new default final result report, both raw and rescaled errors
are printed, and the code correctly prints both errors, labelled
properly, whether the rescaling variable is set to the old-style
GP_FIT_ERRVARS values or to what I consider to be the right
calculation.

> 10. Gnuplot-readable parameters and errors in one line in fit.log
> file
>
> HBB: I'm against this. fit.log is the wrong place for this data.
> It's much more similar to what the 'update' command already does, so
> if this feature is added, it should go there, not into fit.log.

The point is that I want to _append_ many fit results to a
gnuplot-readable file, so the results from many similar fits can be
conveniently plotted along with their errors. The file from the update
command doesn't seem to be appropriate for appending results from many
fits. Can you give a reason _why_ fit.log is the wrong place? Recall
that in the exchange we had in the fall, I suggested making a new
fit-summary output file that would contain _only_ lines like this, and
your answer was

> I think it would make a lot more sense to add a couple lines to
> 'fit.log' instead of creating what would be an almost complete copy
> of all of its content.

So would it be best to go back to my original proposal of a separate
new file that contained only appended one-line summaries with errors?

> 13.
> Parameter step size limit, controlled by FIT_MAX_PAR_STEP
>
> TM: function LimitParSteps() called by marquardt() to scale down the
> steps in all parameters so the ratio of any parameter change to
> its value is no larger than internal variable maxParStep.
>
> HBB: It's been a while since I rewrote this Marquardt-Levenberg
> code, and I don't have my textbooks at hand, but:
> this 'limit the step size' looks awfully like a duplicate of
> what Marquardt's lambda parameter is designed to do. Even if it's not
> an exact duplicate, this seems to almost beg for a fight between
> those two mechanisms over who gets to decide how big the step sizes
> should be...
> Couldn't the same effect be had by a simple penalty on the chisquare
> to steer the existing algorithm away from those regions?

It's not a duplication of what lambda does; it addresses a different
problem. On the first iteration of a nonlinear fit, particularly if
the initial parameters are poor, the first parameter step may be to a
region where the function is not even defined, which results in a
failure. Lambda doesn't reliably prevent this, because it can take
several iterations for lambda to increase far enough to limit step
sizes by itself.

For an example, try fitting my demo/newfit/newfit5.dat with a*exp(b*x)
with starting parameter values a=b=1 with old gnuplot. You will get
"Undefined value during function evaluation". My modified version of
gnuplot would print the extra information that a=229 and b=7869 for
the next trial, which is why the calculation of the function blows up.
But with the default FIT_MAX_PAR_STEP = 1.5, it increases the values
of a and b slowly enough that the function is never undefined
(although it still converges VERY slowly unless you use
FIT_SKEPTICAL = 1).

The two mechanisms cooperate nicely. The new mechanism doesn't
interfere with convergence once the steps are small.
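The step-limiting idea can be sketched as follows (names like
limit_par_steps and max_step are assumptions here, not necessarily
what the patch uses): scale the whole step vector down so that no
parameter's relative change exceeds the limit.

```c
#include <assert.h>   /* only needed for the self-checks */
#include <math.h>

/* Hypothetical sketch of a FIT_MAX_PAR_STEP-style limiter.  da[] is
 * the proposed parameter step, a[] the current parameters.  All steps
 * are shrunk by the same factor, so the step direction is preserved
 * and only its length is capped. */
static void limit_par_steps(double *da, const double *a, int n,
                            double max_step)
{
    double worst = 0.0;
    int i;
    for (i = 0; i < n; i++) {
        if (a[i] != 0.0) {
            double ratio = fabs(da[i] / a[i]);
            if (ratio > worst)
                worst = ratio;
        }
    }
    if (worst > max_step) {
        double scale = max_step / worst;  /* shrink all steps alike */
        for (i = 0; i < n; i++)
            da[i] *= scale;
    }
}
```

Because the direction is unchanged, this doesn't fight with lambda; it
only prevents the very first few steps from leaving the region where
the function can be evaluated at all.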
It just keeps things from running away before a reasonable value of
lambda has been established by the iteration. The chisquare already
penalizes the regions we are talking about; the problem is that
sometimes the first steps are so large that the function can't even be
evaluated to calculate the chisquare!

> 14. Scale-independence through multiplicative lambda,
> revert by FIT_CLASSIC_LAMBDA = 1
>
> TM: implementation is "multiplicative", which is dimensionless and is
> insensitive to parameter scale differences. The implementation
> in gnuplot is "additive", which makes the performance sensitive to
> the relative scale of parameters and errors.
>
> HBB: I must admit you've lost me there. Could you pass me some
> references about these two different lambdas?

I don't have a textbook or article reference, other perhaps than
Numerical Recipes (which uses multiplicative and doesn't mention the
other possibility). Basically, you write the chisquare sum, take the
derivatives with respect to the parameters, and set them to zero, to
get N nonlinear equations in the N unknown parameters. Then you
linearize those equations, resulting in an N by N matrix which I call
the weight matrix, which is the inverse of the covariance matrix. The
weight matrix isn't explicitly calculated in the gnuplot
implementation, but it's C times its transpose.

For a linear problem, the parameter step is the inverse of the weight
matrix times the vector of sums of residuals over errors times
parameter-derivatives, and iteration is not needed. For a nonlinear
problem, the resulting parameter step is not optimal, and may increase
the chisquare. The Levenberg-Marquardt trick is to increase the
diagonal of the weight matrix (the diagonal elements are always
positive), so the elements of the inverse are smaller, and the
parameter steps are smaller.
If the diagonal is increased so far that the off-diagonal elements are
negligible, then the parameter steps are in a direction that is
guaranteed to decrease the chisquare, but the steps will be small.

Additive lambda means adding the same number lambda to each element of
the diagonal of the weight matrix. Multiplicative lambda means
multiplying each diagonal element by (1+lambda). For multiplicative
lambda, lambda=1000 would result in each parameter changing by 1/1000
of the distance from its present value toward the value that would
minimize chisquare if all other parameters were held fixed, for a
linear problem.

If the elements on the diagonal are all about the same size, it makes
little difference whether you use additive or multiplicative lambda.
But when the elements are very different, and lambda is much bigger
than some elements and much smaller than others, some parameters are
not affected by lambda and converge quickly, while others are frozen
by lambda and don't change. So you can have false convergence.

> From what I'm aware of, the usual approach against parameter scale
> differences is not a change to the handling of lambda, but of the
> parameters. I.e. instead of minimizing with respect to the actual
> problem parameters, all parameters are modelled as
> factor*initial_value, with the factors all starting at 1.0, and only
> the factors are minimized by the algorithm. This makes them all the
> same scale initially. The only drawback is that the fit is no longer
> idempotent, i.e. re-running the fit with the converged results from
> an earlier fit can yield different results. That's what MINUIT does,
> as far as I remember from reading its docs.

It's true that a sophisticated user can rescale the problem to
circumvent the problems with additive lambda. But that's never
necessary with multiplicative lambda.
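The two conventions can be put side by side in a short fragment (a
hypothetical sketch, not gnuplot's actual code):

```c
#include <assert.h>   /* only needed for the self-checks */
#include <math.h>

/* Hypothetical sketch of the two lambda conventions described above,
 * applied to the diagonal of the weight (inverse-covariance) matrix.
 * Multiplicative lambda scales every diagonal element by the same
 * dimensionless factor, so parameters of very different scales are
 * damped equally.  Additive lambda adds the same absolute number,
 * which can freeze some parameters while leaving others essentially
 * unaffected, and so can produce false convergence. */
static void damp_diagonal(double *diag, int n, double lambda,
                          int multiplicative)
{
    int i;
    for (i = 0; i < n; i++) {
        if (multiplicative)
            diag[i] *= (1.0 + lambda);   /* scale-independent */
        else
            diag[i] += lambda;           /* old additive form */
    }
}
```

With diagonal elements of, say, 1e-6 and 1e6 and lambda = 1, the
multiplicative form doubles both, while the additive form swamps the
small element and barely touches the large one.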
I think multiplicative is a better default, because it's
scale-independent and it's easier to interpret the lambda value. But
it is true that some problems just seem to converge faster with
additive lambda than with multiplicative lambda, so it's useful to
keep both.

> 16. Monte Carlo search for initial fit parameters
>
> HBB: Nice. So all we now miss for a complete typical fitting
> mess-of-tools is the Nelder-Mead Simplex search algorithm.
> Oh, and about 100 pages of documentation explaining all this to
> gnuplot users who don't have anywhere near the experience it takes
> to avoid hurting themselves with such a mighty tool chest (whether
> by shooting themselves in the foot, or by dropping the chest on
> it...).

I thought about adding simplex, but from what I've read and heard, its
main strength is dealing with discontinuous derivatives, which
chisquare fits don't normally have (though it might do better on the
original hemisphere-fit demo problem....). But the gain didn't seem to
be big enough, often enough, to be worth creating the new controls.
But the Monte Carlo search solves a more common problem: finding a
reasonable starting point, particularly if there can be multiple
minima. It needs extra user input for the ranges, so I hesitated, but
realized that the parameter-file mechanism could let me add it in a
way that wouldn't burden people who didn't need/want it.
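A minimal sketch of such a Monte Carlo start search (interface and
names are assumptions, not the actual patch): draw random points
uniformly inside the user-supplied ranges and keep the
lowest-chisquare one as the Marquardt starting point.

```c
#include <assert.h>   /* only needed for the self-checks */
#include <stdlib.h>

/* Hypothetical sketch of a Monte Carlo search for starting values.
 * chisq is the objective, lo/hi the user-supplied per-parameter
 * ranges, n_try the number of random trials; the best point found is
 * written to best[] and its chisquare returned. */
static double mc_start(double (*chisq)(const double *),
                       const double *lo, const double *hi,
                       int n_par, int n_try, double *best)
{
    double trial[16];          /* enough parameters for this sketch */
    double best_chisq = 0.0;
    int t, i;
    for (t = 0; t < n_try; t++) {
        for (i = 0; i < n_par; i++)
            trial[i] = lo[i] + (hi[i] - lo[i]) *
                       ((double) rand() / RAND_MAX);
        double c = chisq(trial);
        if (t == 0 || c < best_chisq) {
            best_chisq = c;
            for (i = 0; i < n_par; i++)
                best[i] = trial[i];
        }
    }
    return best_chisq;
}

/* example objective with a single minimum at (2, -1) */
static double demo_chisq(const double *p)
{
    return (p[0] - 2.0) * (p[0] - 2.0) + (p[1] + 1.0) * (p[1] + 1.0);
}
```

With multiple minima, enough trials make it likely that the best point
lands in the basin of the global minimum, which is exactly the problem
a single gradient-based start can't solve.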