Hi Edward,
Many thanks for taking the time for the detailed and informative
response. Enjoy your time off!
Doug
On Dec 21, 2006, at 11:15 AM, Edward d'Auvergne wrote:
> That summarises the differences between the use of the
> 'full_analysis.py' script and Modelfree4 using the FAST-Modelfree
> interface quite concisely. I'll just expand or explain a few of those
> points. There are really four important differences here: model-free
> model selection; model-free model elimination; model-free
> optimisation; and the strategy for obtaining the global description of
> the Brownian rotational diffusion tensor together with all model-free
> models and parameters.
>
>
> 1 Model selection
>
> In the 'full_analysis.py' script AIC model selection is employed. This
> criterion is used because the solution to the global problem is
> sought by minimising the Kullback-Leibler discrepancy; more about this
> later. In the FAST-Modelfree interface to Modelfree4, the ANOVA
> step-up hypothesis testing of Mandel et al., 1995 is used. I've shown
> in d'Auvergne and Gooley, JBNMR, 2003, 25(1), 25-36 that there are
> significant deficiencies in the hypothesis testing model selection.
> Specifically there are two flaws: not selecting a model when one
> ought to be selected and under-fitting. If no model is selected (when
> one should be!) then there will be segments of the macromolecule which
> cannot be dynamically described (but which should be). The
> consequences of under-fitting are that S2 is overestimated and te and
> Rex parameters are underestimated by being dropped from the final
> model. These two flaws cause the molecule to appear more rigid than
> reality. This is what you are seeing, Doug, with the higher proportion
> of models m3 to m5.
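
A minimal Python sketch of the AIC selection step described above, assuming the usual form AIC = chi2 + 2k for chi-squared fits, where k is the number of model-free parameters. The chi-squared values are invented for illustration and the function names are hypothetical; they are not part of relax or Modelfree4:

```python
# Hypothetical fit results for one spin system: (chi-squared, parameter count k).
# Parameter counts follow the usual sets: m1 {S2}, m2 {S2, te}, m3 {S2, Rex},
# m4 {S2, te, Rex}, m5 {S2f, S2, ts}.  The chi-squared values are made up.
fits = {
    "m1": (35.2, 1),
    "m2": (12.9, 2),
    "m3": (30.1, 2),
    "m4": (12.5, 3),
    "m5": (11.8, 3),
}


def aic(chi2, k):
    """Akaike's Information Criterion for a chi-squared fit with k parameters."""
    return chi2 + 2 * k


def select_model(fits):
    """Return the name of the model with the lowest AIC value."""
    return min(fits, key=lambda name: aic(*fits[name]))


print(select_model(fits))
```

Unlike step-up hypothesis testing, every model receives a score here, so a model is always selected, and an extra parameter is kept whenever it lowers chi-squared by more than 2.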
>
>
> 2 Model elimination
>
> This may or may not be causing differences between the results.
> Essentially if a model-free model has failed, the 'full_analysis.py'
> script will kick it out prior to model selection. See d'Auvergne and
> Gooley, JBNMR, 2006, 35(2), 117-135 for more details.
>
>
> 3 Optimisation
>
> This point will make a major difference to the results. For the
> optimisation of the model-free models (ignore the optimisation of the
> diffusion tensor for now) there are 4 optimisation issues:
> optimisation precision; failure of the Levenberg-Marquardt
> minimisation (in relax, Modelfree4, and Dasha); failure of the limits
> algorithm; and a bug in Modelfree4. I have a paper in submission
> which fully explores each of these issues.
>
> The difference in optimisation precision between the default in
> relax and the values hard-coded into Modelfree4 is 20 orders of
> magnitude! For more details see the archived post located at
> https://mail.gna.org/public/relax-devel/2006-10/msg00122.html
> (Message-id:
> <7f0...@do...>).
> There are a few more details interspersed in the thread to which
> that post belongs, starting at
> https://mail.gna.org/public/relax-devel/2006-10/msg00114.html
> (Message-id:
> <7f0...@do...>).
> relax can easily be set to the low precision of Modelfree4; however, I
> wouldn't recommend it, as the convoluted nature of the model-free space
> means that early termination of optimisation due to low precision will
> result in parameter values far from the true values.
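
A toy Python sketch of why the termination precision matters. It minimises a one-dimensional function with a very flat minimum at x = 3 and stops once the function value changes by less than a tolerance. The function, the optimiser and the two tolerances (chosen only to be 20 orders of magnitude apart) are assumptions for illustration, not the actual settings of relax or Modelfree4:

```python
def newton_minimise(x, func_tol, max_iter=1000):
    """Newton steps on f(x) = (x - 3)**4, stopping when f changes by < func_tol."""
    f = (x - 3.0) ** 4
    for _ in range(max_iter):
        # Newton step x - f'/f'' for this function reduces to x - (x - 3)/3.
        x_new = x - (x - 3.0) / 3.0
        f_new = (x_new - 3.0) ** 4
        if abs(f - f_new) < func_tol:
            return x_new
        x, f = x_new, f_new
    return x


x_loose = newton_minimise(0.0, func_tol=1e-5)
x_tight = newton_minimise(0.0, func_tol=1e-25)

# Distance of each result from the true minimum at x = 3.
print(abs(x_loose - 3.0), abs(x_tight - 3.0))
```

Because the function is so flat near its minimum, the loose tolerance is satisfied while the parameter is still visibly off, which is the same effect a shallow valley has in the model-free space.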
>
> The Levenberg-Marquardt algorithm, which is the only optimisation
> algorithm in Modelfree4, one of two in Dasha, and one of many in relax,
> is also an issue. The problem is described in the fine print of the
> algorithm - the singular matrix failure of the Levenberg-Marquardt
> matrix. This is often described as being rarely encountered. Yet in
> model-free analysis the singular matrix failure is actually quite
> common. It occurs whenever an internal correlation time parameter
> becomes undefined - i.e. when the corresponding order parameter is
> equal to one. In this case changing the correlation time has no
> effect. Two things amplify the issue: both the grid
> search and the limits algorithm significantly increase the probability
> of having an S2 value of 1. This is a hidden issue, as those
> models in which the Levenberg-Marquardt algorithm has failed are often
> not selected by the model selection algorithm as their optimised
> chi-squared value is overestimated.
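
The singular matrix failure can be sketched in a few lines of Python. This assumes the original Lipari-Szabo spectral density with parameters S2 and te, and for simplicity it treats the J(w) values themselves as the fitted quantities (the relaxation rates are linear combinations of them, so a zero derivative carries over). The frequencies, correlation times and function names are illustrative assumptions:

```python
def effective_time(te, tm):
    """Effective correlation time: 1/tau = 1/tm + 1/te."""
    return tm * te / (tm + te)


def dj_ds2(w, s2, te, tm):
    """Partial derivative of J(w) with respect to S2."""
    tau = effective_time(te, tm)
    return 0.4 * (tm / (1 + (w * tm) ** 2) - tau / (1 + (w * tau) ** 2))


def dj_dte(w, s2, te, tm):
    """Partial derivative of J(w) with respect to te (chain rule through tau)."""
    tau = effective_time(te, tm)
    dtau_dte = (tm / (tm + te)) ** 2
    dlorentz_dtau = (1 - (w * tau) ** 2) / (1 + (w * tau) ** 2) ** 2
    return 0.4 * (1 - s2) * dlorentz_dtau * dtau_dte


def normal_matrix_det(freqs, s2, te, tm):
    """Determinant of the 2x2 matrix A^T A built from the Jacobian A."""
    a = [dj_ds2(w, s2, te, tm) for w in freqs]
    b = [dj_dte(w, s2, te, tm) for w in freqs]
    aa = sum(x * x for x in a)
    bb = sum(x * x for x in b)
    ab = sum(x * y for x, y in zip(a, b))
    return aa * bb - ab * ab


FREQS = [0.0, 3.82e8, 3.77e9]  # illustrative angular frequencies in rad/s
TM, TE = 10e-9, 100e-12        # illustrative correlation times in seconds

print(normal_matrix_det(FREQS, 0.8, TE, TM))  # non-zero: te is defined
print(normal_matrix_det(FREQS, 1.0, TE, TM))  # zero: te has no effect
```

With S2 equal to one, the te column of the Jacobian is exactly zero, so the matrix the Levenberg-Marquardt step has to invert is singular.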
>
> The limits algorithm used in Modelfree4 is another point of failure.
> This can be pictured as follows (taken from a submitted paper). Say
> minimisation is constrained within a cube arbitrarily placed within a
> space. Let there be a single minimum located towards one face of the
> cube. It is simultaneously a local and global minimum within the cube.
> If the minimum is much narrower than the length between points of the
> grid search it is conceivable that a moderate curvature of the space
> will cause the grid search algorithm to select a position distant from
> the minimum. This often occurs within the model-free space because of
> the shallow, curved valley which starts at infinite correlation times
> and heads down to the minimum. Assuming only one minimum within the
> entire space, optimisation without constraints will follow a
> trajectory determined by the curvature of the space from the initial
> position to the minimum. If the trajectory is contained within the
> cube, constraints should not influence optimisation. However if part
> of the trajectory lies outside the cube the constraint algorithm will
> influence whether the minimum will be found. If, between the points
> where the trajectory exits and re-enters the cube, there is a
> downhill path along the surface where the gradient is always
> negative, then this path should be followed to allow the minimum to be
> found. The constrained trajectory should be similar to the
> unconstrained trajectory for those parts within the cube. The parts
> outside the cube should be replaced by a trajectory along the face of
> the cube between the exit and entry points. Within the model-free
> space this hypothetical situation does occur due to the convoluted
> nature of the space. However, Modelfree4 does not follow the downhill
> path along the constraint and optimisation is terminated far from the
> minimum.
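
The picture above can be reproduced with a toy two-dimensional example in Python. The function, the box, the starting point and the simple projected gradient descent are all assumptions chosen for illustration; in particular, the "give up at the constraint" branch is only a caricature of the behaviour described and is not Modelfree4's actual limits algorithm:

```python
def grad(x, y):
    """Gradient of the toy function f(x, y) = (1 - x)**2 + (y - x**2)**2."""
    r = y - x * x
    return (-2.0 * (1.0 - x) - 4.0 * x * r, 2.0 * r)


def clip(v, lo, hi):
    """Project a coordinate back onto the interval [lo, hi]."""
    return max(lo, min(hi, v))


def minimise(start, bounds, follow_face, step=0.02, max_iter=20000):
    """Gradient descent inside a box; returns (final point, hit_constraint)."""
    (x_lo, x_hi), (y_lo, y_hi) = bounds
    x, y = start
    hit_constraint = False
    for _ in range(max_iter):
        gx, gy = grad(x, y)
        nx, ny = x - step * gx, y - step * gy
        if not (x_lo <= nx <= x_hi and y_lo <= ny <= y_hi):
            hit_constraint = True
            if not follow_face:
                break  # terminate where the trajectory leaves the box
            # Otherwise slide along the face by projecting back onto the box.
            nx, ny = clip(nx, x_lo, x_hi), clip(ny, y_lo, y_hi)
        x, y = nx, ny
    return (x, y), hit_constraint


# The minimum at (1, 1) lies inside the box, but the curved valley y = x**2
# dips below the lower face y = 0.5 between the start and the minimum.
bounds = ((-1.0, 2.0), (0.5, 2.0))
start = (-0.5, 0.6)

stopped, _ = minimise(start, bounds, follow_face=False)
found, touched = minimise(start, bounds, follow_face=True)
print(stopped, found, touched)
```

Following the downhill path along the face reaches the minimum, whereas terminating at the constraint leaves the parameters far from it.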
>
> The last difference is caused by a bug in the Modelfree4
> Levenberg-Marquardt algorithm whereby optimisation is terminated
> early. In a paper that has been submitted, I've shown that between 13
> and 45% of residues or spin systems are affected by this issue,
> depending on the model-free model.
>
>
> 4 Optimisation of the global model
>
> This one is quite complex and is in another manuscript I have
> submitted for publication. Essentially in Modelfree4 using the
> FAST-Modelfree interface you are forced to follow the paradigm, first
> used in Kay et al., Biochem, 1989, 28(23), 8972-8979, of starting the
> analysis with an initial estimate of the diffusion tensor.
> Using this estimate you then optimise the model-free models.
>
> The 'full_analysis.py' script takes a completely different approach to
> solving the simultaneous optimisation and model selection global
> problem (the diffusion tensor + all model-free models for all spin
> systems). For details, see the post at
> https://mail.gna.org/public/relax-users/2006-10/msg00009.html
> (Message-id:
> <7f0...@do...>)
> and all the other messages following from Sebastien Morin's post at
> https://mail.gna.org/public/relax-users/2006-10/msg00007.html
> (Message-id: <452...@do...>).
>
>
> I hope that that sufficiently describes the differences in the
> results!
>
> Cheers,
>
> Edward
>
>
> On 12/21/06, Chris MacRaild <c.a...@do...> wrote:
>> Hi Doug,
>>
>> I've done similar comparisons and come to similar results.
>>
>> There are a few things to keep in mind when trying to rationalise these
>> differences. First, the approach coded in full_analysis.py makes a
>> serious attempt to optimise both the rotational diffusion tensor and
>> the local dynamic parameters. Modelfree, on the other hand,
>> relies on you having a good estimate of the tensor before you start. So
>> the first thing to check is whether the diffusion tensor relax gets
>> agrees with the one you gave Modelfree - if not, all bets are off with
>> respect to the dynamic parameters. Second, the model selection used by
>> relax is different to that used by Modelfree, so relax will in some
>> cases pick different models, even with everything else being equal.
>> Edward can elaborate on why the relax approach is superior, I'm sure...
>> Third, the optimisation code in relax is much more up-to-date, so is
>> better at finding the true best fit for any given model to your data.
>> Finally, it's worth keeping in mind that in many cases, dynamic
>> parameters are poorly defined, even by good data. Even very big
>> differences in tau_e, e.g., are not always significant.
>>
>> The difference that would concern me is if there are dramatic
>> differences in order parameters - S2 is generally fairly robust to the
>> above issues, within reason.
>>
>> Cheers,
>> Chris
>>
>>
>> On Wed, 2006-12-20 at 16:18 -0500, Douglas Kojetin wrote:
>> > Hi All,
>> >
>> > Has anyone compared runs of relax (m1 through m5; full_analysis.py
>> > script) vs. a traditional fastmodelfree/modelfree run using the
>> > binary provided by the Palmer group? I have ... I think I'm using
>> > similar parameters for both runs, and I'm seeing a drastic difference
>> > in results (models chosen).
>> >
>> > Thanks in advance for the input,
>> > Doug
>> >
>> > _______________________________________________
>> > relax (http://nmr-relax.com)
>> >
>> > This is the relax-users mailing list
>> > rel...@do...
>> >
>> > To unsubscribe from this list, get a password
>> > reminder, or change your subscription options,
>> > visit the list information page at
>> > https://mail.gna.org/listinfo/relax-users
>> >