|
From: Benoit L. <ble...@ma...> - 2012-08-17 11:25:28
|
Dear OpenBabel developpers,
I wrote an update to the ConformerSearch class.
First I noticed that in the Search method, the convergence test:
> (i.e. number of generations that did not change the score before
considering the iteration converged).
is suited for maximization (i.e. the OBRMSDConformerScore, which is the
default).
If you want to use the OBEnergyConformerScore
<cid:par...@ma...>, i.e. minimize
conformers energy,
you want to continue search until lowest energy in the population do not
decrease anymore (and not increase)
after enough generations.
The problem is that the actual code just monitors score increase:
------------------------------------------------------
if (IsNear(last_score, score)) {
identicalGenerations++;
last_score = score;
} else {
if (score < last_score) {
m_rotorKeys = rotorKeys;
identicalGenerations++;
} else {
last_score = score;
identicalGenerations = 0;
}
}
-------------------------------------------------------
The last_score variable is updated when the score "increase", and if you
consider energy, you want to update it when the score decreases.
And this can be achieved using the the OBConformerScore::GetPreferred
() method that says if you want to maximize or minimize:
---------------------------------------------------
if (IsNear(last_score, score)) {
identicalGenerations++;
last_score = score;
} else {
if (m_score->GetPreferred() == OBConformerScore::HighScore)
{
// Maximize score
if (score < last_score) {
m_rotorKeys = rotorKeys;
identicalGenerations++;
} else {
last_score = score;
identicalGenerations = 0;
}
}
else
{
// Minimize score
if (score > last_score) {
m_rotorKeys = rotorKeys;
identicalGenerations++;
} else {
last_score = score;
identicalGenerations = 0;
}
}
}
--------------------------------------------
I also noticed that the whole population was evaluated completly (the
scoring method is called
for all rotorkeys), calling many time the energy scoring for the same
key. So I added a mapping (C++ std::map)
that provides the energy given the key, if it was already computed. For
long molecules this is quite usefull, and
save about half of energy computations, and then total search
computation time is well reduced fir the same result.
Finally, I added the use of fitness sharing mechanism (see [1]) , in
order to maintain diversity in the population and at the
same time use the energy scoring. More specifically I used a direct
niche classification ([2]).
This is providing good improvement, e.g. using linear alcanes with n=70,
it could find back the flat chain, which is not possible
with the elitist implementation. Of course, there are a lot of possible
tweaks in GA, this is a first attempt. It could be possible to
add many options to control the GA scheme... just requires time and testing.
So I'd be glad to contribute, and now the question is how. Should I send
my changes (modified conformersearch.cpp and conformersearch.h)
to a regular developper for testing, or should I register in the
Sourceforge and submit through SVN?
Best regards,
[1] Goldberg, D.E. & Richardson, J.: Genetic Algorithms with Sharing for
Multimodal Function Optimization,
Proc. 2nd Inter. Conf. Genetic Algorithms, pp41-49,1987.
[2] Miler, B.L. & Shaw, M.J.: Genetic Algorithms with Dynamic Niche
Sharing for Multimodal Function
Optimization, IEEE International Conference on Evolutionary Computation,
pp786-791, Piscataway, NJ:
IEEE Press 1995
--
Dr. Benoît Leblanc
Materials Design SARL
18, rue de Saisset
92120 Montrouge
France
Tel +33 1 70 61 58 14 x 309
|