Thread: Re: [Cdk-user] Deduce Bond System Tool

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

Hi Kevin,

On Wed, Mar 14, 2012 at 6:18 PM,  <kev...@sy...> wrote:
> I have been having problems with the Deduce Bond System Tool.  When I use
> the routine FixAromaticBondOrders I find that it quite often times out (100
> seconds)

That is because of the AllRingsFinder being used, which is an
exhaustive ring searchers and indeed time consuming algorithm.

However, it has already been said to not search for rings larger than
7 atoms... so, that combo might indicate a bug.

> if the molecule contains 5 or more (possibly unconnected) rings and
> one of them has complexities such that a simple solution cannot be found.
> On inspecting the source code (version 1.4.2), it appears that all rings in
> the molecule being tested are treated combinatorially

Which is in fact needed, because you cannot know on beforehand which
bonds lack the required 'double' order flag, and which are OK... that
is, on the old CDK world... and the Tool has been written in that old
world.

In fact, this was one of the reasons why I set out to rewrite atom
detection in the CDK.

> – all combinations of
> possible solutions for each ring being tested.  It occurred to me that it
> would potentially be faster (for complex molecules) to divide the rings up
> into groups of interconnected rings and treat these separately.

I use this trick somewhere else too, in the aromaticity detector, if
not mistaken...

> The attached code (which I have been trying out in the LICSS system) does just
> this – the core algorithm is unchanged, but is applied sequentially to each
> group of interconnected rings rather than to the whole molecule. In my
> hands, this makes things much faster.  For example, Smiles such as
> Oc1c(C2CC(Cc3ccccc23)c4ccc(cc4)c5ccccc5)c(=O)oc6ccccc16 (Difenacoum) is now
> configured very quickly with four kekulised aromatic rings, the remaining
> pyranone ring (which isn’t recognised as aromatic) is returned unchanged.
> With the previous algorithm, this molecule timed out.

I was discussing the class with Ola (Bioclipse) this week, and since
so many people use input formats where double bond localization is
missing (SMILES as above, but also MDL molfiles with query bond type
4), we just have to have a good algorithm.

I think your approach is a good one, and wonder why it times out in
the first place, because the AllRingsFinder is supposed to stop after
finding 7 atoms anyway... that should be enough a termination
condition for it the be fast anyway... I think we might have a
regression...

The other thought is that both input types have more information:
which bonds do not have a well-defined bond order. For SMILES this is
each bond between two lower case organic subset atoms. For MDL
molfiles it is the bonds of type 4.

Thus, I think we should not search in just small ring systems (as they
might easily get large enough for the AllRingsFinger to explode on
anyway), but restrict the searching on just the unknown bits.

This idea I was planning to discuss next week Thursday with Ola and
Klas (cc-ed), looking into implementing that approach. That should
even be faster and I expect more accurate, as we might not fall into
the trap changing bond orders that were explicitly single in the
SMILES or molfile in the first place...

(That may sound obscure, but was in fact one of the issue with the
double bond recovering tool we used before the we got this code
donated by the EPA...)

That set aside... we always have room in the CDK for alternative
algorithms, and the thing I am thinking of requires more prior
information that your suggestion does...

> When I originally tried this approach to grouping the interconnected rings,
> I was expecting to have to rewrite the IsStructureOK routine which checks
> each structure for being ‘properly’ aromatic.

The current 1.4.x DeduceBondSystemTool does not know about the atom
types in the structure... how does your code figure out what is
aromatic?

> In fact, no modification is
> necessary in that it quite happily accepts rings with no double bonds as
> aromatic (if the atoms are sp2 hybridised).  This can lead to problems with
> both the existing algorithm (and my variant).  For example, anthracene or
> acridine are sometimes kekulised with the central ring containing no double
> bonds (depending on the exact ordering of the rings detected by the ring
> finder – itself dependent, I believe, on how the Smiles parser happens to
> set up the IMolecule). If you do decide to look at my suggested approach,
> and also fix the aromaticity checking, it will be necessary to make sure
> that it is applied only to those rings in each interconnected group.

I very much like to work things out... one aspect is, is to isolate a
set of good structures that (alternative) implementations should be
able to solve in reasonable time...

The second thing is how we want to do things. The DeduceBondSystemTool
effectively uses it's own atom type code; I personally like to reuse
the central atom typer, which we can independently test. Same for
aromaticity detection... but, aromaticity itself is not important for
deducing double bond locations... it's just the delocalized nature we
want to get rid of, and should not even care about what is aromatic,
anti-aromatic, or non-aromatic...

Thoughts?

Egon

-- 
Dr E.L. Willighagen
Postdoctoral Researcher
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers

Thread: Re: [Cdk-user] Deduce Bond System Tool

cdk-user