Re: [Cdk-devel] new sssr idea

On Sun, Jun 19, 2011 at 12:24 PM, Andrew Dalke
<da...@da...> wrote:
> On Jun 19, 2011, at 9:16 AM, Egon Willighagen wrote:
>> Andrew, maybe we should write up a review on aromaticity in
>> cheminformatics? Set up a simple test set of corner cases, describe
>> the algorithms around, and show their limitations, using Open Source
>> implementations?
>
> I am not interested in doing so.

Well, neither am I really :)

> The algorithms are already well-enough known

Maybe for OpenEye users, but not in general. At least, from my
perspective, the whole idea that there are different definitions of
aromaticity seems very much lost in the literature.

I will try to find time to read up on their documentation, and should
probably get myself academic licenses for OpenEye and other
proprietary tools. (I have not yet made time to read through all the
licenses to make sure I am still allowed to develop CDK stuff while
holding such licenses. This sounds absurd, but it has been a problem
over the past 8 years!)

> that, for example,
> OpenEye implements not one but multiple families of aromaticity
> perception, including those from other vendors.

Good! OpenEye has been doing it right here. The CDK only implements
one algorithm. The CDK could use more approaches; I will have to look
at what OpenEye does.

> One of these is the MMFF aromaticity model, which is well-described
> and also implemented in OpenBabel.

Happy to hear that, and I am happy to hear you talk about families and
models here, because I have not seen such talk before, and it is
precisely how I see this. E.g. the volume calculation paper I just
looked at does not describe *which* model/family they use for
aromaticity. That is the big problem, because their parameters are
affected by it.

> Is your point that knowledge, no likely described in the
> literature, just hasn't made its way into CDK? (I haven't looked
> at RDKit to see how it implements this, so I can't say that it's
> a general free software issue.)

One point indeed is that the CDK implements one model right now, and I
welcome alternative methods.

> Such a study would be extremely tedious and I don't understand
> what the end goal would be. Would it be to develop a better
> aromaticity model? In which case it would need a diverse set
> of structures where the aromaticity is known experimentally.

That would in fact be a very good goal, but the problem here is indeed
that such experimental data is not omnipresent.

> Would it show problems in the overall definition of aromaticity,
> or mostly highlight limitations in the specific implementations
> of the algorithm?

I think it would be good for the larger cheminformatics community to
actually understand 'aromaticity', because it is one significant
source of incompatibility between toolkits right now.

This would be tedious, and would not directly result in new
applications. It would help the community, though.

Egon

-- 
Dr E.L. Willighagen
Postdoctoral Researcher
Institutet för miljömedicin
Karolinska Institutet (http://ki.se/imm)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers