Learn how easy it is to sync an existing GitHub or Google Code repo to a SourceForge project! See Demo

Close

#545 aromaticity detection algorithms

Accepted
closed
nobody
cdk-1.4.x (181)
cdk-1.4.x
5
2012-10-30
2012-10-01
Egon Willighagen
No

there was recently a post about various alternative aromaticity perception models:

http://blueobelisk.shapado.com/questions/aromaticity-perception-differences

While principally simply, converting a Hueckel rule into an algorithm leaves many unanswered questions.This causes most tools to disagree with each other on whether molecules are aromatic. (I'm sure it gets even interesting of we look at which bonds are aromatic).

I do not know how the reference (gold) list was created (I left that question), but the CDK tool makes the following interesting choices:

  1. it only looks into at most three fused rings. So, if the individual rings do not fulfill the Hueckel rule, but two or three rings do (think azulene), it still detects aromaticity.
  2. if the ring has a sprouting double bond, the ring is not aromatic (think benzoquinone)

Now, the latter choice is based on user bugs from the past, but a bit controversial. I am not sure if choice 2 puts us in one of the groups, rather than the other, but it could be.

Anyway, I have long planned to make an alternative aromaticity perception tool that does not worry about the double bond sprouting, as many other tools are happily marking benzoquinone as aromatic.

Hence these three patches:

https://github.com/egonw/cdk/commits/425-14x-altArom

The first patch [0] introduces a new class called DoubleBondAcceptingAromaticityDetector to perceive aromaticity (with unit tests). It also extends the JavaDoc of the CDKHueckelAromaticityDetector, explain this double bond sprouting, and pointing to the new class as alternative.

The second patch [1] actually fixes a unrelated bug in one of the unit tests (for both the old and the new aromaticity perception classes): the placement of the double bonds is not deterministic, and thus testing that the placement is exactly the same in not the correct way of testing it. Instead, it now tests just if for both molecules all atoms are aromatic.

The new class is using the same algorithm (and copies the implementation), with the exception of the double bonds pointing out of the ring. Thus, the new class will find more molecules to be 'aromatic'.

This patch introduces a new class (thus new API) is tested, and aimed at cdk-1.4.x.

Your comments are most welcome.

Egon

0.https://github.com/egonw/cdk/commit/c86e48c4c593113ff0b326486c1a2b84a6c1a439
1.https://github.com/egonw/cdk/commit/dc882c1374b60bad8cb696d226e16a544c802884

Discussion

  • John May
    John May
    2012-10-04

    Hi Egon,

    I've started having a look at this but won't be able to review in depth till next week.

    Some general API things I was thinking:
    - adding a general interface for aromaticity detection 'AromticityDetector' then the detector can be injected into algorithms that want to perceive aromatic systems.
    - the CDKHueckelAromaticityDetector could be deprecated and renamed to be a lot more concise - as "HueckelDetector implements AromticityDetector" this gives you all the information you need. CDKHueckelAromaticityDetector will still exists so not problems with existing code.
    - same with DoubleBondAcceptingAromaticityDetector, it's very verbose. It would be nice if we could think of something more concise... no suggestions yet but when I look fully I try and think of something.

     
  • John May
    John May
    2012-10-04

    ... (posted too soon)...

    Just suggestions but, what do you think?

    J

     
  • From the CDK hangout of last Friday:

    CDKHueckelAromaticityDetector 2 HueckelAromaticityDetector -> yes, in master.

    Add IAromaticityDetector interface -> yes, in master.

    DoubleBondAcceptingAromaticityDetector -> suggestions welcome.

     
  • John May
    John May
    2012-10-10

    Just reviewing now. Stephan has suggested using "Substituent" as a concise name for the alternative method.

    So in master these could be... (dropping "Aromticity" as with the interface it is implied to be an AromticityDetector).

    Thoughts on the name?

    public class SubstituentDetector extends IAromticityDetector {
        public void detect(IAtomContainer container) {
            // ... logic
        }
    }
    
    public class HueckelDetector extends IAromticityDetector {
        public void detect(IAtomContainer container) {
            // ... logic
        }
    }
    
     
  • John May
    John May
    2012-10-10

    Damn looks like SF didn't actually post my response.... yay!...

    Have reviewed - looks good. I tried with the benzoquinone case and couldn't get it to work. This turned out to be due to the requirement of atom type perception and it now works okay. I wrote a unit test just to confirm.

    I've signed of your commits on this branch and also included the additional unit test.

    Some minor points which could be addressed without review:
    * @cdk.created on the alternate detector could be changed
    * SVN tags on Hueckel detector can be removed

    As you stated this has a lot of duplicate code. It would be nice if we could unify the methods in master and make them use the same core perception code (possibly put in an abstract class). That way if find a bug in one we don't have to patch the other.

    Required Action: Contributor to pull signed commits from the branch and review the additional unit test.

    Cheers,
    J

     
  • John May
    John May
    2012-10-21

    • branch: --> master
    • milestone: Needs_Review --> Accepted
     
  • John, the patch report has the tag 'cdk-1.4.x' but now as 'branch' master...

    Not sure about the naming suggestions, but that Substituent name for the new alternative sounds weird... because that one particularly does not care about substituents... Also, I like to keep the 'aromaticity' in the name, is it detects that, and not substituents... HueckelDetector (in master) I could sort of live with, but the new method is just as much Hueckel as the first...

    I'll look at your additional unit test shortly, and at the two further suggestions...

     
    • John May
      John May
      2012-10-21

      John, the patch report has the tag 'cdk-1.4.x' but now as 'branch' master...

      Ignore that... when you add a new field all existing trackers get updated with the default (which is master) - changed to cdk-1.4.x. Will sort out the other also.

      Not sure about the naming suggestions, but that Substituent name for the new alternative sounds weird... because that one particularly does not care about substituents...

      Yeah you're right - it's always so hard to get concise names for scientific code.

      Also, I like to keep the 'aromaticity' in the name, is it detects that, and not substituents... HueckelDetector (in master) I could sort of live with, but the new method is just as much Hueckel as the first...

      Nah that's fine it's personal preference really and it's difficult if there isn't a concise alternative. I'm normally on the interfaces so doesn't make a difference to me...

      IAromaticityDetector detector = ...;
      ....
      public void calculate(IAromaticityDetector detector);
      
       
  • John May
    John May
    2012-10-21

    • branch: master --> cdk-1.4.x
     
    • John May
      John May
      2012-10-21

      Worked it out now - that field gets updated if you edit the post. Hence when I changed it from needs review -> accepted that defaulted to master.

       
  • The joined interface we need to further think about: you cannot define static methods on interfaces...

     
    • status: open --> closed
    • milestone: Needs_Review --> Accepted
     
  • Per John's sign-offs, I pushed the commits, checked his additional test, and made two suggested discretionary tweaks.