
#1294 Aromaticity problem in CDKHueckelAromaticityDetector

cdk-1.4.x
closed
nobody
None
1
2015-01-30
2013-03-20
No

When the SMILES below is used, the atoms in the five-membered ring of isoindole are not correctly perceived: they are not recognized as aromatic, and their bonds are not recognized as aromatic bonds. (Tested with CDK 1.4.8)
c2c1ccccc1cn2
A molfile of the same molecule with explicit double and single bonds is perceived correctly, and so is a SMILES with explicit bonds:
C2=C1C=CC=CC1=CN2

Related

Bugs: #1294

Discussion

1 2 > >> (Page 1 of 2)
  • John May

    John May - 2013-03-22

    Looking into it. As a workaround for the SMILES, you can read the aromaticity 'as specified' in the SMILES:

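            SmilesParser parser = new SmilesParser(DefaultChemObjectBuilder.getInstance());
            // keep the aromaticity flags exactly as written in the SMILES
            parser.setPreservingAromaticity(true);
            IAtomContainer m = parser.parseSmiles("c2c1ccccc1cn2");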
    

    Unfortunately, setPreservingAromaticity actually does the opposite of what it describes: when it is false (the default), the CDKHueckelAromaticityDetector is used.

     
  • John May

    John May - 2013-03-22

    a molfile of the same molecule with explicit double and single bonds is correctly perceived and a smile with explicit bonds as well

    Yep, I think it's an issue with the atom typing. The parser does the atom typing automatically and may clobber what you actually read.

     
  • John May

    John May - 2013-03-22

    Okay I found why it happens but I'm not sure how best to fix it.

    Atom types when reading the Kekulé form:
    [C.sp2, C.sp2, C.sp2, C.sp2, C.sp2, C.sp2, C.sp2, C.sp2, N.planar3]
    Atom types when reading the aromatic form:
    [C.sp2, C.sp2, C.sp2, C.sp2, C.sp2, C.sp2, C.sp2, C.sp2, N.sp2]

    SP2 hybridization is set for all aromatic atoms in the SMILES (in SmilesParser).

    The aromaticity detector only accepts hard-coded atom types, and N.sp2 is not one of them (in CDKHueckelAromaticityDetector).

     
  • Egon Willighagen

    The trouble here is that "c2c1ccccc1cn2" is not really a valid SMILES (or at least not a clear one), and that the CDK does not throw an exception for the nitrogen. That is, the SMILES does not say whether the nitrogen has a hydrogen or not; that is only possible in the Kekule form. This is why the Daylight SMILES parser also fails to see this as a SMILES, and marks it as a SMARTS instead.

    To be more specific: an 'n' can be aromatic in two ways, with and without a hydrogen, and this input SMILES does not specify which of the two is meant.

    Algorithmically, this means that the atom type detection would need an extensive search, which would make it too slow for practical use; it would need kekulization to get this done.

    Practically, I think the atom typer is wrong here: in the 'aromatic form' it should perceive neither N.sp2 nor N.planar3, because it cannot know which one applies.

    The SmilesParser could then throw an exception saying that the nitrogen is underspecified...
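To make the kekulization point concrete, here is a minimal sketch in plain Java (no CDK; the class and method names are hypothetical). It treats kekulization as a perfect-matching search over a hand-built adjacency list for c2c1ccccc1cn2, with atoms numbered in SMILES order and the nitrogen last. The simplifying assumption is that every aromatic atom needs exactly one double bond, except an [nH] nitrogen, which contributes its lone pair instead. Under that assumption the plain 'n' reading has no Kekulé structure (nine atoms cannot be paired off), while the [nH] reading does.

```java
import java.util.Arrays;

/** Hypothetical helper: does a Kekulé assignment exist for an aromatic skeleton? */
class KekuleCheck {

    // Neighbours of each atom in c2c1ccccc1cn2, numbered in SMILES order (8 = nitrogen).
    static final int[][] ISOINDOLE = {
        {1, 8},       // 0: c  (opens ring 2)
        {0, 2, 6},    // 1: c  (opens ring 1, fusion atom)
        {1, 3},       // 2: c
        {2, 4},       // 3: c
        {3, 5},       // 4: c
        {4, 6},       // 5: c
        {5, 1, 7},    // 6: c  (closes ring 1, fusion atom)
        {6, 8},       // 7: c
        {7, 0}        // 8: n  (closes ring 2)
    };

    /**
     * Returns true if every atom flagged in needsDoubleBond can be paired with exactly
     * one flagged neighbour, i.e. the double bonds form a perfect matching on those atoms.
     */
    static boolean hasKekuleForm(int[][] adj, boolean[] needsDoubleBond) {
        int[] partner = new int[adj.length];
        Arrays.fill(partner, -1);
        return match(adj, needsDoubleBond, partner);
    }

    private static boolean match(int[][] adj, boolean[] needs, int[] partner) {
        // Pick the first atom that still needs a double bond.
        int a = -1;
        for (int i = 0; i < adj.length; i++) {
            if (needs[i] && partner[i] < 0) { a = i; break; }
        }
        if (a < 0) return true;                    // every atom is satisfied
        for (int b : adj[a]) {
            if (needs[b] && partner[b] < 0) {
                partner[a] = b; partner[b] = a;    // place a double bond a=b
                if (match(adj, needs, partner)) return true;
                partner[a] = -1; partner[b] = -1;  // backtrack
            }
        }
        return false;                              // atom a cannot get a double bond
    }

    /** Every aromatic atom needs a double bond, except an [nH] nitrogen (index 8). */
    static boolean isoindoleKekulizes(boolean nitrogenHasHydrogen) {
        boolean[] needs = new boolean[ISOINDOLE.length];
        Arrays.fill(needs, true);
        needs[8] = !nitrogenHasHydrogen;
        return hasKekuleForm(ISOINDOLE, needs);
    }

    public static void main(String[] args) {
        System.out.println("n    -> " + isoindoleKekulizes(false)); // false: 9 atoms cannot all pair up
        System.out.println("[nH] -> " + isoindoleKekulizes(true));  // true
    }
}
```

A real kekulizer also has to handle charges, exocyclic double bonds and larger ring systems; the point here is only that deciding whether the 'n' carries a hydrogen means searching for a valid double-bond assignment, which is the cost described above.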

     
  • Egon Willighagen

    Oh, and for clarity, the correct SMILES is: c2c1ccccc1c[nH]2

     
  • John May

    John May - 2013-03-25

    c2c1ccccc1cn2: Daylight Depict
    c2c1ccccc1c[nH]2: Daylight Depict

    One thing I did notice is that all lowercase atom symbols are set to SP2 hybridisation. Is that correct?

     
    • Egon Willighagen

      Yes, that is correct. There have been long discussions about the meaning of those lower cases, and OpenSMILES made a choice to resolve that ambiguity in the SMILES specs :)

       
      • John May

        John May - 2013-03-25

        Okay, but is the nitrogen in 'c2c1ccccc1c[nH]2' N.planar3 or SP2? I believe loading the same molecule from a molfile in the CDK would yield N.planar3. I'm aware they are basically the same, but I just wanted to check for consistency.

         
        • John May

          John May - 2013-03-25

          Your test shows this isn't the case. I've tested and signed the patch file, it's attached or on cdk-1.4.x-accepted.

           
        • Egon Willighagen

          N.planar3 and N.sp2 are not the same: the first has three neighbors and no double bond; the second has a double bond and only two neighbors.

          Electronically, the first contributes 2 electrons to the ring system (the lone pair in pz) while N.sp2 contributes one electron from the (shared) double bond.
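The electron counting above can be written as a toy single-ring Hückel (4n+2) check. This is a hypothetical sketch, not how CDKHueckelAromaticityDetector is implemented: it assumes an isolated ring, treats each C.sp2 as donating one electron (ignoring exocyclic double bonds), and takes the nitrogen contributions from the comment above.

```java
import java.util.List;

/** Hypothetical sketch of a single-ring Hückel (4n+2) electron count. */
class HueckelSketch {

    /**
     * Pi electrons an atom donates to the ring, following the rules quoted above:
     * N.planar3 (three neighbours, no double bond) donates its lone pair (2);
     * N.sp2 and C.sp2 donate one electron from their ring double bond (1).
     * Any other type is treated as breaking aromaticity (-1).
     */
    static int piElectrons(String atomType) {
        switch (atomType) {
            case "C.sp2":
            case "N.sp2":
                return 1;
            case "N.planar3":
                return 2;
            default:
                return -1;
        }
    }

    /** True if the ring's pi-electron total satisfies Hückel's 4n+2 rule. */
    static boolean isHueckelAromatic(List<String> ringAtomTypes) {
        int total = 0;
        for (String type : ringAtomTypes) {
            int e = piElectrons(type);
            if (e < 0) return false;
            total += e;
        }
        return total >= 2 && (total - 2) % 4 == 0;
    }

    public static void main(String[] args) {
        // Five-membered ring of isoindole: four carbons plus the nitrogen.
        List<String> withNH = List.of("C.sp2", "C.sp2", "C.sp2", "C.sp2", "N.planar3");
        List<String> withNsp2 = List.of("C.sp2", "C.sp2", "C.sp2", "C.sp2", "N.sp2");
        System.out.println(isHueckelAromatic(withNH));   // 6 electrons -> true
        System.out.println(isHueckelAromatic(withNsp2)); // 5 electrons -> false
    }
}
```

For the five-membered ring of isoindole, the N.planar3 ([nH]) reading gives 6 electrons and passes, while an N.sp2 nitrogen gives 5 and fails.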

           
          • John May

            John May - 2013-03-25

            Yes but they're both trigonal planar, non?: https://github.com/egonw/cdk/blob/master/src/main/org/openscience/cdk/interfaces/IAtomType.java#L47-L56

            For matching purposes it's easier to downgrade planar3 to SP2; this then allows you to match structures like the two attached.
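Matching on geometry rather than on exact hybridization amounts to coarse-graining atom types before comparing them. A minimal sketch, assuming a hypothetical hand-written table from CDK-style atom-type names to geometry classes (this is not a CDK API):

```java
import java.util.Map;

/** Hypothetical sketch: match atoms on coarse geometry instead of exact atom type. */
class GeometryKey {

    // Illustrative mapping only; a real table would cover every atom type in use.
    // "tetrahedral" here means the electron-pair geometry of sp3 centres.
    private static final Map<String, String> GEOMETRY = Map.of(
        "N.sp3", "tetrahedral",
        "N.planar3", "trigonal planar",
        "N.sp2", "trigonal planar",
        "C.sp3", "tetrahedral",
        "C.sp2", "trigonal planar"
    );

    /** Coarse key used for matching; unknown types fall back to the exact type name. */
    static String key(String atomType) {
        return GEOMETRY.getOrDefault(atomType, atomType);
    }

    /** Two atoms "match" if they share the same coarse geometry key. */
    static boolean sameGeometry(String typeA, String typeB) {
        return key(typeA).equals(key(typeB));
    }

    public static void main(String[] args) {
        System.out.println(sameGeometry("N.planar3", "N.sp2")); // true: both trigonal planar
        System.out.println(sameGeometry("N.planar3", "N.sp3")); // false
    }
}
```

The coarse key deliberately discards the electronic difference between N.planar3 and N.sp2, which is exactly the objection raised in the replies.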

             
            • Egon Willighagen

              See below: yes, they are both planar, but otherwise electronically quite different.

               
  • Egon Willighagen

    Unit tests that test (and pass) the expected behavior, using the correct SMILES.

     
  • John May

    John May - 2013-03-25

    ...argh, it won't attach. Here are the structures you'd want to match.

     
    • Egon Willighagen

      No, these are different; they are tautomers. They have different properties, such as energies; they may be easily interconvertible, but that does cost energy. This is not like delocalized electrons.

       
  • John May

    John May - 2013-03-25

    Yes, I know they are tautomers, but they are delocalized. We'll have to disagree on this: in my case, if I search for one, I want to find the other as well. If I used, say, the hybridization fingerprinter to search, these would produce different fingerprints.

     
    • Egon Willighagen

      I am pretty sure the distance between the two hydrogen positions is longer than the distance a hydrogen can tunnel from one place to the other: there is an energetic barrier to converting one into the other, and I seriously doubt that you would see that tautomeric conversion in a crystal at all... in the (human) cell, however...

       
    • Egon Willighagen

      BTW, yes, you may want to find both with the same search. But even then, tautomers are different chemical entities, which may happen to be biologically "equivalent".

       
  • John May

    John May - 2013-03-25

    I'm trying to find the example, but I had this exact case where one structure was from ChEBI and one from HMDB. They refer to the same metabolite but differ in double bond placement. The point is that rather than encoding the specific hybridisation, you can encode the geometry instead. As the AtomType.Hybridization doc states, they both have trigonal planar geometry, so using a less specific fingerprint or search you can match these. By InChI standards they are the same, but if you want to tell this in the CDK without running aromaticity or tautomerisation algorithms, you can do this using the geometry and cut down a lot of search space.

    Anyways, I think we've gone off bug topic :-).
    J

    On 25 Mar 2013, at 11:10, "Egon Willighagen" egonw@users.sf.net wrote:

    According to this presentation, around one bond length:

    http://www.princeton.edu/chemistry/macmillan/group-meetings/DEC_tunneling.pdf

    [bugs:#1294] Aromaticity problem in CDKHueckelAromaticityDetector

    Status: open
    Created: Wed Mar 20, 2013 04:44 PM UTC by Patrik Rydberg
    Last Updated: Mon Mar 25, 2013 11:03 AM UTC
    Owner: nobody


  • Egon Willighagen

    On Mon, Mar 25, 2013 at 12:48 PM, John May jwmay@users.sf.net wrote:

    > I'm trying to find the example… but I had this exact case where one was from
    > ChEBI and one from HMDB. They are referring to the same metabolite but
    > with different double bond placement.

    That sounds like pure electron delocalization, but that is something
    other than moving hydrogens...

    > The point is rather than
    > encoding the specific hybridisation you can encode the geometry instead.

    The geometry should be thought of in terms of electron positions...
    N.planar3 and N.sp2 have different electron placements.

    > As the AtomType.Hybridization doc states they both have trigonal planar geometry
    > thus using a less specific fingerprint or search you can match these.

    No, because a fingerprint normally takes into account the full structure.

    The same geometry does not mean the same thing.

    > By InChI standards they are the same

    Only because it tries to accommodate tautomerism, because indeed many
    people are OK with finding tautomers. Also note that the InChI tautomer
    rules are not very "good".

    > - but if you want to tell this in the CDK
    > without running aromatic or tautomerisation algorithms you can do this using
    > the geometry and cut down a lot of search space.

    So, you'd rather match N.sp2 to N.planar3 than N.sp3 to N.planar3? The
    latter two are actually way more similar (electronically, chemically,
    ...)!

    > Anyways, I think we've gone off bug topic :-).

    Well, nitrogens just are this complex :)

    Egon

    --
    Dr E.L. Willighagen
    Postdoctoral Researcher
    Department of Bioinformatics - BiGCaT
    Maastricht University (http://www.bigcat.unimaas.nl/)
    Homepage: http://egonw.github.com/
    LinkedIn: http://se.linkedin.com/in/egonw
    Blog: http://chem-bla-ics.blogspot.com/
    PubList: http://www.citeulike.org/user/egonw/tag/papers

     
  • Patrik Rydberg

    Patrik Rydberg - 2013-03-26

    Well, while we're slightly off topic: what's the correct SMARTS to use for such a ring when the hydrogen on the nitrogen atom could be anything? Is it c2c1ccccc1c[n*]2?

     
  • John May

    John May - 2013-04-12

    Not sure, I'm afraid :/. Possibly:

    c2c1ccccc1c[n;H1]2 
    

    This expression is for an H-pyrrole nitrogen (from the Daylight SMARTS Theory Manual).

     

    Last edit: John May 2013-04-12
  • John May

    John May - 2013-12-18
    • status: open --> closed
     