
#760 incorrect treatment of aromatic with explicit bond

Status: open
Owner: nobody
Priority: 5
Updated: 2017-07-26
Created: 2011-10-24
Private: No

I was reviewing the Open Babel SMILES code and noticed that when two bonded atoms, atom1 and atom2, are both aromatic, the bond order is reset to _order=5. smilesformat.cpp does this in two different ways.

On line 1055:
if (_order != 2) _order=5; //Potential aromatic bond -- but flag explicit double bonds
On line 1958:
_order=5; //Potential aromatic bond

I haven't yet figured out if they are supposed to be the same or different, but I strongly suspect that it's supposed to be the same check.
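
To make the difference between the two statements concrete, here is a minimal Python sketch that models each one as a function of the bond order parsed so far. The function names are hypothetical, and the conditions surrounding each statement in smilesformat.cpp are not reproduced; only the two quoted statements are modelled.

```python
AROMATIC = 5  # placeholder order used by both quoted statements

def order_as_on_line_1055(order):
    """Model of `if (_order != 2) _order=5;`: an explicit double bond survives."""
    return order if order == 2 else AROMATIC

def order_as_on_line_1958(order):
    """Model of `_order=5;`: whatever order was parsed is overwritten."""
    return AROMATIC

# Compare the two for single, double and triple input orders.
for order in (1, 2, 3):
    print(order, order_as_on_line_1055(order), order_as_on_line_1958(order))
```

Under this model the two statements disagree only for an explicit double bond, and both overwrite an explicit single or triple bond.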

It is possible, but rare, to have an explicit '-' bond marked to indicate that a bond between two aromatic atoms is supposed to be a single/non-aromatic bond. For example:

c12-c3c(cccc3)[IH]c1cccc2
Clc1cc-2c3c(cccc3n1)-c4c2cccc4
[Ni]123n4c5c6ccccc6c4nc-7[n+]2c(-c8ccccc78)nc9n1c(c1ccccc91)nc-1[n+]3c(-c2ccccc12)n5
O=c1nc-2cccccc2s1
c-12n(c3ccccc3n2)Cc4c1cc5ccccc5c4

These are not handled correctly in Open Babel. In RDKit:

>>> from rdkit import Chem
>>> mol = Chem.MolFromSmiles("c12-c3c(cccc3)[IH]c1cccc2")
>>> Chem.MolToSmiles(mol)
'c1ccc2c(c1)[IH]c1ccccc1-2'

See the "-2" at the end? That's the explicit single bond.

Open Babel, by contrast, treats the explicit '-' as an aromatic bond:

% echo 'c12-c3c(cccc3)[IH]c1cccc2 blah' | babel -ismi -osmi
c12c3c(cccc3)[IH]c1cccc2    blah
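
To find input SMILES that carry such explicit bond symbols, and so might be affected, a rough text scan can serve as a first filter. This is only a heuristic sketch: `explicit_bond_symbols` is a hypothetical helper, it does not parse SMILES, and it does not check that the atoms on either side of the symbol are aromatic.

```python
import re

# Bracket atoms such as [IH] or [n-2]; a '-' inside them is a charge, not a bond.
BRACKET_ATOM = re.compile(r"\[[^\]]*\]")

def explicit_bond_symbols(smiles):
    """Return the '-' and '#' symbols that occur outside bracket atoms."""
    masked = BRACKET_ATOM.sub("*", smiles)  # hide charges inside brackets
    return [ch for ch in masked if ch in "-#"]

for smi in ("c12-c3c(cccc3)[IH]c1cccc2", "c1ccc[n-2]1#[n-2]1cccc1", "c1ccccc1"):
    print(smi, explicit_bond_symbols(smi))
```

Any record that returns a non-empty list is worth re-checking with a toolkit that preserves explicit bonds.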

As an extreme example, here's a triple bond joining two aromatic rings in RDKit:

>>> mol = Chem.MolFromSmiles("c1ccc[n-2]1#[n-2]1cccc1")
>>> Chem.MolToSmiles(mol)
'c1cc[n-2](#[n-2]2cccc2)c1'

Open Babel ignores the triple bond entirely and treats the bond between the two aromatic rings as a single bond.

% echo "c1ccccc1#c1ccccc1 blah" | babel -ismi -osmi
c1ccccc1c1ccccc1    blah
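
The behaviour this report appears to be asking for, and which the RDKit output above is consistent with, can be sketched as a rule in which an explicit bond symbol always takes precedence over the aromatic default. `resolve_bond_order` and its parameters are hypothetical names for illustration; this is not Open Babel or RDKit code.

```python
AROMATIC = 5  # same placeholder order as in the quoted C++ statements

# Orders for bond symbols written explicitly in the SMILES string.
EXPLICIT_ORDERS = {"-": 1, "=": 2, "#": 3, ":": AROMATIC}

def resolve_bond_order(symbol, both_aromatic):
    """symbol is '-', '=', '#', ':' or None when no bond symbol was written."""
    if symbol is not None:
        return EXPLICIT_ORDERS[symbol]  # an explicit symbol always wins
    # No symbol: aromatic between two aromatic atoms, otherwise single.
    return AROMATIC if both_aromatic else 1

print(resolve_bond_order("-", True), resolve_bond_order("#", True),
      resolve_bond_order(None, True))
```

With this rule, only an unmarked bond between two aromatic atoms is a candidate aromatic bond.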

BTW, I'm using the Open Babel build from today's version control.

Discussion

  • Termo - 2017-07-26

    How is this bug still around? I just ran into it with benzyne, 'c1ccccc#1', which I found had the wrong atom count in my database...

    obabel -ismi -:'c1ccccc#1' -oreport -h
    FILENAME:
    FORMULA: C6H6
    MASS: 78.1118

    obabel -iinchi -:'InChI=1S/C6H4/c1-2-4-6-5-3-1/h1-4H' -oreport -h
    FILENAME:
    FORMULA: C6H4
    MASS: 76.0960

    Open Babel 2.4.1 -- Jan 19 2017

     

    Last edit: Termo 2017-07-26
  • Termo - 2017-07-26

    And more odd behaviour when looking at the bond orders of the molecule read from the InChI:

    [b.GetBondOrder() for b in ob.OBMolBondIter(mol)]
    Out[40]: [2, 1, 1, 2, 2, 2, 1, 1, 1, 1]
    

    and the output of a small function I have made to give a bond dict:

    getbsum('InChI=1S/C6H4/c1-2-4-6-5-3-1/h1-4H', term='inchi')
    Out[43]: [defaultdict(int, {'[C]-1-[C]': 2, '[C]-1-[H]': 4, '[C]-2-[C]': 4}), #C    6 #H    4
     dtype: int64]
    

    So 4 double bonds (should be 2) and no triple bond (should be 1)?
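
The numbers in the two comments above can be cross-checked with a few lines of Python. This is a small sketch that assumes standard atomic weights of 12.0107 for C and 1.00794 for H; those values are an assumption, chosen because they reproduce the MASS lines in the reports. The bond-order list is copied from the output above.

```python
from collections import Counter

# Assumed standard atomic weights; they reproduce the reported MASS values.
ATOMIC_WEIGHT = {"C": 12.0107, "H": 1.00794}

def molar_mass(counts):
    """Sum atomic weights for a formula given as {element: count}."""
    return sum(ATOMIC_WEIGHT[el] * n for el, n in counts.items())

c6h6 = molar_mass({"C": 6, "H": 6})  # formula reported for the SMILES input
c6h4 = molar_mass({"C": 6, "H": 4})  # formula reported for the InChI input
print(f"{c6h6:.4f} {c6h4:.4f} difference {c6h6 - c6h4:.4f}")

# Bond orders listed for the InChI-derived molecule.
orders = [2, 1, 1, 2, 2, 2, 1, 1, 1, 1]
print(Counter(orders))
```

The mass difference equals the weight of two hydrogens, and the tally shows four bonds of order 2 and none of order 3, which matches the getbsum output.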