#660 problem with reading/converting aromatic molecule

2.3.x
open
5
2014-08-20
2010-10-26
gert thijs
No

While testing whether Open Babel 2.3.0 could be used in a script that checks for unique compounds, I ran into a problem with how aromaticity is processed when reading SMILES input containing certain aromatic rings.

Step 1: convert this deprotonated (anionic) molecule to canonical SMILES:
$ echo "C1=CC=CC=C1N1[N-]C(=O)C=CC1(=O)" | babel -ismi -ocan
O=c1ccc(=O)n([n-]1)c1ccccc1

Step 2: read that canonical SMILES back in and convert it to canonical SMILES again:
$ echo "O=c1ccc(=O)n([n-]1)c1ccccc1" | babel -ismi -ocan
O=C1CCC(=O)[N-]N1c1ccccc1

One would expect step 2 to reproduce the same canonical SMILES, but as the example shows, it does not: the heterocyclic ring is no longer written as aromatic and its C=C double bond is lost.

If I repeat the same procedure with the neutral form, the round trip works:
$ echo "C1=CC=CC=C1N1NC(=O)C=CC1(=O)" | babel -ismi -ocan
O=c1ccc(=O)n([nH]1)c1ccccc1

$ echo "O=c1ccc(=O)n([nH]1)c1ccccc1" | babel -ismi -ocan
O=c1ccc(=O)n([nH]1)c1ccccc1

Gert Thijs
gert.thijs@silicos.com

Discussion

  • Geoff Hutchison

    Geoff Hutchison - 2010-10-26

    I'm a bit disappointed in the timing of the report, since we undertook a huge effort for canonicalization and aromaticity bugs about 2-3 weeks ago. There were countless messages on the mailing list about aromaticity problems and canonical failures.

    It would have been nice to see this report before we tagged 2.3.0.

    I suspect the problem is in kekulize.cpp -- it's hard to assign an electron count for that ring. I'll probably get to it in a few days -- or if you want pointers, I can show you parts of the code which are likely culprits.

     
  • gert thijs

    gert thijs - 2010-10-27

    I am sorry for the timing of this report. I know you have all put a lot of work into the code, and the canonicalization code in particular now works very smoothly: I could process the 9.6 million compounds in our Simosa database without any problem using 2.3.0rc2.
    But those tests were on neutral compounds, and it was only yesterday, when I started working on the output of ChemAxon's pKa calculator, that I discovered this discrepancy.
    The given ring system is also one of those cases where different programs behave differently. For instance, the ChemAxon tools offer multiple aromaticity models; in one model the ring is considered aromatic, while in another it is not. So I guess this is one of those rather particular cases where several rules coincide when kekulize.cpp is applied to this ring.