#1330 Smiles Parser loses aromaticity

Milestone: cdk-1.0.x
Status: closed
Owner: nobody
Labels: None
Priority: 1
Updated: 2014-08-22
Created: 2014-02-14
Creator: Ben Wolfe
Private: No

I think it's the smiles parser that is losing the aromaticity, but I don't know.

example:

// smile holds the SMILES string of the test molecule given below
SmilesParser smilesParser = new SmilesParser(
    SilentChemObjectBuilder.getInstance()
);
IMolecule molecule = smilesParser.parseSmiles(smile);
MDLV2000Writer m2w = new MDLV2000Writer(System.out);
m2w.writeMolecule(molecule);

I used my go-to test molecule:

C(NCCCCC(NC(OC(C)(C)C)=O)C([O-])=O)(=O)c1cc2[n]([Ru+2]34([n]5c2cc(C)cc5)([n]2c(c5[n]4cccc5)cccc2)[n]2c(c4[n]3cccc4)cccc2)cc1

There are no coordinates in the output molfile, so I can rule out SDG (structure diagram generation) removing the aromaticity, but I would assume the bond orders should still be OK.

Discussion

  • John May

    John May - 2014-02-14
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -10,7 +10,9 @@
         m2w.writeMolecule(molecule);
     </code>
     I used my go to test molecule:
    -"C(NCCCCC(NC(OC(C)(C)C)=O)C([O-])=O)(=O)c1cc2[n]([Ru+2]34([n]5c2cc(C)cc5)([n]2c(c5[n]4cccc5)cccc2)[n]2c(c4[n]3cccc4)cccc2)cc1"
    +
    +~~~~~
    +C(NCCCCC(NC(OC(C)(C)C)=O)C([O-])=O)(=O)c1cc2[n]([Ru+2]34([n]5c2cc(C)cc5)([n]2c(c5[n]4cccc5)cccc2)[n]2c(c4[n]3cccc4)cccc2)cc1
    +~~~~~
    
     There are no coordinates in the ouput molfile so I could rule out sdg removing the aromaticity but the bond orders should still be ok I would assume.
    -
    
     
  • John May

    John May - 2014-02-14

    Unfortunately, organometallics are always problematic (ChEMBL blog).

    Aromatic bonds in molfiles are for queries and should not be used for representation. You can turn that option on in the writer, but I really, really do not recommend it. You should only ever store aromatic bonds if you use MDL's definition of aromaticity. Their model considers only double bonds and, unlike Daylight's, not lone pairs. The molfile cannot correctly represent or store that information.

    Your structure is also not valid, and you will not get the expected output even if you manage to get that to work. The bonds to the Ru are represented as covalent bonds in your input, which means each nitrogen is 4-valent. An aromaticity algorithm should not delocalise those, as it leads to errors when reading. Check out the Daylight (official) interpretation here. Each nitrogen has only single bonds, and the 6-membered rings are no longer benzene-like (nor aromatic).
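
The valence argument above can be made concrete with a little arithmetic. The following is only a minimal sketch under a simplified valence model: it assumes 3 as the usual valence of a neutral nitrogen, and the class and method names (`NitrogenValenceSketch`, `bondOrderSum`, `exceedsNitrogenValence`) are hypothetical and not part of CDK or any other toolkit.

```java
final class NitrogenValenceSketch {

    // Assumption of this sketch: a neutral nitrogen normally has valence 3.
    static final int NEUTRAL_NITROGEN_VALENCE = 3;

    /** Sums the bond orders around one atom. */
    static int bondOrderSum(int... bondOrders) {
        int sum = 0;
        for (int order : bondOrders) {
            sum += order;
        }
        return sum;
    }

    /** True if the bond orders add up to more than the assumed valence of nitrogen. */
    static boolean exceedsNitrogenValence(int... bondOrders) {
        return bondOrderSum(bondOrders) > NEUTRAL_NITROGEN_VALENCE;
    }

    public static void main(String[] args) {
        // Pyridine-like N in a Kekule form (ring single + ring double)
        // plus a covalent N-Ru single bond.
        System.out.println(exceedsNitrogenValence(1, 2, 1));

        // Same N with only single bonds: two ring bonds plus the N-Ru bond.
        System.out.println(exceedsNitrogenValence(1, 1, 1));
    }
}
```

With the covalent N-Ru bond counted, keeping a ring double bond on the nitrogen gives a bond order sum of 4, while the all-single-bond assignment stays at 3. Under this simplified model, that is why no double bond is placed on those nitrogens.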

    You simply cannot represent that compound correctly in SMILES unless you do this:

    CC1=CC2=[N](C=C1)[Ru++]13([N]4=CC=CC=C4C4=[N]1C=CC=C4)([N]1=CC=CC=C1C1=[N]3C=CC=C1)[N]1=CC=C(C=C21)C(=O)NCCCCC(NC(=O)OC(C)(C)C)C([O-])=O
    

    or, if you really want the aromatic symbols (you don't), then you'll need to disconnect the metal:

    [Ru++].c1ccc(nc1)-c1ccccn1.c1ccc(nc1)-c1ccccn1.Cc1ccnc(c1)-c1cc(ccn1)C(=O)NCCCCC(NC(=O)OC(C)(C)C)C([O-])=O
    

    Your SMILES is also missing single bonds between the benzene rings. I'm quite curious where you got the SMILES from.

    J

     
    Last edit: John May 2014-02-14
  • Ben Wolfe

    Ben Wolfe - 2014-02-14

    I guess what I meant was that the double bonds within the 6-membered rings are listed with bond order 1 in the bond block of the molfile. The only bonds remaining with order 2 are the carbonyls. I think I generated the SMILES with a very old version of Open Babel I had lying around, using a molfile as input. I am less concerned with aromaticity; I was just using the word to describe the double bonds I expected to see in the bond block.
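
The check described above (looking for bond order 2 in the bond block) can be sketched in plain Java without any toolkit. This is a minimal sketch that assumes a well-formed V2000 molfile: line 4 is the counts line (atom count in columns 1-3, bond count in columns 4-6), and each bond line carries the bond type in columns 7-9. The class name `MolfileBondCounter`, the method `countBondsOfType`, and the three-atom `SAMPLE` molfile are hypothetical illustrations, not output from CDK.

```java
final class MolfileBondCounter {

    // Hypothetical hand-written V2000 molfile for a C-C=O fragment.
    static final String SAMPLE = String.join("\n",
        "",
        "  sketch",
        "",
        "  3  2  0  0  0  0  0  0  0  0999 V2000",
        "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
        "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
        "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
        "  1  2  1  0",
        "  2  3  2  0",
        "M  END");

    /** Counts the bonds whose V2000 bond type equals bondType (1 single, 2 double, 4 aromatic). */
    static int countBondsOfType(String molfile, int bondType) {
        String[] lines = molfile.split("\\r?\\n", -1);

        // Counts line: columns 1-3 hold the atom count, columns 4-6 the bond count.
        String counts = lines[3];
        int atomCount = Integer.parseInt(counts.substring(0, 3).trim());
        int bondCount = Integer.parseInt(counts.substring(3, 6).trim());

        // The bond block starts right after the three header lines,
        // the counts line, and the atom block.
        int firstBondLine = 4 + atomCount;
        int matches = 0;
        for (int i = 0; i < bondCount; i++) {
            String bond = lines[firstBondLine + i];
            // Columns 7-9 of a bond line hold the bond type.
            int type = Integer.parseInt(bond.substring(6, 9).trim());
            if (type == bondType) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println("double bonds: " + countBondsOfType(SAMPLE, 2));
    }
}
```

Running the same count on the writer output for the reported SMILES would show whether any ring bonds were written with type 2, or only the carbonyls.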

     
    Last edit: Ben Wolfe 2014-02-14
  • Ben Wolfe

    Ben Wolfe - 2014-02-14

    The Daylight depict CGI at least shows some double bonds in the rings, although I can't tell quite how many or where because of all the overlaps. MDLV2000Writer only writes double bonds for the carbonyls. As for the two SMILES you posted:
    the first results in an InvalidSmilesException
    the second also shows double bonds only for the carbonyls and none in the rings

     
    Last edit: Ben Wolfe 2014-02-14
    • John May

      John May - 2014-02-14

      Ah right, you’re using 1.4… I just noticed the IMolecule. 1.5 will add the same bonds as Daylight for you.

      J

  • Ben Wolfe

    Ben Wolfe - 2014-02-14

    I just tried a more reasonable SMILES string:
    "C(NCCCCC(NC(OC(C)(C)C)=O)C([O-])=O)(=O)C9=CC7=N([N]6=C(C5=CC=CC=[N]45)C=CC=C6)[N]8=CC=C(C=C78)C)C=C9"

    and it seems to work as expected. Sorry for not including which version of CDK I was using; I am sure that would have been helpful info.

     
  • John May

    John May - 2014-03-13
    • status: open --> closed
     
