#1378 Error when parsing particular SMILES

Group: cdk-1.6.x
Status: open
Owner: John May
Labels: None
Priority: 1
Updated: 2016-08-10
Created: 2016-02-19
Private: No

CDK 1.5.12 gives errors when parsing these three SMILES (all are ChEMBL compounds).

  • org.openscience.cdk.exception.InvalidSmilesException: could not parse O(CC)C(\C(\C(C(=O)OCC)=O)=N(/=C(C)N1C)\c2c1cccc2)=O, Multiple bonds specified:

The reason seems to be the /= bond specification.

  • org.openscience.cdk.exception.InvalidSmilesException: could not parse OC(=C(\C\1=N\CCCN(CC)c2cccc(C)c2)O)\C\1=N/C(CCN3C(=O)OCC)CC3, Ring closure bonds did not match. Ring was opened with '\' and closed with '/'. Note - directional bonds ('/','\') are relative.

  • org.openscience.cdk.exception.InvalidSmilesException: could not parse 'Fc1ccccc1N2CCN(CC2)CC\N=C/3\C(=C(/C/3=N\C(CCN4C(=O)OCC)CC4)O)O', Ring closure bonds did not match. Ring was opened with '/' and closed with '\'. Note - directional bonds ('/','\') are relative.

OpenBabel and Chemical Identifier Resolver successfully parse all three.

Any ideas?

Discussion

  • John May

    John May - 2016-02-19

    Grrr ChEMBL and their crappy SMILES. These are bugs in whichever program they use to write the SMILES (Pipeline Pilot I think). I've told them about it numerous times but doubt they'll action it.

    I'll address the second one first. For directional bonds (slashes) the meaning is relative. These are all the same structure:

    C/1CCCCCC\C=C\1
    C/1CCCCCC\C=C1
    C1CCCCCC\C=C\1
    C\1CCCCCC/C=C/1
    C\1CCCCCC/C=C1
    C1CCCCCC/C=C/1

    Notice the slashes on ring closures reverse. So when they're the same it doesn't make sense:

    C/1.C/C=C/1 (RDKit=trans, OE/CA=cis, OB=none)
    C\1.C\C=C\1 (RDKit=trans, OE/CA=cis, OB=none)
    C/1.C\C=C/1 (RDKit=cis, OE/CA=trans, OB=none)
    C\1.C/C=C\1 (RDKit=cis, OE/CA=trans, OB=none)

    OpenBabel's behaviour is probably acceptable, but it does lose information on read, and normally if there's something wrong with the syntax it's likely to be duff input.
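
The relative-direction rule for ring closures can be checked on the raw string before it reaches any parser. The following is a minimal Python sketch, not the CDK or Beam implementation. The function name `ring_closure_direction_errors` is hypothetical, and the scanner is deliberately simplified: it skips bracket atoms, understands single-digit and `%nn` ring closures, and assumes otherwise well-formed input. It reports the two symbols exactly as they are written in the string.

```python
# Opening a ring with one slash requires the opposite slash (or no slash)
# at the closing digit, because directional bonds are relative.
FLIP = {"/": "\\", "\\": "/"}

def ring_closure_direction_errors(smiles):
    """Return (ring_number, open_symbol, close_symbol) for each mismatch."""
    open_rings = {}   # ring number -> directional symbol at opening ('' if none)
    errors = []
    pending = ""      # directional symbol written directly before a ring digit
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            # Skip the whole bracket atom; digits inside are not ring closures.
            i = smiles.index("]", i) + 1
            pending = ""
        elif ch in "/\\":
            pending = ch
            i += 1
        elif ch.isdigit() or ch == "%":
            if ch == "%":
                rnum = int(smiles[i + 1:i + 3])
                i += 3
            else:
                rnum = int(ch)
                i += 1
            if rnum in open_rings:
                opened = open_rings.pop(rnum)
                # Only an inconsistency when both ends carry a symbol.
                if opened and pending and pending != FLIP[opened]:
                    errors.append((rnum, opened, pending))
            else:
                open_rings[rnum] = pending
            pending = ""
        else:
            # Atom, branch, dot or non-directional bond: forget any slash.
            pending = ""
            i += 1
    return errors


print(ring_closure_direction_errors(r"C/1CCCCCC\C=C\1"))  # consistent
print(ring_closure_direction_errors(r"C/1.C/C=C/1"))      # same slash twice
```

A string that marks only one end of the ring closure (for example `C/1CCCCCC\C=C1`) is accepted by this check, which matches the equivalences listed above.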

    In regards to the first point, \= is not valid in SMILES. Roger's pointed out to me that it's a possible extension, but we then found /#, which completely undermines any argument that its use was intended:

    CC(=O)O[C@@H]1[C@@H](OC(=O)C)[C@](C)(O)[C@H](OC(=O)C)C2=C1C3=C(C(=O)c4c(O)cccc4C3=O)/C/2=N/#N CHEMBL1982727
    

    Curiously these structures appear to be the result of InChI mangling (getting a connection table from the InChI is nearly always wrong!).

    We actually used to accept '\=' and '/#' because the parser just used to ignore multiple bond specifications, but they are invalid and should not be accepted.
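
For anyone who wants to screen a file for this class of error before parsing, here is a minimal Python sketch. It is a plain text scan, not how the CDK parser detects the problem, and the function name `find_multiple_bond_specs` is hypothetical. It assumes that any run of two or more bond symbols outside a bracket atom is a multiple bond specification. Bracket atoms are masked first so that charges such as `[Br-]` are not mistaken for bonds.

```python
import re

# Bracket atoms may contain '-', ':' and digits that are not bonds.
BRACKET_ATOM = re.compile(r"\[[^\]]*\]")
# Two or more bond symbols written back to back, e.g. '\=' or '/#'.
BOND_RUN = re.compile(r"[-=#$:/\\]{2,}")

def find_multiple_bond_specs(smiles):
    """Return (offset, symbols) for each run of consecutive bond symbols."""
    # Mask bracket atoms with same-length filler so offsets stay valid.
    masked = BRACKET_ATOM.sub(lambda m: "*" * len(m.group()), smiles)
    return [(m.start(), m.group()) for m in BOND_RUN.finditer(masked)]


# One of the ChEMBL strings quoted in this thread (CHEMBL2007982).
bad = r"CC(=O)OCCN(C1=NCCN1)\N(\=C/c2ccc(cc2)C#N)\=C\c3ccc(cc3)C#N"
for offset, symbols in find_multiple_bond_specs(bad):
    print(offset, symbols)
```

The offsets are zero-based positions in the original string, so they can be used to print a caret under the offending symbol.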

    J

     

    Last edit: John May 2016-02-19
  • Nina Jeliazkova

    Nina Jeliazkova - 2016-02-19

    John, is there any workaround you could suggest, or should we just forget about these 3 SMILES?

     
  • John May

    John May - 2016-02-19

    24 in ChEMBL right?

    Another case is a tetravalent aromatic carbon anion.

    Cc1ccc(OC[CH+]n2c3ccccc3n[c-]2CCNC(=O)C4CCCCC4)cc1 CHEMBL3210799
    

    Again, another bug where that carbon has a bad valence and should not be aromatic.

    You could pester ChEMBL to fix them. But I think in general these are bad molecules and would chuck them out.

    John

     
  • Egon Willighagen

    What about writing a short note about this (e.g. for a preprint server)? Listing the key problems with the faulty SMILES? That way, people know about the issues. Do you see an automatic way of fixing these SMILES?

     
  • John May

    John May - 2016-02-19
    [beam:canon] 
    error, Ring closure bonds did not match. Ring was opened with '\' and closed with '/'. Note - directional bonds ('/','\') are relative.
    [Br-].[Br-].C(C\C\1=C\Nc2ccccc2\N=C\C(=C/Nc3ccccc3\N=C\1)\CCC[n+]4ccccc4)C[n+]5ccccc5 CHEMBL266971
                                                          ^
    line:[Br-].[Br-].C(C\C\1=C\Nc2ccccc2\N=C\C(=C/Nc3ccccc3\N=C\1)\CCC[n+]4ccccc4)C[n+]5ccccc5 CHEMBL266971
    [beam:canon] 
    error, Ring closure bonds did not match. Ring was opened with '\' and closed with '/'. Note - directional bonds ('/','\') are relative.
    [Br-].[Br-].C(C\C\1=C\Nc2ccccc2\N=C\C(=C/Nc3ccccc3\N=C\1)\CCC[n+]4ccc(cc4)c5ccncc5)C[n+]6ccc(cc6)c7ccncc7 CHEMBL234962
                                                          ^
    line:[Br-].[Br-].C(C\C\1=C\Nc2ccccc2\N=C\C(=C/Nc3ccccc3\N=C\1)\CCC[n+]4ccc(cc4)c5ccncc5)C[n+]6ccc(cc6)c7ccncc7 CHEMBL234962
    [beam:canon] 
    error, Ring closure bonds did not match. Ring was opened with '/' and closed with '\'. Note - directional bonds ('/','\') are relative.
    Cl.Brc1ccc2\C(=C/3\Nc4ccccc4/C/3=N\OCCN5CCOCC5)\C(=O)Nc2c1 CHEMBL492646
                                  ^
    line:Cl.Brc1ccc2\C(=C/3\Nc4ccccc4/C/3=N\OCCN5CCOCC5)\C(=O)Nc2c1 CHEMBL492646
    [beam:canon] 
    error, Ring closure bonds did not match. Ring was opened with '\' and closed with '/'. Note - directional bonds ('/','\') are relative.
    [Br-].[Br-].C[n+]1ccc(CCC(=O)OCCC\C\2=C\Nc3ccccc3\N=C\C(=C/Nc4ccccc4\N=C\2)\CCCOC(=O)CCc5cc[n+](C)cc5)cc1 CHEMBL410188
                                                                            ^
    line:[Br-].[Br-].C[n+]1ccc(CCC(=O)OCCC\C\2=C\Nc3ccccc3\N=C\C(=C/Nc4ccccc4\N=C\2)\CCCOC(=O)CCc5cc[n+](C)cc5)cc1 CHEMBL410188
    [beam:canon] 
    error, Ring closure bonds did not match. Ring was opened with '/' and closed with '\'. Note - directional bonds ('/','\') are relative.
    Cl.Brc1ccc2\C(=C/3\Nc4ccccc4/C/3=N\OCCN5CCCC5)\C(=O)Nc2c1 CHEMBL493883
                                  ^
    line:Cl.Brc1ccc2\C(=C/3\Nc4ccccc4/C/3=N\OCCN5CCCC5)\C(=O)Nc2c1 CHEMBL493883
    [beam:canon] 
    error, Ring closure bonds did not match. Ring was opened with '/' and closed with '\'. Note - directional bonds ('/','\') are relative.
    Cl.Cl.Cc1cccc2\C(=C/3\Nc4ccccc4/C/3=N\OCCN5CCNCC5)\C(=O)Nc12 CHEMBL1315255
                                     ^
    line:Cl.Cl.Cc1cccc2\C(=C/3\Nc4ccccc4/C/3=N\OCCN5CCNCC5)\C(=O)Nc12 CHEMBL1315255
    [beam:canon] 
    error, Multiple bonds specified:
    CC(=O)OCCN(C1=NCCN1)\N(\=C/c2ccc(cc2)C#N)\=C\c3ccc(cc3)C#N CHEMBL2007982
                            ^
    line:CC(=O)OCCN(C1=NCCN1)\N(\=C/c2ccc(cc2)C#N)\=C\c3ccc(cc3)C#N CHEMBL2007982
    [beam:canon] 
    error, Multiple bonds specified:
    I.OCCN(C1=NCCN1)\N(\=C/c2ccc(cc2)C#N)\=C\c3ccc(cc3)C#N CHEMBL2002399
                        ^
    line:I.OCCN(C1=NCCN1)\N(\=C/c2ccc(cc2)C#N)\=C\c3ccc(cc3)C#N CHEMBL2002399
    [beam:canon] 
    error, Multiple bonds specified:
    CCOC(=O)C(=O)\C(=N/1\=C(C)N(C)c2ccccc12)\C(=O)OCC CHEMBL1973123
                         ^
    line:CCOC(=O)C(=O)\C(=N/1\=C(C)N(C)c2ccccc12)\C(=O)OCC CHEMBL1973123
    [beam:canon] 
    error, Multiple bonds specified:
    CCOC(=O)\C(=N\1/=C(C)Sc2ccccc12)\C(=O)c3ccccc3 CHEMBL1976159
                    ^
    line:CCOC(=O)\C(=N\1/=C(C)Sc2ccccc12)\C(=O)c3ccccc3 CHEMBL1976159
    [beam:canon] 
    error, Multiple bonds specified:
    I.OCCN(C1=NCCN1)\N(\=C/c2ccc(Cl)cc2)\=C\c3ccc(Cl)cc3 CHEMBL1978864
                        ^
    line:I.OCCN(C1=NCCN1)\N(\=C/c2ccc(Cl)cc2)\=C\c3ccc(Cl)cc3 CHEMBL1978864
    [beam:canon] 
    error, Multiple bonds specified:
    CCOC(=O)\C(=N\1/=C(C)N(C)c2ccccc12)\C(=O)c3ccc(cc3)[N+](=O)[O-] CHEMBL2003443
                    ^
    line:CCOC(=O)\C(=N\1/=C(C)N(C)c2ccccc12)\C(=O)c3ccc(cc3)[N+](=O)[O-] CHEMBL2003443
    [beam:canon] 
    error, Multiple bonds specified:
    OCCN(C1=NCCN1)\N(\=C/c2ccccn2)\=C\c3ccccn3 CHEMBL2009803
                      ^
    line:OCCN(C1=NCCN1)\N(\=C/c2ccccn2)\=C\c3ccccn3 CHEMBL2009803
    [beam:canon] 
    error, Multiple bonds specified:
    CC(=O)O[C@@H]1[C@@H](OC(=O)C)[C@](C)(O)[C@H](OC(=O)C)C2=C1C3=C(C(=O)c4c(O)cccc4C3=O)/C/2=N/#N CHEMBL1982727
                                                                                               ^
    line:CC(=O)O[C@@H]1[C@@H](OC(=O)C)[C@](C)(O)[C@H](OC(=O)C)C2=C1C3=C(C(=O)c4c(O)cccc4C3=O)/C/2=N/#N CHEMBL1982727
    [beam:canon] 
    error, Multiple bonds specified:
    I.OCCN(C1=NCCN1)\N(\=C/c2ccccn2)\=C\c3ccccn3 CHEMBL2000062
                        ^
    line:I.OCCN(C1=NCCN1)\N(\=C/c2ccccn2)\=C\c3ccccn3 CHEMBL2000062
    [beam:canon] 
    error, Multiple bonds specified:
    I.CC(=O)OCCN(C1=NCCN1)\N(\=C/c2ccc(cc2)C#N)\=C\c3ccc(cc3)C#N CHEMBL1982001
                              ^
    line:I.CC(=O)OCCN(C1=NCCN1)\N(\=C/c2ccc(cc2)C#N)\=C\c3ccc(cc3)C#N CHEMBL1982001
    [beam:canon] 
    error, Multiple bonds specified:
    OCCN(C1=NCCN1)\N(\=C/c2ccc(Cl)cc2)\=C\c3ccc(Cl)cc3 CHEMBL2008548
                      ^
    line:OCCN(C1=NCCN1)\N(\=C/c2ccc(Cl)cc2)\=C\c3ccc(Cl)cc3 CHEMBL2008548
    [beam:canon] 
    error, Multiple bonds specified:
    OCCN(C1=NCCN1)\N(\=C/c2ccccc2O)\=C\c3ccccc3O CHEMBL2008572
                      ^
    line:OCCN(C1=NCCN1)\N(\=C/c2ccccc2O)\=C\c3ccccc3O CHEMBL2008572
    [beam:canon] 
    error, Multiple bonds specified:
    CCOC(=O)\C(=N\1/=C(C)N(C)c2ccccc12)\C(=O)c3ccccc3 CHEMBL1994868
                    ^
    line:CCOC(=O)\C(=N\1/=C(C)N(C)c2ccccc12)\C(=O)c3ccccc3 CHEMBL1994868
    [beam:canon] 
    error, Multiple bonds specified:
    OCCN(C1=NCCN1)\N(\=C/c2ccc(cc2)C#N)\=C\c3ccc(cc3)C#N CHEMBL2009374
                      ^
    line:OCCN(C1=NCCN1)\N(\=C/c2ccc(cc2)C#N)\=C\c3ccc(cc3)C#N CHEMBL2009374
    [beam:canon] 
    error, Multiple bonds specified:
    I.OCCN(C1=NCCN1)\N(\=C/c2ccccc2O)\=C\c3ccccc3O CHEMBL1978411
                        ^
    line:I.OCCN(C1=NCCN1)\N(\=C/c2ccccc2O)\=C\c3ccccc3O CHEMBL1978411
    [beam:canon] 
    error, Multiple bonds specified:
    CCOC(=O)\C(=N/1\=C(C)N(C)c2ccccc12)\C=O CHEMBL1979044
                    ^
    line:CCOC(=O)\C(=N/1\=C(C)N(C)c2ccccc12)\C=O CHEMBL1979044
    [beam:canon] 
    error, Ring closure bonds did not match. Ring was opened with '/' and closed with '\'. Note - directional bonds ('/','\') are relative.
    Cl.Cl.Brc1ccc2\C(=C/3\Nc4ccccc4/C/3=N\OCCN5CCNCC5)\C(=O)Nc2c1 CHEMBL3216659
                                     ^
    line:Cl.Cl.Brc1ccc2\C(=C/3\Nc4ccccc4/C/3=N\OCCN5CCNCC5)\C(=O)Nc2c1 CHEMBL3216659
    [beam:canon] 
    error, a valid kekulé structure could not be assigned
    line:Cc1ccc(OC[CH+]n2c3ccccc3n[c-]2CCNC(=O)C4CCCCC4)cc1 CHEMBL3210799
    [beam:canon] 
    error, Ring closure bonds did not match. Ring was opened with '/' and closed with '\'. Note - directional bonds ('/','\') are relative.
    Cl.COc1ccc(\C=C/2\N3CCC(CC3)/C/2=N\O)cc1 CHEMBL3211148
                                  ^
    line:Cl.COc1ccc(\C=C/2\N3CCC(CC3)/C/2=N\O)cc1 CHEMBL3211148
    
     
  • Louisa Bellis

    Louisa Bellis - 2016-02-19

    Hi John, you're right, we do use Pipeline Pilot for the canonical SMILES. These compounds were submitted to us via PubChem and they somehow passed through our normal curation checks even with the bad valence.

    "You could pester ChEMBL to fix them. But I think in general these are bad molecules and would chuck them out"

    I will personally fix your example, CHEMBL3210799. We can't always control how the SMILES are handled through PP, but I can certainly fix the compounds which give bad SMILES due to a bad valence. Please can someone send me the ChEMBL IDs for the 3 which caused the issue above? I tried to use the SMILES provided to find them in our DB, but can't find a match. You can send them to chembl-help@ebi.ac.uk

    Just as an aside - I did fix the issue you sent to us last March (CC(=O)O[C@@H]1[C@@H](OC(=O)C)[C@](C)(O)[C@H](OC(=O)C)C2=C1C3=C(C(=O)c4c(O)cccc4C3=O)/C/2=N/#N
    CHEMBL1982727 and the others that were similar), but because we haven't done a release since last January, you wouldn't have had access to the updated compounds. We are due to release ChEMBL_21 by the end of this month, so hopefully that's one issue fixed for you.

    Please feel free to pass any more issues back to us at chembl-help@ebi.ac.uk

    kind regards

    Louisa (Chemical Curator for ChEMBL)

     
  • Nina Jeliazkova

    Nina Jeliazkova - 2016-02-19

    John, All,

    Just to clarify: while the three I reported are ChEMBL compounds, the SMILES themselves do not come directly from ChEMBL. My report says these SMILES can't be parsed by CDK 1.5.12, while they are parsed by e.g. OpenBabel. I see this as a more generic compatibility problem, which is only partially resolved by fixing SMILES in ChEMBL. We can encounter such SMILES from other sources as well.

    Thanks,
    Nina

     

    Last edit: Nina Jeliazkova 2016-02-19
  • John May

    John May - 2016-02-19

    Thanks Louisa. I couldn't remember when I sent the help message but makes sense if they're not released yet.

    Nina, garbage in, garbage out. There is a strict flag in the parser we could use, but then how do we tell the user that there is something very wrong with this input? There are so many bad connection tables out there that I've reached the opinion not to live and let live; just look at some PubChem entries! My latest favourite is CID 58150378 from attached image "DSOC". As for Open Babel accepting it, have you tried parsing eMolecules? There are literally thousands of broken molecules compared to this handful in ChEMBL.

    17282564

    CN(p1oc2ccc3c(c2c2c(o1)ccc1c2CCCC1)CCCC3)C 17282564 17282563
    
    obabel -:'CN(p1oc2ccc3c(c2c2c(o1)ccc1c2CCCC1)CCCC3)C' -osmi                     
    CN(P1OC2CCC3C(C2C2C(O1)CCC1C2CCCC1)CCCC3)C  
    1 molecule converted
    

    I will also add that we (NextMove) chuck away the ChEMBL generated SMILES and regenerate them from the SDfile. This also fixes problems like MDL's triangle rule (inverted stereo if the center is drawn a certain way). Of course the molfile has its own problems.

    John

     
  • Nina Jeliazkova

    Nina Jeliazkova - 2016-02-19

    John, personally I do prefer to work with SD files and regenerate SMILES from the connection table, but this is not everybody's preference :) Thanks for the hint about the strict flag.

    While on this, I would also mention most of the aromatic SMILES from the Open Melting Point dataset 10.6084/m9.figshare.1031637 are considered invalid by CDK 1.5.

     

    Last edit: Nina Jeliazkova 2016-02-19
  • John May

    John May - 2016-02-19

    "Thanks for the hint about the strict flag."

    I don't think it's exposed, so it would need changes. By default it's relaxed; I considered these errors to be too bad to ignore.

    "While on this, I would also mention most of the aromatic SMILES from the Open Melting Point dataset 10.6084/m9.figshare.1031637 are considered invalid by CDK 1.5."

    If it's the aromatic nitrogen, they are definitely invalid. Alas, Daylight depict has ceased to exist, but poor implementations do not change the notion that a SMILES string has an exact formula.

    You can fix these by turning off kekulize, making the change you see fit (add H, set charge), regenerating and then reading it in again. Daniel's almost talked me into special-casing some potentially unambiguous ones (e.g. n1cccc1, n1ccc2c1cccc2) but I'm still not convinced.
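
Why a ring written as n1cccc1 cannot be kekulized, while [nH]1cccc1 can, may be easier to see with a small matching check. This is a minimal Python sketch under stated assumptions, not the algorithm CDK uses. The function name `can_kekulize` and the hand-built input (a list of flags plus a bond list, rather than a parsed SMILES) are hypothetical. It assumes every aromatic atom flagged as needing a double bond must be paired with exactly one flagged neighbour, and that a nitrogen written with an explicit hydrogen does not need one. The search is brute-force backtracking, suitable only for small ring systems.

```python
def can_kekulize(needs_double, bonds):
    """True if every flagged atom can be paired with one flagged neighbour.

    needs_double: list of bools, one per aromatic atom.
    bonds: list of (i, j) index pairs for the aromatic bonds.
    """
    # Keep only atoms that still need a double bond, and bonds between them.
    adj = {i: set() for i, flag in enumerate(needs_double) if flag}
    for i, j in bonds:
        if i in adj and j in adj:
            adj[i].add(j)
            adj[j].add(i)

    def match(unmatched):
        # Perfect matching by backtracking: always pair the lowest atom.
        if not unmatched:
            return True
        a = min(unmatched)
        for b in adj[a]:
            if b in unmatched and match(unmatched - {a, b}):
                return True
        return False

    return match(frozenset(adj))


five_ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]

# n1cccc1: atom 0 is an aromatic nitrogen with no hydrogen, so all five
# atoms need a double bond and one is always left over.
print(can_kekulize([True, True, True, True, True], five_ring))

# [nH]1cccc1: the nitrogen carries the hydrogen and needs no double bond,
# leaving four carbons that pair up as two double bonds.
print(can_kekulize([False, True, True, True, True], five_ring))
```

Adding the hydrogen (or a charge) changes which atoms must be matched, which is the edit described above before the string is read in again.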

    I should add those are probably ChemSketch's fault - loads in Wikipedia as it's the recommended drawing tool to use, sigh.

    John

     
  • Vedrin Jeliazkov

    I've run a more exhaustive test on:

    ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/latest/chembl_20_chemreps.txt.gz

    The results are as follows: 25 SMILES raise exceptions in CDK as previously indicated by John. All of them are also rejected by PubChem (tested through their REST API). Among those 25 SMILES there are 20 which OpenBabel parses. Among these 20 there are 5 which are also parsed by CIR (apparently running with different version/settings of Cactvs from those used by PubChem). There are 5 SMILES which are rejected by all (!) the tools we've tested (CDK, PubChem, OpenBabel and CIR).

    I'm attaching a file with some more details on the 25 compounds (including ChEMBL IDs) and the testing outcomes (if some of the tools succeeded, this is explicitly noted).

     
  • Nina Jeliazkova

    Nina Jeliazkova - 2016-02-27

    Further report on the SMILES parser failing on ChEMBL structures; this time the SMILES are generated with CDK 1.5.12,
    SmilesGenerator.absolute()

    The test starts at Line 271
    https://sourceforge.net/p/ambit/code/HEAD/tree/trunk/ambit2-all/ambit2-core/src/test/java/ambit2/core/test/SmilesTest.java

    The structures are here (with ChEMBL ids)
    https://sourceforge.net/p/ambit/code/HEAD/tree/trunk/ambit2-all/ambit2-core/src/test/resources/ambit2/core/chembl/roundtrip7.sdf

    They all fail with "Multiple directional bonds on atom X".

    Do I understand correctly that there is currently no option to generate unique SMILES with stereo information besides absolute()?

     

    Last edit: Nina Jeliazkova 2016-02-27
    • John May

      John May - 2016-03-01

      Correct, absolute is the only option at the moment and is backed by the
      InChI (Noel's Universal SMILES paper) so you sometimes see some odd
      results. However this particular case is a limitation of SMILES. I'll
      double check the structures but see the OpenSMILES spec:

      The '/' and '\' marks for cis/trans bonds seem simple on the surface but
      are problematic for complex systems. For example, in a long series of
      conjugated double bonds, changing the configuration of one bond can require
      rewriting dozens of bond symbols.
      More importantly, there is a theoretical flaw with the use of '/' and '\'.
      In a cyclo-ene (name??) ring with an even number of double bonds, it is
      not possible to write a valid SMILES. (Recall that '/' and '\' reverse
      sense if moved from the left to the right of the atom, thus C/1=C/CCCCCCC1 represents
      a cis configuration even though '/' appears twice.)

      Just a quick point: there are a few old things in your tests:

      • MDLReader should NEVER be used! It's MDLV2000Reader you want. I'm now
        going to deprecate the old one since I've seen this mistake so much, and
        since it is also made by a CDK expert (yourself) it's just stupid to have
        the old bad one named better than the new one (note I didn't add the new
        one).
      • Avoid UniversalIsoTester - use 'Pattern.findIdentical' which will be
        faster and also check stereo.
      • In many places, don't need atom typing, don't need hydrogen 'adding'.
        CDKHueckelAromaticityDetector is deprecated, use Aromaticity.

      Regards,
      John W May
      john.wilkinsonmay@gmail.com

      Status: open
      Group: cdk-1.6.x
      Created: Fri Feb 19, 2016 07:47 AM UTC by Nina Jeliazkova
      Last Updated: Sat Feb 20, 2016 08:08 PM UTC
      Owner: John May

  • Nina Jeliazkova

    Nina Jeliazkova - 2016-03-01

    John,

    Thanks. I understand the limitation of absolute(), just was not aware of its behaviour initially.

    Re your points - this file was written years before CDK 1.5. The only new test is the roundtrip, which is the subject of this issue. The rest of the file is irrelevant for this issue. My bad, I did not separate old and new tests.

    • Yes, please do deprecate MDLReader. Or move it to a legacy package. There is a lot of legacy code out there (also in Ambit) which will continue to use it otherwise.

    • We usually use the AMBIT isomorphism tester, not the CDK one. I didn't know about 'Pattern.findIdentical', which means we are not yet aware of everything new in CDK 1.5.

    • I do need atom typing in order to be able to reproduce my use case, which has several following steps (not in the test though).

    • You might notice the file uses ambit2.core.helper.CDKHueckelAromaticityDetector, which is a wrapper around Aromaticity.

     

    Last edit: Nina Jeliazkova 2016-03-01
  • John May

    John May - 2016-03-03

    Much better in ChEMBL 21.

    Directional Bonds:

    Cl.Cl.Cc1cccc2\C(=C/3\Nc4ccccc4/C/3=N\OCCN5CCNCC5)\C(=O)Nc12 CHEMBL1315255
    [Br-].[Br-].C(C\C\1=C\Nc2ccccc2\N=C\C(=C/Nc3ccccc3\N=C\1)\CCC[n+]4ccc(cc4)c5ccncc5)C[n+]6ccc(cc6)c7ccncc7 CHEMBL234962
    [Br-].[Br-].C(C\C\1=C\Nc2ccccc2\N=C\C(=C/Nc3ccccc3\N=C\1)\CCC[n+]4ccccc4)C[n+]5ccccc5 CHEMBL266971
    Cl.COc1ccc(\C=C/2\N3CCC(CC3)/C/2=N\O)cc1 CHEMBL3211148
    Cl.Cl.Brc1ccc2\C(=C/3\Nc4ccccc4/C/3=N\OCCN5CCNCC5)\C(=O)Nc2c1 CHEMBL3216659
    [Br-].[Br-].C[n+]1ccc(CCC(=O)OCCC\C\2=C\Nc3ccccc3\N=C\C(=C/Nc4ccccc4\N=C\2)\CCCOC(=O)CCc5cc[n+](C)cc5)cc1 CHEMBL410188
    Cl.Brc1ccc2\C(=C/3\Nc4ccccc4/C/3=N\OCCN5CCOCC5)\C(=O)Nc2c1 CHEMBL492646
    Cl.Brc1ccc2\C(=C/3\Nc4ccccc4/C/3=N\OCCN5CCCC5)\C(=O)Nc2c1 CHEMBL493883
    

    Failed to kekulize:

    OC(=O)C(=O)Nc1cccc(c1)c2nnnn2 CHEMBL3188982
    Cc1ccc(OC[CH+]n2c3ccccc3n[c-]2CCNC(=O)C4CCCCC4)cc1 CHEMBL3210799
    

    CHEMBL3188982 is new and quite fun! It's a perfect example of why you shouldn't add hydrogens. Notice that ChemAxon and Open Babel will add a hydrogen to the tetrazole when in fact it's a radical: https://www.ebi.ac.uk/chembl/compound/inspect/CHEMBL3188982

     
  • Egon Willighagen

    • status: open --> closed
     
  • Egon Willighagen

    I'll close this bug. If new bad ChEMBL SMILES pop up, we can file them as new bugs.

     
  • John May

    John May - 2016-08-10
    • status: closed --> open
     
  • John May

    John May - 2016-08-10

    Reopening - was getting round to this, this week