#1380 random behaviour of SMILES parser/generator

Milestone: cdk-1.6.x
Status: open
Owner: John May
Labels: None
Priority: 5
Updated: 2016-08-15
Created: 2016-04-07
Private: No

We've found a weird case: parsing the same SMILES in a loop and then generating a SMILES from the resulting atom container throws exceptions at random.

Test here (CDK 1.5.12).

https://github.com/ideaconsult/examples-cdk/blob/master/maven-single-module/src/test/java/net/idea/examples/cdk/maven_single_module/SmilesTest.java#L33

It reads a ChEMBL mol file, generates a SMILES, and then repeatedly parses this SMILES in a loop, trying to generate an isomeric SMILES each time.
Running the test generates output like that below (the exact numbers differ each time the test is run).

Starting SMILES
CC(C)NC(C=1C=CC=2C(C1)=N\C(=C/3\C=C/C(/C=C3)=C\4/C=C/C(=C/5\C=C/C(/C=C5)=C/6\N=C7C=CC(=CC7=N6)C(=N)NC(C)C)/O4)\N2)=N.Cl
Generated 4 different SMILES;   9 failures
53  CC(C)NC(C=1C=CC=2C(C1)=N\C(=C/3\C=C/C(/C=C3)=C\4/C=C/C(=C/5\C=C/C(/C=C5)=C/6\N=C7C=CC(=CC7=N6)C(=N)NC(C)C)/O4)\N2)=N.Cl
1   CC(C)NC(C=1C=CC=2C(C1)=N\C(=C/3\C=C/C(/C=C3)=C\4/C=C/C(=C\5\C=C/C(/C=C5)=C/6\N=C7C=CC(=CC7=N6)C(=N)NC(C)C)/O4)\N2)=N.Cl
22  CC(C)NC(C=1C=CC=2C(C1)=N/C(=C\3/C=C\C(\C=C3)=C/4\C=C\C(=C\5/C=C\C(\C=C5)=C\6/N=C7C=CC(=CC7=N6)C(=N)NC(C)C)\O4)/N2)=N.Cl
15  CC(C)NC(C=1C=CC=2C(C1)=N/C(=C\3/C=C\C(\C=C3)=C/4\C=C\C(=C/5/C=C\C(\C=C5)=C\6/N=C7C=CC(=CC7=N6)C(=N)NC(C)C)\O4)/N2)=N.Cl

The failures are all the same:

java.lang.IllegalArgumentException: cannot assign geometric configuration
    at uk.ac.ebi.beam.GraphBuilder.assignDirectionalLabels(GraphBuilder.java:280)
    at uk.ac.ebi.beam.GraphBuilder.build(GraphBuilder.java:435)
    at org.openscience.cdk.smiles.CDKToBeam.toBeamGraph(CDKToBeam.java:165)
    at org.openscience.cdk.smiles.SmilesGenerator.create(SmilesGenerator.java:369)
    at org.openscience.cdk.smiles.SmilesGenerator.create(SmilesGenerator.java:325)
    at net.idea.examples.cdk.maven_single_module.SmilesTest.weirdtest_smiles(SmilesTest.java:60)

This happens only when generating SMILES that account for stereochemistry (isomeric and absolute).
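
For readers who do not want to open the linked test, the loop described above can be sketched roughly as follows. This is not the code from the linked SmilesTest; the class `RoundTripTally` and the interface `RoundTrip` are hypothetical names made up for illustration, and the CDK calls shown in the comment are an assumption about how the 1.5.x API would be plugged in. The runnable part uses a stand-in so it needs nothing beyond the JDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical harness: repeats a parse-then-generate step and tallies the outcomes. */
final class RoundTripTally {

    /** One parse-then-generate step; declared to throw so checked exceptions can pass through. */
    interface RoundTrip {
        String apply(String smiles) throws Exception;
    }

    /** Distinct output strings mapped to how often each one was produced. */
    final Map<String, Integer> counts = new LinkedHashMap<>();
    /** Number of iterations in which the step threw an exception. */
    int failures = 0;

    static RoundTripTally run(RoundTrip step, String smiles, int iterations) {
        RoundTripTally tally = new RoundTripTally();
        for (int i = 0; i < iterations; i++) {
            try {
                tally.counts.merge(step.apply(smiles), 1, Integer::sum);
            } catch (Exception e) {
                tally.failures++;
            }
        }
        return tally;
    }

    void print() {
        System.out.println("Generated " + counts.size() + " different SMILES; " + failures + " failures");
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }

    public static void main(String[] args) {
        // Assumption: with CDK 1.5.x on the classpath the real step would look roughly like
        //   SmilesParser parser = new SmilesParser(SilentChemObjectBuilder.getInstance());
        //   RoundTrip step = smi -> SmilesGenerator.isomeric().create(parser.parseSmiles(smi));
        // The stand-in below only imitates unstable output so the sketch runs without CDK.
        int[] calls = {0};
        RoundTrip standIn = smi -> {
            int n = calls[0]++;
            if (n % 5 == 4) {
                throw new IllegalArgumentException("cannot assign geometric configuration");
            }
            return (n % 2 == 0) ? "variant-A" : "variant-B";
        };
        run(standIn, "C/C=C(/C)C=CC(/C)=C\\C", 10).print();
    }
}
```

A deterministic parser and generator should give exactly one distinct SMILES and zero failures, so anything else reported by such a tally points at the instability described in this ticket.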

Discussion

  • John May

    John May - 2016-04-18

    I have (previously) redone the Beam double-bond assignment algorithm, which
    should resolve this, or at least make it consistent. It wasn't production
    quality, and I have been trying to find time to rewrite Beam from the ground up.
    For now, though, I'll see if I can get the new algorithm into decent shape.

     
  • John May

    John May - 2016-08-15

    Okay, I know what's happening now; a fix is imminent. It's a curious case, actually, so I will do a blog post, but consider this:

    Input:

    C/C=C(/C)C=CC(/C)=C\C
    https://cdkdepict-openchem.rhcloud.com/depict/bow/svg?smi=C%2FC=C(%2FC)C=CC(%2FC)=C%5CC&abbr=on&suppressh=true&showtitle=false&zoom=1.3&annotate=none
    

    How many stereocenters? 2 or 3? After going through OpenBabel and CDK we start at 2 and get 3.

    [john@orac ~]$ obabel -:'C/C=C(/C)C=CC(/C)=C\C' -osmi
    C/C=C(/C)\C=C/C(=C\C)/C
    https://cdkdepict-openchem.rhcloud.com/depict/bow/svg?smi=C%2FC=C(%2FC)%5CC=C%2FC(=C%5CC)%2FC&abbr=on&suppressh=true&showtitle=false&zoom=1.3&annotate=none
    

    Following Noel's advice (Universal SMILES paper), I faithfully write / and \ on all bonds connected to a
    cis-trans stereocenter. However, this can actually add stereochemistry where there is none.

     

    Last edit: John May 2016-08-15
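
The 2-versus-3 count from the last comment can be checked with a small stand-alone sketch. This is a toy counter, not Beam or CDK code: the class `DirectionalLabelCount` is a hypothetical name, it understands only single-letter atoms, `=`, `/`, `\` and branches, and it assumes that a double bond counts as configured when each of its end atoms touches at least one `/` or `\` bond.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy counter (hypothetical, not Beam/CDK): single-letter atoms, '=', '/', '\' and branches only. */
final class DirectionalLabelCount {

    /** Counts double bonds whose two end atoms each touch a '/' or '\' bond. */
    static int countConfigured(String smiles) {
        List<int[]> doubleBonds = new ArrayList<>();       // {begin, end} atom indices
        List<Boolean> hasDirectional = new ArrayList<>();  // per atom: touches a '/' or '\' bond?
        Deque<Integer> branchStack = new ArrayDeque<>();
        int prev = -1;       // atom that the next atom will be bonded to
        char pending = '-';  // bond symbol seen since the last atom ('-' means implicit single)

        for (char c : smiles.toCharArray()) {
            if (c == '(') {
                branchStack.push(prev);
            } else if (c == ')') {
                prev = branchStack.pop();
            } else if (c == '=' || c == '/' || c == '\\') {
                pending = c;
            } else if (Character.isUpperCase(c)) {
                int atom = hasDirectional.size();
                hasDirectional.add(false);
                if (prev >= 0) {
                    if (pending == '=') {
                        doubleBonds.add(new int[] {prev, atom});
                    } else if (pending == '/' || pending == '\\') {
                        // A directional label is visible from both atoms of the bond.
                        hasDirectional.set(prev, true);
                        hasDirectional.set(atom, true);
                    }
                }
                prev = atom;
                pending = '-';
            } else {
                throw new IllegalArgumentException("unsupported character: " + c);
            }
        }

        int configured = 0;
        for (int[] bond : doubleBonds) {
            if (hasDirectional.get(bond[0]) && hasDirectional.get(bond[1])) {
                configured++;
            }
        }
        return configured;
    }

    public static void main(String[] args) {
        // Input SMILES from the comment above.
        System.out.println(countConfigured("C/C=C(/C)C=CC(/C)=C\\C"));     // prints 2
        // The same molecule as written out by obabel in the comment above.
        System.out.println(countConfigured("C/C=C(/C)\\C=C/C(=C\\C)/C"));  // prints 3
    }
}
```

In the obabel output the central C=C picks up a label on each side only because those single bonds are shared with the two outer, genuinely specified double bonds, which is how a round trip can introduce a third stereocenter.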