We've found a weird case: parsing the same SMILES in a loop and then generating a SMILES from the resulting atom container randomly throws exceptions.
Test here (CDK 1.5.12).
It reads a ChEMBL mol file, generates a SMILES, and then repeatedly parses this SMILES in a loop, trying to generate an isomeric SMILES from each result.
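A minimal, hypothetical sketch of the repeat-and-tally loop, in plain Java so it runs without CDK. The round trip is abstracted as a `Supplier<String>`. In the real test it would parse the SMILES with CDK's `SmilesParser` and call `SmilesGenerator.create`; those methods throw checked exceptions, so a real supplier would have to wrap them (not shown). The class name, the fake supplier and its outcomes are made up for demonstration. Only `IllegalArgumentException` is counted as a failure because that is the exception type reported in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

public class RepeatHarness {

    // Outcome of running the same round trip many times (hypothetical type).
    static final class Tally {
        final Map<String, Integer> counts = new LinkedHashMap<>();
        int failures;
    }

    // Runs roundTrip n times, counting each distinct SMILES it returns and
    // every IllegalArgumentException it throws.
    static Tally run(int n, Supplier<String> roundTrip) {
        Tally t = new Tally();
        for (int i = 0; i < n; i++) {
            try {
                t.counts.merge(roundTrip.get(), 1, Integer::sum);
            } catch (IllegalArgumentException e) {
                t.failures++;
            }
        }
        return t;
    }

    public static void main(String[] args) {
        // Stand-in for "parse the SMILES, then SmilesGenerator.create(...)".
        // This fake cycles through fixed outcomes purely for demonstration.
        int[] call = {0};
        Supplier<String> fake = () -> {
            int k = call[0]++ % 4;
            if (k == 3) {
                throw new IllegalArgumentException("cannot assign geometric configuration");
            }
            return k == 0 ? "F/C=C/F" : "F\\C=C\\F";
        };
        Tally t = run(8, fake);
        System.out.println("Generated " + t.counts.size() + " different SMILES; "
                + t.failures + " failures");
        t.counts.forEach((smi, c) -> System.out.println(c + " " + smi));
    }
}
```

With the fake supplier, `main` prints `Generated 2 different SMILES; 2 failures` followed by one count line per distinct SMILES, mirroring the report format used in the test output.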
Running the test generates output as below (but the exact numbers differ each time the test is run).
Starting SMILES CC(C)NC(C=1C=CC=2C(C1)=N\C(=C/3\C=C/C(/C=C3)=C\4/C=C/C(=C/5\C=C/C(/C=C5)=C/6\N=C7C=CC(=CC7=N6)C(=N)NC(C)C)/O4)\N2)=N.Cl
Generated 4 different SMILES; 9 failures
53 CC(C)NC(C=1C=CC=2C(C1)=N\C(=C/3\C=C/C(/C=C3)=C\4/C=C/C(=C/5\C=C/C(/C=C5)=C/6\N=C7C=CC(=CC7=N6)C(=N)NC(C)C)/O4)\N2)=N.Cl
1 CC(C)NC(C=1C=CC=2C(C1)=N\C(=C/3\C=C/C(/C=C3)=C\4/C=C/C(=C\5\C=C/C(/C=C5)=C/6\N=C7C=CC(=CC7=N6)C(=N)NC(C)C)/O4)\N2)=N.Cl
22 CC(C)NC(C=1C=CC=2C(C1)=N/C(=C\3/C=C\C(\C=C3)=C/4\C=C\C(=C\5/C=C\C(\C=C5)=C\6/N=C7C=CC(=CC7=N6)C(=N)NC(C)C)\O4)/N2)=N.Cl
15 CC(C)NC(C=1C=CC=2C(C1)=N/C(=C\3/C=C\C(\C=C3)=C/4\C=C\C(=C/5/C=C\C(\C=C5)=C\6/N=C7C=CC(=CC7=N6)C(=N)NC(C)C)\O4)/N2)=N.Cl
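Looking closely at the four generated strings, the 22-count variant is the 53-count variant with every `/` and `\` swapped, and the 15-count variant is the 1-count variant swapped the same way. Inverting all directional labels at once preserves every cis/trans relationship, so the four strings collapse into two labellings (53 + 22 = 75 runs and 1 + 15 = 16 runs; with the 9 failures that accounts for 100 runs). The two labellings differ only in the label on ring-closure bond 5 (`=C/5\C` versus `=C\5\C`). A small check using only `String`; the class and the `invertDirections` helper are hypothetical names for this sketch.

```java
public class DirectionFlip {

    // Swaps every '/' with '\' and vice versa. Inverting all directional
    // labels together leaves each double-bond configuration unchanged.
    static String invertDirections(String smiles) {
        StringBuilder sb = new StringBuilder(smiles.length());
        for (char ch : smiles.toCharArray()) {
            if (ch == '/') sb.append('\\');
            else if (ch == '\\') sb.append('/');
            else sb.append(ch);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The four generated SMILES from the output above (backslashes escaped).
        String v53 = "CC(C)NC(C=1C=CC=2C(C1)=N\\C(=C/3\\C=C/C(/C=C3)=C\\4/C=C/C(=C/5\\C=C/C(/C=C5)=C/6\\N=C7C=CC(=CC7=N6)C(=N)NC(C)C)/O4)\\N2)=N.Cl";
        String v1 = "CC(C)NC(C=1C=CC=2C(C1)=N\\C(=C/3\\C=C/C(/C=C3)=C\\4/C=C/C(=C\\5\\C=C/C(/C=C5)=C/6\\N=C7C=CC(=CC7=N6)C(=N)NC(C)C)/O4)\\N2)=N.Cl";
        String v22 = "CC(C)NC(C=1C=CC=2C(C1)=N/C(=C\\3/C=C\\C(\\C=C3)=C/4\\C=C\\C(=C\\5/C=C\\C(\\C=C5)=C\\6/N=C7C=CC(=CC7=N6)C(=N)NC(C)C)\\O4)/N2)=N.Cl";
        String v15 = "CC(C)NC(C=1C=CC=2C(C1)=N/C(=C\\3/C=C\\C(\\C=C3)=C/4\\C=C\\C(=C/5/C=C\\C(\\C=C5)=C\\6/N=C7C=CC(=CC7=N6)C(=N)NC(C)C)\\O4)/N2)=N.Cl";
        System.out.println(invertDirections(v53).equals(v22)); // true
        System.out.println(invertDirections(v1).equals(v15));  // true
        System.out.println(v53.equals(v1));                    // false
    }
}
```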
The failures are all the same:
java.lang.IllegalArgumentException: cannot assign geometric configuration
	at uk.ac.ebi.beam.GraphBuilder.assignDirectionalLabels(GraphBuilder.java:280)
	at uk.ac.ebi.beam.GraphBuilder.build(GraphBuilder.java:435)
	at org.openscience.cdk.smiles.CDKToBeam.toBeamGraph(CDKToBeam.java:165)
	at org.openscience.cdk.smiles.SmilesGenerator.create(SmilesGenerator.java:369)
	at org.openscience.cdk.smiles.SmilesGenerator.create(SmilesGenerator.java:325)
	at net.idea.examples.cdk.maven_single_module.SmilesTest.weirdtest_smiles(SmilesTest.java:60)
This happens only when generating SMILES that include stereochemistry (isomeric and absolute SMILES).
I have previously redone the Beam double-bond assignment algorithm, which
should resolve this, or at least make it consistent. It wasn't production
quality, and I have been trying to find time to rewrite Beam from the ground up.
For now, though, I'll see if I can get the new algorithm into decent shape.
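The exception in the trace is thrown from `GraphBuilder.assignDirectionalLabels`. As a rough mental model only (an assumption, not Beam's actual algorithm and not the rewrite mentioned above), assigning directional labels can be framed as a parity problem: each labelled single bond is a boolean (`/` or `\`), and each stereo double bond requires the labels of its flanking bonds to be either the same or opposite. In conjugated systems a single bond between two double bonds is shared, so the constraints chain together, and a contradiction around a cycle means no valid labelling exists. The sketch solves this by breadth-first propagation; the class name and the `{u, v, parity}` encoding are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified model of directional-label assignment.
public class LabelAssignment {

    // Each constraint is {u, v, parity}: bonds u and v must get the same
    // label when parity == 0 and opposite labels when parity == 1.
    // Returns one label per bond (true = '/', false = '\'), or null when the
    // constraints contradict each other.
    static boolean[] assign(int nBonds, int[][] constraints) {
        List<List<int[]>> adj = new ArrayList<>();
        for (int i = 0; i < nBonds; i++) adj.add(new ArrayList<>());
        for (int[] c : constraints) {
            adj.get(c[0]).add(new int[] {c[1], c[2]});
            adj.get(c[1]).add(new int[] {c[0], c[2]});
        }
        int[] label = new int[nBonds];
        Arrays.fill(label, -1); // -1 = not yet assigned
        for (int start = 0; start < nBonds; start++) {
            if (label[start] != -1) continue;
            label[start] = 0; // free choice for each independent component
            Deque<Integer> queue = new ArrayDeque<>();
            queue.add(start);
            while (!queue.isEmpty()) {
                int b = queue.poll();
                for (int[] e : adj.get(b)) {
                    int want = label[b] ^ e[1];
                    if (label[e[0]] == -1) {
                        label[e[0]] = want;
                        queue.add(e[0]);
                    } else if (label[e[0]] != want) {
                        return null; // contradiction: cannot assign labels
                    }
                }
            }
        }
        boolean[] out = new boolean[nBonds];
        for (int i = 0; i < nBonds; i++) out[i] = label[i] == 0;
        return out;
    }

    public static void main(String[] args) {
        // Bonds 0/1 must differ and bonds 1/2 must differ: satisfiable.
        int[][] ok = {{0, 1, 1}, {1, 2, 1}};
        System.out.println(Arrays.toString(assign(3, ok))); // [true, false, true]
        // Also requiring bonds 0/2 to differ closes an odd cycle: no labelling.
        int[][] bad = {{0, 1, 1}, {1, 2, 1}, {0, 2, 1}};
        System.out.println(assign(3, bad) == null); // true
    }
}
```

Because the first bond of each component gets an arbitrary label, any valid result can also be globally inverted, which fits the mirrored `/`/`\` variants seen in the test output.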
Okay, I know what's happening now; a fix is imminent. It's actually a curious case, so I'll do a blog post, but consider this:
Input:
How many stereocenters are there, 2 or 3? After going through OpenBabel and CDK, we start with 2 and end up with 3.
Following Noel's advice (Universal SMILES paper), I faithfully write / and \ on all bonds connected to a
cis-trans stereocenter. However, this can actually add stereochemistry where there is none.
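The input above isn't reproduced here, so what follows is a purely hypothetical toy illustration of how labelling every bond adjacent to a cis-trans stereocenter can add one. It assumes a branch-free, ring-free chain of single-letter atoms and treats a double bond as specified when the single bonds on both sides of it carry `/` or `\`. In a linear triene, marking only the two outer double bonds as stereo puts labels on both single bonds flanking the middle double bond, so the count goes from 2 to 3. The class and method names are made up for this sketch, and real molecules with branches or rings may leave a writer other bonds to choose from.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class SpuriousStereo {

    // Splits a branch-free, ring-free chain such as "F/C=C/F" into its bond
    // symbols; "" stands for an implicit single bond.
    static List<String> bonds(String chain) {
        List<String> bonds = new ArrayList<>();
        String pending = null; // bond symbol seen since the previous atom
        for (char ch : chain.toCharArray()) {
            if (Character.isLetter(ch)) {
                if (pending != null) bonds.add(pending);
                pending = "";
            } else {
                pending = String.valueOf(ch);
            }
        }
        return bonds;
    }

    static boolean isDirectional(String bond) {
        return bond.equals("/") || bond.equals("\\");
    }

    // Counts double bonds whose neighbouring single bonds on both sides carry
    // a directional label, i.e. the ones a reader would treat as stereo.
    static int countSpecifiedDoubleBonds(String chain) {
        List<String> b = bonds(chain);
        int count = 0;
        for (int i = 1; i + 1 < b.size(); i++) {
            if (b.get(i).equals("=") && isDirectional(b.get(i - 1))
                    && isDirectional(b.get(i + 1))) {
                count++;
            }
        }
        return count;
    }

    // Writes the chain, putting '/' on EVERY single bond adjacent to a double
    // bond listed in stereo (mimicking "label all adjacent bonds").
    static String writeChain(String atoms, String[] bonds, Set<Integer> stereo) {
        String[] out = bonds.clone();
        for (int i : stereo) {
            if (i > 0 && out[i - 1].isEmpty()) out[i - 1] = "/";
            if (i + 1 < out.length && out[i + 1].isEmpty()) out[i + 1] = "/";
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < atoms.length(); i++) {
            sb.append(atoms.charAt(i));
            if (i < out.length) sb.append(out[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Triene F-C=C-C=C-C=C-F with double bonds at bond indices 1, 3 and 5;
        // only the two outer ones are intended to be stereo.
        String[] bonds = {"", "=", "", "=", "", "=", ""};
        String smi = writeChain("FCCCCCCF", bonds, Set.of(1, 5));
        System.out.println(smi + " -> " + countSpecifiedDoubleBonds(smi)); // F/C=C/C=C/C=C/F -> 3
    }
}
```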
Last edit: John May 2016-08-15