Hi Ed,

I've rewritten the parser and generator; on master I now get:

mol1 - 98 & SMILES IS O=C(O)C1NC(=O)C2NC(=O)C(NC(=O)C3NC(=O)C4NC(=O)C(NC(=O)C(N)c5ccc(O)c(Oc6cc(O)cc4c6)c5)Cc7ccc(Oc8cc3cc(Oc9ccc(cc9Cl)C2OC%10OC(CO)C(O)C(O)C%10NC(=O)C)c8O)cc7)c%11ccc(O)c(c%11)-c%12c(O)cc(O)cc%121
mol2 - 79 & SMILES IS O=C(NCc1nc[nH]c1)CCCC2(c3ccccc3)C45c6c-7c8c9c%10c%11c%12c%13c%14c%15c%16c%12c%17c%18c%16c%19c%20c%15c%21c%14c%22c%23c%13c%11c%24c%25c%23c%26c%22c%27c%21c%28c%20c%29c%19c%30c%18c(c8c%17%10)c6c%30c%31c%29c%32c%28c%27c%33c%26c%34c%25c(c%249)c7c5c%34c%33c%32C%3142
mol3 - 77 & SMILES IS O=C(NC(=N)N(C)C)C1(C(=O)NC(=N)N(C)C)C23c4c-5c6c7c8c9c%10c%11c%12c%13c%14c%10c%15c%16c%14c%17c%18c%13c%19c%12c%20c%21c%11c9c%22c%23c%21c%24c%20c%25c%19c%26c%18c%27c%17c%28c%16c(c6c%158)c4c%28c%29c%27c%30c%26c%25c%31c%24c%32c%23c(c%227)c5c3c%32c%31c%30C%2912
mol4 - 98 & SMILES IS O=C(O)C1NC(=O)C2NC(=O)C(NC(=O)C3NC(=O)C4NC(=O)C(NC(=O)C(N)c5ccc(O)c(Oc6cc(O)cc4c6)c5)Cc7ccc(Oc8cc3cc(Oc9ccc(cc9Cl)C2OC%10OC(CO)C(O)C(O)C%10NC(=O)C)c8O)cc7)c%11ccc(O)c(c%11)-c%12c(O)cc(O)cc%121

Does that resolve the issue?

When generating SMILES, the current default scheme uses unique ring numbers for as long as possible and only then reuses them. If it runs out of ring numbers it throws an exception (e.g. fullerene C720). There is a configuration that reuses ring numbers as much as possible (i.e. fewer % labels), but it isn't the default because OpenSMILES recommends unique numbers.
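To make the two numbering policies concrete, here is a minimal Java sketch. It is not CDK's SmilesGenerator code; the class `RingNumberAllocator` and its methods are hypothetical. It assumes labels 1-99 (a single digit for 1-9, `%nn` for 10-99; it skips 0, although 0 is also a legal label). The default mode hands out a fresh label while any remain and only then recycles closed ones, throwing once all 99 are open at the same time. The reuse mode always takes the lowest freed label, which keeps numbers small and avoids `%`.

```java
import java.util.TreeSet;

/**
 * Hypothetical sketch, NOT CDK's actual SmilesGenerator code: two policies
 * for handing out ring-closure numbers while writing a SMILES string.
 */
public class RingNumberAllocator {

    // Highest label writable as a plain digit (1-9) or as %nn (10-99).
    static final int MAX_LABEL = 99;

    private final boolean preferReuse;                       // true = "fewer %" mode
    private final TreeSet<Integer> freed = new TreeSet<>();  // closed labels, available again
    private int nextFresh = 1;                               // lowest label never handed out

    public RingNumberAllocator(boolean preferReuse) {
        this.preferReuse = preferReuse;
    }

    /** Picks a label for a ring bond that is being opened. */
    public int open() {
        if (preferReuse && !freed.isEmpty()) {
            return freed.pollFirst();   // lowest free label keeps numbers small
        }
        if (nextFresh <= MAX_LABEL) {
            return nextFresh++;         // unique label while any remain
        }
        if (!freed.isEmpty()) {
            return freed.pollFirst();   // fresh labels exhausted: fall back to reuse
        }
        throw new IllegalStateException(
                "ran out of ring-closure numbers (all " + MAX_LABEL + " are open)");
    }

    /** Marks a label as closed so it may be handed out again. */
    public void close(int label) {
        freed.add(label);
    }

    /** Writes a label as SMILES expects: one digit, or '%' plus two digits. */
    public static String format(int label) {
        return label < 10 ? Integer.toString(label) : "%" + label;
    }

    public static void main(String[] args) {
        for (boolean reuse : new boolean[] {false, true}) {
            RingNumberAllocator alloc = new RingNumberAllocator(reuse);
            int first = alloc.open();   // open and close one ring bond...
            alloc.close(first);
            int second = alloc.open();  // ...then open another one
            // prints "default: 1 then 2", then "reuse: 1 then 1"
            System.out.println((reuse ? "reuse" : "default") + ": "
                    + format(first) + " then " + format(second));
        }
    }
}
```

The trade-off is that reuse keeps the output compact and rarely needs `%`, while unique labels make each ring bond easier to match up by eye.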


On 9 Nov 2013, at 17:45, Duece99 <duece99@users.sf.net> wrote:

[bugs:#1316] Error with SMILES parsing with lots of rings

Status: open
Created: Sat Nov 09, 2013 05:45 PM UTC by Duece99
Last Updated: Sat Nov 09, 2013 05:45 PM UTC
Owner: nobody


Consider the four SMILES strings below (note that the 1st and 4th are supposed to represent the same molecule):

import org.openscience.cdk.DefaultChemObjectBuilder;
import org.openscience.cdk.exception.InvalidSmilesException;
import org.openscience.cdk.interfaces.IAtomContainer;
import org.openscience.cdk.smiles.SmilesGenerator;
import org.openscience.cdk.smiles.SmilesParser;

public class SMILESGeneratorBug {

    public static void main( String[] argv ) {

        String s1 = "CC(=O)NC1C(O)C(O)C(CO)OC1OC2C3NC(=O)C(NC(=O)C4NC(=O)C5NC(=O)C(Cc6ccc(Oc7cc4cc(Oc8ccc2cc8Cl)c7O)cc6)NC(=O)C(N)c9ccc(O)c(Oc%10cc(O)cc5c%10)c9)c%11ccc(O)c(c%11)-c%12c(O)cc(O)cc%12C(NC3=O)C(=O)O";
        String s2 = "O=C(CCCC1(c2ccccc2)C34c5c6-c7c4c8c9c%10c%11c%12c(c%13c5c%14c%15c6c%16c%17c7c%18c8c%19c9c%20c%11c%21c%22c%12c%23c%13c%14c%24c%25c%15c%16c%26c%27c%17c%18c%28c%19c%29c%20c%21c%30c%31c%22c%23c%24c%32c%25c%26c%33c%27c%28c%29c%30c%33c%31%32)C%1013)NCc%34c[nH]cn%34";
        String s3 = "CN(C)C(=N)NC(=O)C1(C(=O)NC(=N)N(C)C)C23c4c5-c6c3c7c8c9c%10c%11c(c%12c4c%13c%14c5c%15c%16c6c%17c7c%18c8c%19c%10c%20c%21c%11c%22c%12c%13c%23c%24c%14c%15c%25c%26c%16c%17c%27c%18c%28c%19c%20c%29c%30c%21c%22c%23c%31c%24c%25c%32c%26c%27c%28c%29c%32c%30%31)C912";
        String s4 = "CC(=O)NC1C(O)C(O)C(CO)OC1OC1C2NC(=O)C(NC(=O)C3NC(=O)C4NC(=O)C(Cc5ccc(Oc6cc3cc(Oc3ccc1cc3Cl)c6O)cc5)NC(=O)C(N)c1ccc(O)c(Oc3cc(O)cc4c3)c1)c1ccc(O)c(c1)-c1c(O)cc(O)cc1C(NC2=O)C(O)=O";
        // s1 and s4 represent the same molecule; s4 differs in that it reuses ring numbers, so no % symbols appear

        SmilesParser sp = new SmilesParser( DefaultChemObjectBuilder.getInstance() );
        SmilesGenerator smiG = new SmilesGenerator(true);

        IAtomContainer mol1, mol2, mol3, mol4;
        try {
            mol1 = sp.parseSmiles(s1);
            mol2 = sp.parseSmiles(s2);
            mol3 = sp.parseSmiles(s3);
            mol4 = sp.parseSmiles(s4);

            System.out.println( "mol1 - " + mol1.getAtomCount() + " & SMILES IS " + smiG.createSMILES(mol1) );  // no SMILES reported
            System.out.println( "mol2 - " + mol2.getAtomCount() + " & SMILES IS " + smiG.createSMILES(mol2) );  // no SMILES reported
            System.out.println( "mol3 - " + mol3.getAtomCount() + " & SMILES IS " + smiG.createSMILES(mol3) );  // no SMILES reported
            System.out.println( "mol4 - " + mol4.getAtomCount() + " & SMILES IS " + smiG.createSMILES(mol4) );  // SMILES IS reported!
        } catch (InvalidSmilesException e) {
            // Thrown by parseSmiles() if any of the input strings is malformed
            e.printStackTrace();
        }
    }
}

When I run this code, the SmilesGenerator produces no SMILES for the first three molecules (with no errors, AFAIK), yet it does produce one for the fourth!

Note that the 4th molecule recycles its ring numbers, so there are no "%" symbols in its SMILES string. I'm unsure whether that's the cause of the problem, but I assume it's a bug or a missing feature in the SmilesParser class.
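One way to check whether the `%` labels or the recycled labels distinguish s4 from the others is to scan the ring-closure labels directly. The sketch below is a hypothetical Java helper (`RingBondScanner` is not part of CDK). It skips bracket atoms, whose digits are isotopes, hydrogen counts or charges; it treats `%` followed by two digits as one label; and it pairs each label's opening with its next occurrence, so a label may legally be reused once it has closed. It does not handle the non-standard `%(nnn)` form.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical diagnostic helper, NOT part of CDK: walks the ring-closure
 * labels of a SMILES string and pairs each opening with its closing.
 */
public class RingBondScanner {

    /** What the scan found. */
    public static final class Summary {
        public final int ringBonds;        // matched open/close pairs
        public final int maxOpen;          // most labels open at the same time
        public final boolean usesPercent;  // at least one %nn label present
        public final boolean labelReused;  // some label reopened after it closed

        Summary(int ringBonds, int maxOpen, boolean usesPercent, boolean labelReused) {
            this.ringBonds = ringBonds;
            this.maxOpen = maxOpen;
            this.usesPercent = usesPercent;
            this.labelReused = labelReused;
        }
    }

    public static Summary scan(String smiles) {
        Set<Integer> open = new HashSet<>();    // labels currently awaiting closure
        Set<Integer> closed = new HashSet<>();  // labels that have closed at least once
        int pairs = 0, maxOpen = 0;
        boolean percent = false, reused = false;

        int i = 0;
        while (i < smiles.length()) {
            char c = smiles.charAt(i);
            int label;
            if (c == '[') {
                // Digits inside [...] are isotopes, H counts or charges, not ring labels.
                int end = smiles.indexOf(']', i);
                if (end < 0) throw new IllegalArgumentException("unclosed bracket atom at " + i);
                i = end + 1;
                continue;
            } else if (c == '%') {
                // Two-digit label %nn.
                if (i + 2 >= smiles.length()
                        || !isDigit(smiles.charAt(i + 1)) || !isDigit(smiles.charAt(i + 2))) {
                    throw new IllegalArgumentException("bad %nn label at " + i);
                }
                label = Integer.parseInt(smiles.substring(i + 1, i + 3));
                percent = true;
                i += 3;
            } else if (isDigit(c)) {
                label = c - '0';  // single-digit label 0-9
                i++;
            } else {
                i++;              // atoms, bond symbols, branches: irrelevant here
                continue;
            }

            if (open.remove(label)) {
                pairs++;          // second occurrence closes the ring bond
                closed.add(label);
            } else {
                if (closed.contains(label)) reused = true;
                open.add(label);  // first occurrence opens it
                maxOpen = Math.max(maxOpen, open.size());
            }
        }
        if (!open.isEmpty()) throw new IllegalArgumentException("unclosed ring bond(s): " + open);
        return new Summary(pairs, maxOpen, percent, reused);
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    public static void main(String[] args) {
        Summary s = scan("C1CC1C1CC1");  // label 1 is recycled
        // prints "2 ring bonds, max open 1, % used: false, reused: true"
        System.out.println(s.ringBonds + " ring bonds, max open " + s.maxOpen
                + ", % used: " + s.usesPercent + ", reused: " + s.labelReused);
    }
}
```

Run on s1 and s4, this should report the same number of ring bonds for both, with `%` present only in s1 and label reuse only in s4.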

Any input?


Sent from sourceforge.net because cdk-bugs@lists.sf.net is subscribed to https://sourceforge.net/p/cdk/bugs/

