From: John M. <joh...@gm...> - 2013-11-09 18:21:31
Hi Ed,
I've rewritten the parser and generator; on master I get:
mol1 - 98 & SMILES IS O=C(O)C1NC(=O)C2NC(=O)C(NC(=O)C3NC(=O)C4NC(=O)C(NC(=O)C(N)c5ccc(O)c(Oc6cc(O)cc4c6)c5)Cc7ccc(Oc8cc3cc(Oc9ccc(cc9Cl)C2OC%10OC(CO)C(O)C(O)C%10NC(=O)C)c8O)cc7)c%11ccc(O)c(c%11)-c%12c(O)cc(O)cc%121
mol2 - 79 & SMILES IS O=C(NCc1nc[nH]c1)CCCC2(c3ccccc3)C45c6c-7c8c9c%10c%11c%12c%13c%14c%15c%16c%12c%17c%18c%16c%19c%20c%15c%21c%14c%22c%23c%13c%11c%24c%25c%23c%26c%22c%27c%21c%28c%20c%29c%19c%30c%18c(c8c%17%10)c6c%30c%31c%29c%32c%28c%27c%33c%26c%34c%25c(c%249)c7c5c%34c%33c%32C%3142
mol3 - 77 & SMILES IS O=C(NC(=N)N(C)C)C1(C(=O)NC(=N)N(C)C)C23c4c-5c6c7c8c9c%10c%11c%12c%13c%14c%10c%15c%16c%14c%17c%18c%13c%19c%12c%20c%21c%11c9c%22c%23c%21c%24c%20c%25c%19c%26c%18c%27c%17c%28c%16c(c6c%158)c4c%28c%29c%27c%30c%26c%25c%31c%24c%32c%23c(c%227)c5c3c%32c%31c%30C%2912
mol4 - 98 & SMILES IS O=C(O)C1NC(=O)C2NC(=O)C(NC(=O)C3NC(=O)C4NC(=O)C(NC(=O)C(N)c5ccc(O)c(Oc6cc(O)cc4c6)c5)Cc7ccc(Oc8cc3cc(Oc9ccc(cc9Cl)C2OC%10OC(CO)C(O)C(O)C%10NC(=O)C)c8O)cc7)c%11ccc(O)c(c%11)-c%12c(O)cc(O)cc%121
Does that resolve it?
When generating SMILES, the default scheme uses unique ring numbers for as long as possible and only then starts reusing them. If it reaches a point where it has run out of ring numbers, it will throw an exception (e.g. fullerene C720). I do have a config which allows reuse of ring numbers as much as possible (i.e. fewer % symbols), but that isn't the default, as OpenSMILES recommends unique numbers.
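As a rough sketch of the two allocation strategies (this is not the actual CDK code; the class RingNumbers and its methods are made up for illustration, and I'm assuming the usual SMILES limit of ring numbers 1 to 99):

```java
import java.util.BitSet;

/** Hypothetical ring-number allocator; illustration only, not the CDK implementation. */
public class RingNumbers {

    static final int MAX = 99;            // highest ring number expressible as %nn

    private final BitSet inUse = new BitSet(MAX + 1);
    private final boolean eagerReuse;     // true = always take the lowest free number
    private int nextFresh = 1;            // lowest number that has never been handed out

    RingNumbers(boolean eagerReuse) {
        this.eagerReuse = eagerReuse;
    }

    /** Reserve a number for a ring that has just been opened. */
    int open() {
        int n;
        if (!eagerReuse && nextFresh <= MAX) {
            n = nextFresh++;              // unique-first: prefer a never-used number
        } else {
            n = inUse.nextClearBit(1);    // otherwise recycle the lowest closed number
            if (n > MAX)
                throw new IllegalStateException("more than " + MAX + " rings open at once");
        }
        inUse.set(n);
        return n;
    }

    /** Release a number once its ring has been closed. */
    void close(int n) {
        inUse.clear(n);
    }

    /** SMILES spelling: a single digit, or '%' followed by two digits. */
    static String label(int n) {
        return n < 10 ? Integer.toString(n) : "%" + n;
    }

    public static void main(String[] args) {
        for (boolean eager : new boolean[] {false, true}) {
            RingNumbers rn = new RingNumbers(eager);
            StringBuilder sb = new StringBuilder();
            for (int ring = 0; ring < 3; ring++) {   // three rings, opened and closed in turn
                int n = rn.open();
                sb.append(label(n)).append(' ');
                rn.close(n);
            }
            System.out.println((eager ? "eager reuse: " : "unique first: ") + sb.toString().trim());
        }
        // prints "unique first: 1 2 3" then "eager reuse: 1 1 1"
    }
}
```

In this sketch, either mode only fails when more than 99 rings are open at the same time, which is the situation a very large fullerene can produce.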
J
On 9 Nov 2013, at 17:45, Duece99 <du...@us...> wrote:
>
> [bugs:#1316] Error with SMILES parsing with lots of rings
>
> Status: open
> Created: Sat Nov 09, 2013 05:45 PM UTC by Duece99
> Last Updated: Sat Nov 09, 2013 05:45 PM UTC
> Owner: nobody
>
> Hello,
>
> Observe the four SMILES strings below (noting that the 1st and the 4th are supposed to represent the same molecule)...
>
> import org.openscience.cdk.DefaultChemObjectBuilder;
> import org.openscience.cdk.exception.InvalidSmilesException;
> import org.openscience.cdk.interfaces.IAtomContainer;
> import org.openscience.cdk.smiles.SmilesGenerator;
> import org.openscience.cdk.smiles.SmilesParser;
>
> public class SMILESGeneratorBug {
>
>     public static void main( String[] argv ) {
>
>         String s1 = "CC(=O)NC1C(O)C(O)C(CO)OC1OC2C3NC(=O)C(NC(=O)C4NC(=O)C5NC(=O)C(Cc6ccc(Oc7cc4cc(Oc8ccc2cc8Cl)c7O)cc6)NC(=O)C(N)c9ccc(O)c(Oc%10cc(O)cc5c%10)c9)c%11ccc(O)c(c%11)-c%12c(O)cc(O)cc%12C(NC3=O)C(=O)O";
>         String s2 = "O=C(CCCC1(c2ccccc2)C34c5c6-c7c4c8c9c%10c%11c%12c(c%13c5c%14c%15c6c%16c%17c7c%18c8c%19c9c%20c%11c%21c%22c%12c%23c%13c%14c%24c%25c%15c%16c%26c%27c%17c%18c%28c%19c%29c%20c%21c%30c%31c%22c%23c%24c%32c%25c%26c%33c%27c%28c%29c%30c%33c%31%32)C%1013)NCc%34c[nH]cn%34";
>         String s3 = "CN(C)C(=N)NC(=O)C1(C(=O)NC(=N)N(C)C)C23c4c5-c6c3c7c8c9c%10c%11c(c%12c4c%13c%14c5c%15c%16c6c%17c7c%18c8c%19c%10c%20c%21c%11c%22c%12c%13c%23c%24c%14c%15c%25c%26c%16c%17c%27c%18c%28c%19c%20c%29c%30c%21c%22c%23c%31c%24c%25c%32c%26c%27c%28c%29c%32c%30%31)C912";
>         String s4 = "CC(=O)NC1C(O)C(O)C(CO)OC1OC1C2NC(=O)C(NC(=O)C3NC(=O)C4NC(=O)C(Cc5ccc(Oc6cc3cc(Oc3ccc1cc3Cl)c6O)cc5)NC(=O)C(N)c1ccc(O)c(Oc3cc(O)cc4c3)c1)c1ccc(O)c(c1)-c1c(O)cc(O)cc1C(NC2=O)C(O)=O";
>         // s1 and s4 represent the same molecule, though s4 is special as there're no % symbols used for ring notation
>
>         SmilesParser sp = new SmilesParser( DefaultChemObjectBuilder.getInstance() );
>         SmilesGenerator smiG = new SmilesGenerator(true);
>
>         IAtomContainer mol1, mol2, mol3, mol4;
>         try {
>             mol1 = sp.parseSmiles(s1);
>             mol2 = sp.parseSmiles(s2);
>             mol3 = sp.parseSmiles(s3);
>             mol4 = sp.parseSmiles(s4);
>
>             System.out.println( "mol1 - " + mol1.getAtomCount() + " & SMILES IS " + smiG.createSMILES(mol1) ); // no SMILES reported
>             System.out.println( "mol2 - " + mol2.getAtomCount() + " & SMILES IS " + smiG.createSMILES(mol2) ); // no SMILES reported
>             System.out.println( "mol3 - " + mol3.getAtomCount() + " & SMILES IS " + smiG.createSMILES(mol3) ); // no SMILES reported
>             System.out.println( "mol4 - " + mol4.getAtomCount() + " & SMILES IS " + smiG.createSMILES(mol4) ); // SMILES IS reported!
>         } catch (InvalidSmilesException e) {
>             e.printStackTrace();
>         }
>
>     }
> }
> Running this code yields no SMILES generated from the SmilesGenerator object for the first 3 molecules (no errors AFAIK), yet SMILES is yielded for the fourth!
>
> Note that the 4th molecule has its ring notation recycled - there're no "%" symbols in its SMILES string. Unsure if that's the cause of the problem, but I assume it's a bug or a missing feature in the SmilesParser class.
>
> Any input?
>
> Ed.
>
> Cdk-bugs mailing list
> Cdk...@li...
> https://lists.sourceforge.net/lists/listinfo/cdk-bugs