From: Joerg K. W. <we...@in...> - 2004-12-06 16:32:52
Hi Andreas,

morgan.renumber(tMol); returns the renumbered molecule (no salts!).

In detail:
- A modified BFS search is applied, which assigns the initial values.
- The rest of the algorithm uses these numbers to get a unique
  renumbering for this molecule. If renumbering ties occur, there are
  several tie resolvers using different atom labels.
- Finally, the returned molecule should be uniquely renumbered, as far
  as the Morgan algorithm and the tie resolvers can guarantee this.

If you then put this molecule into a SMILES generator, it should always
return the same SMILES, even if the original molecules used another
numbering scheme.

Kind regards, Joerg

> If I use
>
> ------------------CODE BEGIN---------------------
> // create mol object out of this smiles string
> JOESmilesParser.smiToMol(mol, smiles, smiles);
>
> // clone the molecule
> // and recalculate
> // numbers
> JOEMol tMol = (JOEMol) mol.clone(false);
> morgan.calculate(tMol);
> JOEMol rMol = morgan.renumber(tMol); // (*)
> JOEMol rMol = tMol; // replaces (*) when it is removed
>
> // create mol to smiles converter
> m2s.init();
>
> // do some correction
> m2s.correctAromaticAmineCharge(rMol);
>
> // String buffer to hold the canonified string
> m2s.createSmiString(rMol, smilesb);
> return smilesb.toString();
> ------------------CODE END---------------------
>
> I get a NULL pointer exception at (*) with the following:
> Exception in thread "main" java.lang.NullPointerException
>     at joelib.algo.morgan.Morgan.getBFS(Morgan.java:329)
>     at joelib.algo.morgan.Morgan.renumber(Morgan.java:261)
>     at SmilesCanonifier.canonify(SmilesCanonifier.java:35)
>     at SmilesCanonifier.main(SmilesCanonifier.java:52)
>
> Without (*), however, it works quite well, too.
> Only for some molecules does it switch from (for example)
> 'C12CC3CC(CC(C3)C2)C1' to
> 'C12CC3CC(CC(C3)C1)C2',
> but that's not so problematic.
> Also, the missing salt support is no acute problem.
> What does (*) do, exactly?
>
> Greetings,
> Andreas
>
> On Tue, Nov 30, 2004 at 08:17:37AM +0100, Joerg K. Wegner wrote:
>
>>Hi Andreas,
>>
>>here is the code fragment used by joelib.io.types.Smiles. Please note
>>that the current Morgan algorithm cannot deal with salts. A solution
>>would be to get all contiguous fragments in decreasing order (method
>>mol.contiguousFragments(List)) and apply the Morgan algorithm to each
>>fragment. Then you simply connect those fragments in the SMILES
>>String with frag1smiles.frag2smiles.frag3smiles
>>
>>Mmhh, it's not a big deal, but I'm heavily busy at the moment with the
>>JOELib2 refactoring. Anyway, here is the code:
>>
>>if (doCanonical)
>>{
>>    JOEMol tMol = (JOEMol) mol.clone(false);
>>    morgan.calculate(tMol);
>>    JOEMol rMol = morgan.renumber(tMol);
>>    m2s.correctAromaticAmineCharge(rMol);
>>    m2s.createSmiString(rMol, smilesb);
>>}
>>else
>>{
>>    m2s.correctAromaticAmineCharge(mol);
>>    m2s.createSmiString(mol, smilesb);
>>}
>>
>>Kind regards, Joerg
>>
>>
>>>with the following code I read in a SMILES, build a molecule out of
>>>it and re-encode it as a string:
>>>
>>>----------------- BEGIN CODE --------------------------------
>>>
>>>import joelib.molecule.*;
>>>import joelib.smiles.*;
>>>
>>>public class SmilesCanonifier {
>>>
>>>    public String canonify(String smiles) {
>>>        JOEMol mol = new JOEMol();
>>>        JOEMol2Smi m2s = new JOEMol2Smi(); m2s.init();
>>>        StringBuffer smilesb = new StringBuffer(1000);
>>>
>>>        // create mol object out of this smiles string
>>>        JOESmilesParser.smiToMol(mol, smiles, smiles);
>>>
>>>        // create mol to smiles converter
>>>        m2s.init();
>>>        // do some correction
>>>        m2s.correctAromaticAmineCharge(mol);
>>>
>>>        // String buffer to hold the canonified string
>>>        m2s.createSmiString(mol, smilesb);
>>>        return smilesb.toString();
>>>    }
>>>
>>>    public static void main(String args[]) {
>>>        String smiles = args[0];
>>>        System.out.println(smiles);
>>>        SmilesCanonifier sc = new SmilesCanonifier();
>>>        System.out.println(sc.canonify(smiles));
>>>    }
>>>}
>>>
>>>----------------- END CODE ----------------------------------
>>>
>>>When I try this on a SMILES s0 I get an equivalent output smiles s1.
>>>When I try it on s1, I get an equivalent output smiles s2, which is
>>>different from s1.
>>>When I try it on s2, I get an equivalent output smiles s3, which is
>>>the same as s1, s3=s1.
>>>When I try it on s3=s1, I get an equivalent output smiles s4, which
>>>is the same as s2, s4=s2.
>>>
>>>The process repeats from now on, switching between two SMILES
>>>versions.
>>>Is there no canonical version? Otherwise, how can I get one?
>>>
>>>Greetings,
>>>Andi
>>
>>--
>>Dipl. Chem. Joerg K. Wegner
>>Center of Bioinformatics Tuebingen (ZBIT)
>>Department of Computer Architecture
>>Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany
>>Phone: (+49/0) 7071 29 78970
>>Fax: (+49/0) 7071 29 5091
>>E-Mail: mailto:we...@in...
>>WWW: http://www-ra.informatik.uni-tuebingen.de
>>--
>>Never mistake motion for action.
>> (E. Hemingway)
>>
>>Never mistake action for meaningful action.
>> (Hugo Kubinyi,2004)
>
> Best regards,
> Andi
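
The reply above describes the renumbering as initial values followed by iterative refinement, with ties left over for separate tie resolvers. As a rough illustration of that idea only, here is a minimal sketch of the textbook Morgan extended-connectivity iteration. It is not the code in joelib.algo.morgan.Morgan: the class name MorganSketch, the adjacency-list input, and the degree-based initial values are assumptions made for this example (the reply says JOELib assigns its initial values with a modified BFS), and no tie resolver is included.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; NOT JOELib's joelib.algo.morgan.Morgan.
class MorganSketch {

    /** Counts the distinct values in an array of atom labels. */
    static int distinctCount(long[] values) {
        return (int) Arrays.stream(values).distinct().count();
    }

    /**
     * Textbook Morgan extended-connectivity iteration.
     * neighbors.get(i) lists the atoms bonded to atom i.
     * Returns one label per atom; atoms with equal labels are ties
     * that a separate tie resolver would still have to break.
     */
    static long[] extendedConnectivity(List<int[]> neighbors) {
        int n = neighbors.size();
        long[] ec = new long[n];
        for (int i = 0; i < n; i++) {
            ec[i] = neighbors.get(i).length;      // assumed initial value: atom degree
        }
        int classes = distinctCount(ec);
        while (true) {
            long[] next = new long[n];
            for (int i = 0; i < n; i++) {
                for (int nb : neighbors.get(i)) {
                    next[i] += ec[nb];            // sum of the neighbours' labels
                }
            }
            int nextClasses = distinctCount(next);
            if (nextClasses <= classes) {
                return ec;                        // no further refinement possible
            }
            ec = next;
            classes = nextClasses;
        }
    }

    public static void main(String[] args) {
        // n-pentane as a chain of five carbons: 0-1-2-3-4
        List<int[]> pentane = Arrays.asList(
            new int[] {1}, new int[] {0, 2}, new int[] {1, 3},
            new int[] {2, 4}, new int[] {3});
        // prints [2, 3, 4, 3, 2]
        System.out.println(Arrays.toString(extendedConnectivity(pentane)));
    }
}
```

In the pentane example the two end atoms share a label, as do the two atoms next to them. These ties come from genuine symmetry, so either order gives the same SMILES. Ties between atoms that are not symmetric are the cases where the choice of tie resolver decides whether the output is really unique.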
From: Xinshan K. <jas...@ya...> - 2004-12-15 18:02:53
Dear Joerg,

I am a new user of the package, and I have a problem similar to
Andreas'. Thanks for the information you have provided in this thread.
One remaining question I have regarding the Morgan routine is: will
each molecule end up with the same unique numbering no matter what
initial values are assigned? The reason I ask is that I ran a test
converting between SDF and SMILES back and forth a couple of times,
and every time a portion of the molecules in the collection showed
different SMILES strings compared to the previous rounds. Any
information is appreciated.

Thanks,
Jason

-----------------------------------------
>Hi Andreas,
>
> morgan.renumber(tMol); returns the renumbered molecule (no salts!).
>
> In detail:
> - A modified BFS search is applied, which assigns the initial values.
> - The rest of the algorithm uses these numbers to get a unique
> renumbering for this molecule. If renumbering ties occur, there are
> several tie resolvers using different atom labels.
> - Finally, the returned molecule should be uniquely renumbered, as far
> as the Morgan algorithm and the tie resolvers can guarantee this.
>
> If you then put this molecule into a SMILES generator, it should always
> return the same SMILES, even if the original molecules used another
> numbering scheme.
>
> Kind regards, Joerg
From: Joerg K. W. <we...@in...> - 2004-12-16 14:32:39
Hi Xinshan,

> I am a new user of the package, and I have a problem
> similar to Andreas'. Thanks for the information you
> have provided in this thread. One remaining question
> I have regarding the Morgan routine is: will each
> molecule end up with the same unique numbering no matter
> what initial values are assigned? The reason I ask is that
> I ran a test converting between SDF and SMILES back
> and forth a couple of times, and every time a
> portion of the molecules in the collection showed different
> SMILES strings compared to the previous rounds. Any
> information is appreciated.

There are still ties, and there are also publications using other
canonizations. Of course Morgan is not perfect. If you are interested,
you can send a support request to the JOELib project. Please give
details of the structures used (if possible); otherwise such things
can never get fixed. Please note that canonization has at least
polynomial runtime, and this is not an easy task for molecular graphs.

The best solution would be to create a test data set with
uniqueSMILES -> index scrambling -> uniqueSMILES; then we could measure
the improvement rate for better tie resolvers, but at the moment I
have no time for such things. Any helping hand is highly welcome.

Kind regards, Joerg
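
The "uniqueSMILES -> index scrambling -> uniqueSMILES" test suggested above could be set up roughly as follows. This is a minimal sketch under stated assumptions, not JOELib code: the Graph class, scramble, stableFraction and the two toy canonicalizers are hypothetical names invented for this example. In a real test the Function passed in would wrap the Morgan renumbering plus the SMILES writer, and the graphs would come from the SDF collection.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.Function;

// Hypothetical harness; all names here are assumptions, not JOELib API.
class ScrambleTest {

    /** A molecule reduced to element symbols plus bonds given as index pairs. */
    static final class Graph {
        final String[] atoms;
        final int[][] bonds;
        Graph(String[] atoms, int[][] bonds) { this.atoms = atoms; this.bonds = bonds; }
    }

    /** Returns the same graph with its atom indices randomly permuted. */
    static Graph scramble(Graph g, Random rnd) {
        int n = g.atoms.length;
        List<Integer> perm = new ArrayList<>();
        for (int i = 0; i < n; i++) perm.add(i);
        Collections.shuffle(perm, rnd);
        String[] atoms = new String[n];
        for (int i = 0; i < n; i++) atoms[perm.get(i)] = g.atoms[i];
        int[][] bonds = new int[g.bonds.length][];
        for (int b = 0; b < bonds.length; b++) {
            bonds[b] = new int[] { perm.get(g.bonds[b][0]), perm.get(g.bonds[b][1]) };
        }
        return new Graph(atoms, bonds);
    }

    /** Fraction of scrambled copies whose canonical string equals the original's. */
    static double stableFraction(Graph g, Function<Graph, String> canonicalizer,
                                 int trials, long seed) {
        Random rnd = new Random(seed);
        String reference = canonicalizer.apply(g);
        int stable = 0;
        for (int t = 0; t < trials; t++) {
            if (reference.equals(canonicalizer.apply(scramble(g, rnd)))) stable++;
        }
        return (double) stable / trials;
    }

    /** Order-dependent baseline: lists the atoms in index order. */
    static String naive(Graph g) {
        return String.join("", g.atoms);
    }

    /** Order-independent toy invariant (sorted element+degree labels); not a real SMILES. */
    static String degreeSignature(Graph g) {
        int[] degree = new int[g.atoms.length];
        for (int[] b : g.bonds) { degree[b[0]]++; degree[b[1]]++; }
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < g.atoms.length; i++) labels.add(g.atoms[i] + degree[i]);
        Collections.sort(labels);
        return String.join(".", labels);
    }

    public static void main(String[] args) {
        Graph ethanol = new Graph(new String[] {"C", "C", "O"}, new int[][] {{0, 1}, {1, 2}});
        System.out.println(stableFraction(ethanol, ScrambleTest::naive, 100, 42L));
        System.out.println(stableFraction(ethanol, ScrambleTest::degreeSignature, 100, 42L));
    }
}
```

A canonicalizer that is truly independent of the input numbering scores 1.0 on every molecule, so the average score over a data set gives the "improvement rate" figure for comparing tie resolvers. The toy degreeSignature only shows what a stable result looks like; it does not distinguish all molecules and is not a substitute for a canonical SMILES.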