Re: [Rdkit-discuss] MolToSmiles
From: Maciek W. <ma...@wo...> - 2016-12-19 08:44:32
Hi Jean-Marc and others,

There is also CanonicalRankAtoms [http://www.rdkit.org/Python_Docs/rdkit.Chem.rdmolfiles-module.html#CanonicalRankAtoms], which seems to have been forgotten.

----
Best regards,
Maciek Wójcikowski
ma...@wo...

2016-12-18 23:14 GMT+01:00 Jean-Marc Nuzillard <jm....@un...>:

> Thank you Andrew, Brian and David for your answers.
>
> mol.GetProp("_smilesAtomOutputOrder") does the job.
> I also expected a.GetProp("molAtomMapNumber") could do it for each atom a.
>
> All the best,
>
> Jean-Marc
>
> On 18/12/2016 at 19:04, Andrew Dalke wrote:
> > On Dec 18, 2016, at 6:32 PM, Brian Kelley wrote:
> >> >>> m.GetProp("_smilesAtomOutputOrder")
> >> '[3,2,1,0,]'
> >>
> >> Note that this returns the list as a string, which is sub-optimal.
> >> GetPropsAsDict will convert these to proper Python objects; however,
> >> this is considered a private member, so you need to return these as well:
> >>
> >> >>> list(m.GetPropsAsDict(True,True)["_smilesAtomOutputOrder"])
> >> [3, 2, 1, 0]
> > For fun, here are a few timing numbers:
> >
> > # Common setup
> > from rdkit import Chem
> > mol = Chem.MolFromSmiles("c1ccccc1Oc1ccccc1")
> > Chem.MolToSmiles(mol)
> > import json
> > import ujson # third-party JSON decoder
> > import re
> > integer_pat = re.compile("[0-9]+")
> >
> >
> > # Get the string (gives a lower bound)
> > mol.GetProp("_smilesAtomOutputOrder")
> > 10000 loops, best of 3: 31.3 usec per loop
> >
> >
> > Here are variations for how to get that information as a list of integers:
> >
> > # Using Python's "eval()" to decode the list (this is generally UNSAFE!)
> > eval(mol.GetProp("_smilesAtomOutputOrder"))
> > 10000 loops, best of 3: 157 usec per loop
> >
> > # Use the built-in json module (need to remove the terminal ",")
> > json.loads(mol.GetProp("_smilesAtomOutputOrder")[:-2]+"]")
> > 10000 loops, best of 3: 66.5 usec per loop
> >
> > # Use the third-party "ujson" package, which is faster than json.
> > ujson.loads(mol.GetProp("_smilesAtomOutputOrder")[:-2]+"]")
> > 10000 loops, best of 3: 41.2 usec per loop
> >
> > ("cjson" takes 49.7 usec per loop)
> >
> > # Use the properties dictionary
> > mol.GetPropsAsDict(True,True)["_smilesAtomOutputOrder"]
> > 1000 loops, best of 3: 462 usec per loop
> >
> > # Parse it more directly
> > map(int, integer_pat.findall(mol.GetProp("_smilesAtomOutputOrder")))
> > 10000 loops, best of 3: 89 usec per loop
> >
> >
> > Andrew
> > da...@da...
> >
> > ------------------------------------------------------------------------------
> > Check out the vibrant tech community on one of the world's most
> > engaging tech sites, SlashDot.org! http://sdm.link/slashdot
> > _______________________________________________
> > Rdkit-discuss mailing list
> > Rdk...@li...
> > https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
> --
> Jean-Marc Nuzillard
> Institut de Chimie Moléculaire de Reims
> CNRS UMR 7312
> Moulin de la Housse
> CPCBAI, Bâtiment 18
> BP 1039
> 51687 REIMS Cedex 2
> France
>
> Tel : 03 26 91 82 10
> Fax : 03 26 91 31 66
> http://www.univ-reims.fr/ICMR
>
> http://www.univ-reims.fr/LSD/
> http://www.univ-reims.fr/LSD/JmnSoft/
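For anyone reading this thread on Python 3, here is a minimal sketch of two of the decoding variants quoted above (the json one and the regex one), written so it runs without RDKit installed. The sample string is the value shown in Brian's message. The helper names order_via_json and order_via_regex are made up for illustration, and the json variant assumes the property always ends in ",]" as in that example.

```python
import json
import re

# Sample value copied from the thread; in practice it would come from
# mol.GetProp("_smilesAtomOutputOrder") after calling Chem.MolToSmiles(mol).
raw = "[3,2,1,0,]"

INTEGER_PAT = re.compile(r"[0-9]+")


def order_via_json(text):
    """Hypothetical helper: replace the trailing ',]' with ']' so the text
    is valid JSON. Assumes the ',]' ending seen in the thread."""
    return json.loads(text[:-2] + "]")


def order_via_regex(text):
    """Hypothetical helper: collect every run of digits as an int.
    list() is needed because map() returns a lazy iterator in Python 3."""
    return list(map(int, INTEGER_PAT.findall(text)))


print(order_via_json(raw))
print(order_via_regex(raw))
```

The regex variant does not depend on the exact punctuation of the property string, so it is probably the more forgiving choice if the format ever changes; the json variant fails on any string that does not end in ",]".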