Thread: [Rdkit-discuss] identify isomers using canonical SMILES
From: Cheng W. <che...@ho...> - 2009-02-21 22:54:38
Hi,

I am a newcomer to RDKit and have just started reading the Getting Started guide and the manual. What I want to do is provide a list of species as SMILES and then have RDKit identify all the isomers among them. The species include hydrocarbons and aromatics, to name a few. I would like to know whether this is doable in RDKit (my initial impression is yes). If so, could someone give me some hints on how to do this or point me to the right place in the user manual?

Thanks,
Cheng
From: Greg L. <gre...@gm...> - 2009-02-22 11:54:52
Dear Cheng,

On Sat, Feb 21, 2009 at 11:54 PM, Cheng Wang <che...@ho...> wrote:
> What I want to do is to provide a list of species with SMILES and then
> have RDKit identify all the isomers among the species.

I'm not completely sure what you mean by "all the isomers". Can you provide a little bit more specific information about what you'd like to do?

-greg
From: Cheng W. <che...@ho...> - 2009-02-22 20:33:49
Dear Greg,

Sorry I wasn't very clear in my first e-mail. Here is what I want to achieve. Nowadays we use large, detailed mechanisms to study combustion behavior. These mechanisms normally involve hundreds (sometimes over 1000) species, including many large hydrocarbons (more than 6 carbon atoms). Because some of these mechanisms are generated semi-automatically, they include reaction pathways for many isomers. So one way to make the simulation run faster is to reduce the mechanism by creating pseudo-species that represent all isomers of the same species family; the reaction pathways involving these isomers are then combined through a lumping process. My plan is to use RDKit to identify the isomers among the species.

Thanks,
Cheng
From: Greg L. <gre...@gm...> - 2009-02-23 15:49:05
Dear Cheng,

On Sun, Feb 22, 2009 at 9:33 PM, Cheng Wang <che...@ho...> wrote:
> [...] So one way to make the simulation run faster is to reduce the
> mechanism by creating pseudo-species representing all isomers of the
> same species family. Then the reaction pathways involving these
> isomers are combined through lumping process. My plan is to use RDKit
> to identify the isomers among the species.

Ok, I think I have it now. You have a set of molecules and you would like to group together the ones that have the same chemical formula.

Somehow it has happened that the RDKit does not have a function to generate the chemical formula for a molecule, so one would need to write it from scratch. Here's a simple (and relatively untested) way of doing this:

#----------------------------
import collections
from rdkit import Chem

def ChemicalFormula(mol):
    """ A molecule's chemical formula

    >>> ChemicalFormula(Chem.MolFromSmiles('CC'))
    'C2H6'
    >>> ChemicalFormula(Chem.MolFromSmiles('C(=O)O'))
    'CH2O2'
    >>> ChemicalFormula(Chem.MolFromSmiles('C(=O)[O-]'))
    'CHO2'
    >>> ChemicalFormula(Chem.MolFromSmiles('C(=O)'))
    'CH2O'
    """
    cnts = collections.defaultdict(int)
    for atom in mol.GetAtoms():
        cnts[atom.GetSymbol()] += 1
        # implicit + explicit hydrogens on this atom; only add an 'H'
        # entry when there are some, so e.g. CO2 doesn't come out as 'CHO2'
        hs = atom.GetTotalNumHs()
        if hs:
            cnts['H'] += hs
    res = ''
    # elements in alphabetical order, count omitted when it is 1
    for k in sorted(cnts):
        res += k
        if cnts[k] > 1:
            res += str(cnts[k])
    return res
#----------------------------

For your purposes, this could be simplified a bit since you don't really need the result as a string, but assuming I understood what you want to do correctly, this should get you started.

Regards,
-greg
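Following up on the remark that the result need not be a string, here is a minimal sketch of the grouping step itself. It assumes a hypothetical `composition_fn` that maps each species to a dict of element counts (for instance, a variant of `ChemicalFormula` above that returns `cnts` instead of building `res`); `group_isomers` and `composition_key` are illustrative names, not RDKit API. Later RDKit releases also provide `rdMolDescriptors.CalcMolFormula`, whose string output could serve as the grouping key instead.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Mapping, Tuple

# Hashable, order-independent key such as (("C", 6), ("H", 14))
CompositionKey = Tuple[Tuple[str, int], ...]


def composition_key(counts: Mapping[str, int]) -> CompositionKey:
    """Turn an element -> count mapping into a hashable key, ignoring zero counts."""
    return tuple(sorted((el, n) for el, n in counts.items() if n > 0))


def group_isomers(
    species: Iterable[str],
    composition_fn: Callable[[str], Mapping[str, int]],
) -> Dict[CompositionKey, List[str]]:
    """Group species that share an elemental composition.

    composition_fn is a hypothetical hook: any function that maps a species
    identifier (e.g. a SMILES string) to its element counts.
    """
    groups: Dict[CompositionKey, List[str]] = defaultdict(list)
    for sp in species:
        groups[composition_key(composition_fn(sp))].append(sp)
    # Keep only compositions shared by two or more species (candidate isomers)
    return {key: members for key, members in groups.items() if len(members) > 1}


if __name__ == "__main__":
    # Hand-written counts stand in for a real formula function here
    demo = {
        "CCCCCC": {"C": 6, "H": 14},    # n-hexane
        "CC(C)CCC": {"C": 6, "H": 14},  # 2-methylpentane
        "c1ccccc1": {"C": 6, "H": 6},   # benzene
    }
    print(group_isomers(demo, demo.__getitem__))
    # -> {(('C', 6), ('H', 14)): ['CCCCCC', 'CC(C)CCC']}
```

One caveat: the same molecule written as two different SMILES strings would also land in one group. Comparing canonical SMILES (`Chem.MolToSmiles`) within each group is one way to separate true duplicates from genuine isomers before lumping.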
From: Cheng W. <che...@ho...> - 2009-02-23 18:18:05
Dear Greg,

Thanks for the suggestions. I will try it out.

Sincerely,
Cheng
From: Andrew D. <da...@da...> - 2009-02-23 18:35:13
On Feb 23, 2009, at 4:48 PM, Greg Landrum wrote:
> Somehow it has happened that the RDKit does not have a function to
> generate the chemical formula for a molecule, so one would need to
> write it from scratch. Here's a simple (and relatively untested) way
> of doing this:

Ideally it would generate the Hill formula, so

>     ks = cnts.keys()
>     ks.sort()
>     res=''
>     for k in ks:
>         res+=k
>         if cnts[k]>1:
>             res+=str(cnts[k])

would be more like:

    # Alphabetize everything
    ks = sorted(cnts)
    # Put into Hill order: with carbon present, C first, then H, then
    # everything else; with no carbon, plain alphabetical order
    # (H included) already is the Hill order
    if "C" in cnts:
        ks.remove("C")
        ks.insert(0, "C")
        if "H" in cnts:
            ks.remove("H")
            ks.insert(1, "H")
    ...

There are other solutions which are more efficient for large N, but there are only about 100 elements in the world, and only a handful in most compounds.

Andrew
da...@da...
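A self-contained sketch of the full Hill-system rule, assuming the element counts have already been collected (for example, in a `cnts`-style dict like the one in the thread); `hill_order` and `hill_formula` are hypothetical helper names, not part of RDKit. It also skips zero counts, so a molecule without hydrogens does not pick up a stray `H`.

```python
from typing import Iterable, List, Mapping


def hill_order(elements: Iterable[str]) -> List[str]:
    """Order element symbols according to the Hill system.

    With carbon present: C first, then H (if present), then the rest
    alphabetically. Without carbon: everything alphabetically, H included.
    """
    ordered = sorted(elements)
    if "C" not in ordered:
        return ordered
    rest = [el for el in ordered if el not in ("C", "H")]
    return ["C"] + (["H"] if "H" in ordered else []) + rest


def hill_formula(counts: Mapping[str, int]) -> str:
    """Render element counts as a Hill-system formula; a count of 1 is not written."""
    present = [el for el, n in counts.items() if n > 0]
    return "".join(
        el if counts[el] == 1 else f"{el}{counts[el]}" for el in hill_order(present)
    )


if __name__ == "__main__":
    print(hill_formula({"C": 2, "H": 6}))           # C2H6
    print(hill_formula({"H": 2, "O": 1}))           # H2O
    print(hill_formula({"Cl": 1, "C": 1, "H": 3}))  # CH3Cl
    print(hill_formula({"O": 2, "C": 1, "H": 0}))   # CO2
```

Since element symbols always start with an uppercase letter, a plain `sorted` call gives the expected alphabetical order (`C` before `Cl`, `Br` before `H`).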