Re: [Rdkit-discuss] descriptor calculation - fragment counts & Co.
From: <Pau...@me...> - 2012-11-12 07:51:14
> > I'm wondering about the total number of accessible descriptors in RDKit:
> >
> > This is my code:
> > "
> > import sys
> > from rdkit import Chem
> > from rdkit.Chem import Descriptors
> > from rdkit.ML.Descriptors import MoleculeDescriptors
> >
> > file_in = sys.argv[1]
> > file_out = file_in+".descr.sdf"
> > ms = [x for x in Chem.SDMolSupplier(file_in) if x is not None]
> > ms_wr = Chem.SDWriter(file_out)
> >
> > nms=[x[0] for x in Descriptors._descList]
> > #nms.remove('MolecularFormula')
> > print len(Descriptors._descList)
> >
> >
> > calc = MoleculeDescriptors.MolecularDescriptorCalculator(nms)
> >
> > for i in range(len(ms)):
> >     descrs = calc.CalcDescriptors(ms[i])
> >     for x in range(len(descrs)):
> >         ms[i].SetProp(str(nms[x]),str(descrs[x]))
> >     ms_wr.write(ms[i])
> > "
> >
> > This gives me 93 descriptors in total.
> >
> > A brief look and count in the Python API
> > http://www.rdkit.org/docs/api/rdkit.Chem.Descriptors-module.html
> > ends up in more than 170 descriptors.
> >
> > Another brief look (no time to grasp in more depth) reveals that apparently
> > the fr_* descriptors have not been calculated.
> >
> > What did I do wrong?
>
> I don't see anything obvious, but you are definitely getting
> incorrect results.
> Here's what I see:
>
> In [16]: from rdkit import Chem
> In [17]: from rdkit.ML.Descriptors import MoleculeDescriptors
> In [18]: from rdkit.Chem import Descriptors
> In [19]: len(Descriptors._descList)
> Out[19]: 177
> In [20]: calc = MoleculeDescriptors.MolecularDescriptorCalculator([x[0] for x in Descriptors._descList])
> In [21]: len(calc.GetDescriptorNames())
> Out[21]: 177
> In [22]: m = Chem.MolFromSmiles('c1ccccc1OC')
> In [23]: ds = calc.CalcDescriptors(m)
> In [24]: len(ds)
> Out[24]: 177
>
> Just to eliminate some uncertainty, can you please try the above
> commands and, if you don't see 177, add this:
>
> In [25]: from rdkit import rdBase
> In [26]: rdBase.rdkitVersion
> Out[26]: '2012.12.1pre'
>
> Thanks.
> -greg

Hi Greg,

here we go:

In [4]: from rdkit import Chem
        from rdkit import rdBase
        from rdkit.ML.Descriptors import MoleculeDescriptors
        from rdkit.Chem import Descriptors
        len(Descriptors._descList)
Out[4]: 93

In [6]: calc = MoleculeDescriptors.MolecularDescriptorCalculator([x[0] for x in Descriptors._descList])
        len (calc.GetDescriptorNames())
Out[6]: 93

In [7]: m = Chem.MolFromSmiles('c1ccccc1OC')
        ds = calc.CalcDescriptors(m)
        len(ds)
Out[7]: 93

In [8]: print rdBase.rdkitVersion
2012.09.1beta

Now you see that we are still using the Q3 beta :)

Is upgrading to the stable version the only solution, or is there a workaround available for the Q3 beta?

Cheers & Thanks,
Paul
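The two checks Greg asked for (descriptor count and version string) can be compared side by side with a small script. This is a minimal sketch assuming Python 3. `parse_rdkit_version` and `summarize_descriptor_names` are hypothetical helpers, not part of RDKit. The rule that a suffixed string such as "beta" or "pre" sorts before the plain release is an assumption of the sketch. The 93-vs-177 gap coincides with the version difference reported above, but this thread does not establish that the version is the cause, so treat the comparison as a first diagnostic only.

```python
import re
from typing import Dict, Iterable, Tuple


def parse_rdkit_version(version: str) -> Tuple[int, int, int, int]:
    """Hypothetical helper: turn an RDKit-style version string such as
    '2012.09.1beta' into a sortable tuple (year, month, patch, is_final).
    Assumption: any suffix ('beta', 'pre', ...) marks a pre-release, so it
    gets is_final=0 and sorts before the plain release of the same number."""
    m = re.fullmatch(r"(\d{4})\.(\d{2})\.(\d+)(\w*)", version.strip())
    if m is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    year, month, patch, suffix = m.groups()
    return (int(year), int(month), int(patch), 0 if suffix else 1)


def summarize_descriptor_names(names: Iterable[str]) -> Dict[str, int]:
    """Hypothetical helper: count descriptor names, separating out the
    fr_* fragment-count family from all other descriptors."""
    names = list(names)
    n_fragment = sum(1 for n in names if n.startswith("fr_"))
    return {"total": len(names), "fragment": n_fragment, "other": len(names) - n_fragment}


if __name__ == "__main__":
    # Version strings reported in this thread: Paul's install vs. Greg's build.
    print(parse_rdkit_version("2012.09.1beta") < parse_rdkit_version("2012.12.1pre"))
    # Hypothetical sample of names; on a real install one would pass
    # [name for name, _ in Descriptors._descList] instead.
    print(summarize_descriptor_names(["MolWt", "TPSA", "NumHDonors", "fr_benzene", "fr_ether"]))
```

On a live installation, `rdBase.rdkitVersion` and `[name for name, _ in Descriptors._descList]` from the sessions above would be fed into these helpers. A `fragment` count of 0 would be consistent with Paul's observation that no fr_* values were written.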