Re: [Rdkit-devel] Two 'Labute' descriptors seem to be always returning 0.0
From: Greg L. <gre...@gm...> - 2018-03-16 05:45:23
Hi Jeff,

On Thu, Mar 15, 2018 at 10:26 PM, jeff godden <jg...@gm...> wrote:
> First and foremost thank you for your excellent offering of a singularly
> useful set of molecular descriptors!

Thanks for the kind words. :-)

> As one of the authors cited by Dr Labute
> (http://www.chemcomp.com/journal/vsadesc.htm), i've appreciated RDKit's
> implementation of the "Labute descriptor" set. So i've noticed that two
> particular descriptors in that set, specifically SlogP_VSA9 and SMR_VSA8,
> appear to always return a value of 0.0 no matter which of a substantial
> set of small molecules are tested via:
>
> from rdkit.Chem import Descriptors
> Descriptors.SlogP_VSA9(molecule)
> Descriptors.SMR_VSA8(molecule)

Let's look at the first one. SlogP_VSA9 is the sum of the VSA contributions of atoms whose contribution to SlogP falls into bin 9. In the RDKit this means an atomic SlogP contribution between 0.3 and 0.4. (I believe that this is the same as the definition in the original paper. The RDKit bins are not always the same as those in the original publication, but in this case they are.)

So we need to look for atom types that have an SlogP contribution between 0.3 and 0.4. Curiously, there don't seem to be any of these. Going back to the original Wildman and Crippen paper (the source of these values), there aren't any there either. This is "a bit" strange, since the Labute article says that they picked the bins in order to have evenly distributed values.

So I don't think you're doing anything wrong; there's an oddity in the definition of the bins and it looks like that descriptor is basically always going to be zero. Thanks for pointing it out.

Since I have found the various "MOE like" descriptors to be quite useful in the past, it's worth considering doing another version of them where the bin definitions have been adjusted based on a larger set of molecules. Something to think about for a future version...
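
To make the binning argument concrete, here is a minimal pure-Python sketch of how a VSA-style descriptor is assembled: each atom's surface-area contribution is added to the bin selected by that atom's property contribution. The function name `binned_vsa`, the bin edges other than the 0.3-0.4 interval, and all per-atom numbers are hypothetical and chosen only for illustration; this is not the RDKit implementation, and it assumes half-open bins (lower edge inclusive).

```python
from bisect import bisect_right


def binned_vsa(contribs, areas, edges):
    """Sum per-atom surface areas into bins keyed by each atom's contribution.

    Index 0 collects values below edges[0], index k collects values with
    edges[k-1] <= x < edges[k], and the last index collects x >= edges[-1].
    """
    sums = [0.0] * (len(edges) + 1)
    for x, area in zip(contribs, areas):
        sums[bisect_right(edges, x)] += area
    return sums


# Hypothetical bin edges; only the 0.3-0.4 interval comes from the discussion.
edges = [0.0, 0.3, 0.4]

# Hypothetical per-atom SlogP contributions and surface areas.
# Note that no atom has a contribution in [0.3, 0.4).
logp_contribs = [-0.2, 0.1, 0.45, 0.6]
areas = [5.0, 6.0, 4.0, 3.0]

sums = binned_vsa(logp_contribs, areas, edges)
print(sums)  # the entry for the [0.3, 0.4) bin stays at 0.0
```

If no atom type in the parameter table has a contribution inside a bin's interval, that bin's sum is zero for every molecule, which is the behaviour described above for SlogP_VSA9. With the RDKit itself, the per-atom inputs would, I believe, come from `rdMolDescriptors._CalcCrippenContribs` and `rdMolDescriptors._CalcLabuteASAContribs`, but the sketch does not depend on them.
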

Thanks for the great question,
-greg

> All other descriptors i've tried seem to produce sensible results (with
> those same molecules)
>
> Of course it's always possible that it's something i've mis-coded. I'm
> attaching a simple python program which returns the anomalous descriptor
> values for me.
>
> Thank you very much for all your efforts!
> --
> jeff godden