Re: [Rdkit-discuss] PMI API
From: Guillaume G. <Gui...@fi...> - 2017-01-16 08:54:33
Dear Chris,

No problem, let me explain.

I agree that for a monatomic species the center of mass is the atom itself, so the moment of inertia is zero about every axis (Ix = 0 for any axis x).

Now I am considering only the mathematics, not the physics. I suggest that they (Todeschini) are not really computing the "real physical" PMI about the three axes, but have arbitrarily stated that for 2D molecules the PMI about the 3rd axis is zero.

BR

Dr. Guillaume GODIN
Principal Scientist
Chemoinformatic & Datamining
Innovation
CORPORATE R&D DIVISION
DIRECT LINE +41 (0)22 780 3645
MOBILE +41 (0)79 536 1039
Firmenich SA
RUE DES JEUNES 1 | CASE POSTALE 239 | CH-1211 GENEVE 8

________________________________
From: Chris Earnshaw <cge...@gm...>
Sent: Monday, 16 January 2017 09:36
To: Guillaume GODIN
Cc: Greg Landrum; RDKit Discuss
Subject: Re: [Rdkit-discuss] PMI API

On 16 January 2017 at 06:25, Guillaume GODIN <Gui...@fi...> wrote:

> Reading the Todeschini article carefully, they say that Ic, Ib and Ia are determined as the max and min values of I over all 3D axes passing through the center of mass!

I don't quite understand this comment. The inequality Ia <= Ib <= Ic is one of the errors in the Todeschini article pointed out by Greg yesterday. By definition, the Principal Moment of Inertia axes pass through the centre of mass.

> The "global PM" is never zero (sum of mi*ri*ri), and likewise for the PMI, even for a planar molecule.

The global Moment of Inertia is only zero for monatomics.

> But when you have a planar molecule, the matrix is no longer 3D but 2D, so it's normal to consider that the 3rd PM is zero.

I really don't understand this - it's simply wrong. The molecule may be 2D but the three principal moments of inertia are most definitely non-zero for a planar structure. For a fully symmetrical molecule like benzene the largest PMI is around the axis perpendicular to the plane of the molecule and there are two equivalent, smaller, PMIs perpendicular to each other in the plane of the molecule.
For a less symmetrical molecule like naphthalene, the largest PMI is again around the axis perpendicular to the plane, the intermediate PMI is along the fusion bond between the rings and the smallest PMI is around the long axis of the molecule. There's no way it can be correct to consider the 3rd PMI as zero in any planar molecule - it's never equal to zero and is only degenerate with the 2nd PMI for fully symmetric molecules. Only in the special case of a completely linear molecule (e.g. acetylene, HCN) is the 3rd PMI (along the axis of the molecule) equal to zero.

Apologies - I appear to have opened a can of worms here...

Chris

________________________________
From: Greg Landrum <gre...@gm...>
Sent: Sunday, 15 January 2017 17:42
To: Guillaume GODIN; RDKit Discuss
Subject: Re: [Rdkit-discuss] PMI API

Thanks Guillaume!

On Sun, Jan 15, 2017 at 5:01 PM, Guillaume GODIN <Gui...@fi...> wrote:

> Here are the Dragon results for the 3 molecules. I've included both WHIM and 3D descriptors, but I don't have access to the PMI!
>
> I found the second document to be in agreement with Peter's answer...
>
> BR,
> Dr. Guillaume GODIN

________________________________
From: Peter Gedeck <pet...@gm...>
Sent: Sunday, 15 January 2017 15:07
To: Greg Landrum; RDKit Discuss; Guillaume GODIN
Subject: Re: [Rdkit-discuss] PMI API

According to this:
https://en.wikipedia.org/wiki/List_of_moments_of_inertia

the moments of inertia of a disk (something like benzene) are:

    Iz = mr^2/2
    Ix = Iy = mr^2/4

None of them is zero. The smallest moment of inertia of a rod-like molecule (e.g. C#C) is zero.
Best,
Peter

On Sun, Jan 15, 2017 at 8:15 AM Greg Landrum <gre...@gm...> wrote:

> Hi Guillaume,
>
> I think in this case it's something else. According to the Todeschini article the smallest moment of inertia of a planar molecule like benzene should be zero. The eigenvalues of the inertia matrix for benzene, however, are definitely not zero (and not close enough that it's likely to be round-off error).
>
> It would be very nice if you could run the three files I mention through Dragon and let me know what it calculates for those descriptors.
>
> -greg

_____________________________
From: Guillaume GODIN <gui...@fi...>
Sent: Sunday, January 15, 2017 1:11 PM
Subject: RE: [Rdkit-discuss] PMI API
To: Greg Landrum <gre...@gm...>, RDKit Discuss <rdk...@li...>, Chris Earnshaw <cge...@gm...>

Dear Greg,

I suspect that it's a precision error or a difference in the eigen algorithm between the RDKit C++ code and Dragon.

To obtain good values, I suggest trying to implement a test on the eigenvalues like I did in the gateway.cpp implementation:

    #include <Eigen/Dense>

    using namespace Eigen;

    JacobiSVD<MatrixXd> getSVD(const MatrixXd &A) {
      JacobiSVD<MatrixXd> mysvd(A, ComputeThinU | ComputeThinV);
      return mysvd;
    }

    // get the A^-1 (pseudo-inverse) matrix using the SVD
    MatrixXd GetPinv(const MatrixXd &A) {
      JacobiSVD<MatrixXd> svd = getSVD(A);
      double pinvtoler = 1.e-2;  // choose your tolerance wisely!
      VectorXd vs = svd.singularValues();
      VectorXd vsinv = svd.singularValues();
      // invert only the singular values above the tolerance; zero the rest
      for (Index i = 0; i < vs.size(); ++i) {
        if (vs(i) > pinvtoler)
          vsinv(i) = 1.0 / vs(i);
        else
          vsinv(i) = 0.0;
      }
      MatrixXd S = vsinv.asDiagonal();
      MatrixXd Ap = svd.matrixV() * S * svd.matrixU().transpose();
      return Ap;
    }

If that does not solve the problem, I would like to test it in Matlab. Can you please provide me with the 3 (3D xyz matrices) of your example?

I also have Dragon 6.

Best regards,
Dr.
Guillaume GODIN

________________________________
From: Greg Landrum <gre...@gm...>
Sent: Sunday, 15 January 2017 11:50
To: Chris Earnshaw; RDKit Discuss
Subject: Re: [Rdkit-discuss] PMI API

I managed to make some time to look into this this weekend and I've found a bug and something I don't understand. Hopefully the community can help out here.

On Sun, Jan 8, 2017 at 11:17 AM, Chris Earnshaw <cge...@gm...> wrote:

> 4) The big one! The returned results look very odd. They appear to relate more to the dimensions of the molecule than the moments of inertia.
>
> For a rod-like molecule (dimethylacetylene) I'd expect two large and one small PMI, e.g.
>     PMI1: 6.61651 PMI2: 150.434 PMI3: 150.434 NPR1: 0.0439828 NPR2: 0.999998
> but actually get
>     PMI1: 0.061647 PMI2: 0.061652 PMI3: 25.3699 NPR1: 0.002430 NPR2: 0.002430
>
> For disk-like (benzene) the result should be one large and two medium, e.g.
>     PMI1: 89.1448 PMI2: 89.1495 PMI3: 178.294 NPR1: 0.499987 NPR2: 0.500013
> but get
>     PMI1: 2.37457e-10 PMI2: 11.0844 PMI3: 11.0851 NPR1: 2.14213e-11 NPR2: 0.999933
>
> Finally, for a roughly spherical molecule (neopentane) the NPR values look reasonable (no great surprise) but the absolute PMI values may be too small:
>     old program - PMI1: 114.795 PMI2: 114.797 PMI3: 114.799 NPR1: 0.999966 NPR2: 0.999988
>     new program - PMI1: 6.59466 PMI2: 6.59488 PMI3: 6.59531 NPR1: 0.999902 NPR2: 0.999935

Your expectations are correct: the current RDKit implementation is wrong.
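An aside for readers checking the numbers: the NPR values Chris quotes are consistent with the usual normalized principal moment ratios, NPR1 = PMI1/PMI3 and NPR2 = PMI2/PMI3 with PMI1 <= PMI2 <= PMI3. The thread never states that definition, so treat it as an assumption. The sketch below is not RDKit code; `Npr` and `normalizedRatios` are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Pair of normalized principal moment ratios (assumed definition:
// NPR1 = I1/I3 and NPR2 = I2/I3 with I1 <= I2 <= I3).
struct Npr {
  double npr1;
  double npr2;
};

// Hypothetical helper: sorts the three moments ascending and divides the
// two smaller ones by the largest.
Npr normalizedRatios(std::array<double, 3> pmi) {
  std::sort(pmi.begin(), pmi.end());
  assert(pmi[2] > 0.0);  // a single atom has no non-zero moment to divide by
  return {pmi[0] / pmi[2], pmi[1] / pmi[2]};
}
```

With Chris's expected dimethylacetylene moments (6.61651, 150.434, 150.434) this gives NPR1 of about 0.044 and NPR2 of about 1.0, the rod-like pattern. His expected benzene moments (89.1448, 89.1495, 178.294) give roughly (0.5, 0.5), the disk-like pattern, and three nearly equal moments give values close to (1, 1), as for neopentane.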
The corresponding GitHub entry is here:
https://github.com/rdkit/rdkit/issues/1262

This is due to a mistake in the way the principal moments are calculated (which is due to the fact that I don't spend a lot of time working with/thinking about 3D descriptors). Instead of using the eigenvectors/eigenvalues of the inertia matrix (the tensor of inertia), the RDKit is currently using the covariance matrix. There's some more on the relationship between these two here:
http://number-none.com/blow/inertia/deriving_i.html

The problem is easy to fix (and I have something working here: https://github.com/greglandrum/rdkit/tree/fix/github1262), but it screws up the values of the descriptors that are derived from here:
Todeschini and Consonni, "Descriptors from Molecular Geometry", Handbook of Chemoinformatics
http://dx.doi.org/10.1002/9783527618279.ch37

These include the radius of gyration, inertial shape factor, etc.

Within that article they state that Ic = 0 for planar molecules. Ignoring the inequality on page 1010, which says that Ic is the largest moment and is contradicted by the rest of the text (particularly the inequalities on page 1011), Ic corresponds to the smallest principal moment: PMI1.

So now I'm confused, but I'm hoping this is obvious to someone versed in the field: I'd like to reproduce the descriptors described in the Todeschini article, but I clearly can't do that using the actual moments of inertia. I could keep using the eigenvalues of the covariance matrix there, but that doesn't match what's described in the text.
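To make the difference between the two matrices concrete, here is a minimal sketch, not the RDKit implementation. It assumes a mass-weighted, unnormalized second-moment matrix C (the exact weighting and normalization RDKit used for its covariance matrix is not stated in this thread) and builds the inertia tensor as I = trace(C) * Identity - C. Both matrices then share eigenvectors, and each eigenvalue of I is trace(C) minus the matching eigenvalue of C. The model is six equal point masses on a regular hexagon, a carbon-only stand-in for benzene that ignores the hydrogens. All names (`PointMass`, `secondMomentMatrix`, `inertiaTensor`, `hexagonRing`) are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <vector>

using Mat3 = std::array<std::array<double, 3>, 3>;

struct PointMass {
  double x, y, z, mass;
};

// Mass-weighted second-moment matrix about the centre of mass:
//   C[a][b] = sum_i m_i * r_i[a] * r_i[b]
// (an unnormalized analogue of the covariance matrix; an assumption here).
Mat3 secondMomentMatrix(const std::vector<PointMass>& pts) {
  assert(!pts.empty());
  double total = 0.0, cx = 0.0, cy = 0.0, cz = 0.0;
  for (const auto& p : pts) {
    total += p.mass;
    cx += p.mass * p.x;
    cy += p.mass * p.y;
    cz += p.mass * p.z;
  }
  cx /= total;
  cy /= total;
  cz /= total;

  Mat3 C{};  // value-initialized to all zeros
  for (const auto& p : pts) {
    const double r[3] = {p.x - cx, p.y - cy, p.z - cz};
    for (int a = 0; a < 3; ++a)
      for (int b = 0; b < 3; ++b) C[a][b] += p.mass * r[a] * r[b];
  }
  return C;
}

// Inertia tensor from the same sums: I = trace(C) * Identity - C.
Mat3 inertiaTensor(const Mat3& C) {
  const double tr = C[0][0] + C[1][1] + C[2][2];
  Mat3 I{};
  for (int a = 0; a < 3; ++a)
    for (int b = 0; b < 3; ++b) I[a][b] = (a == b ? tr : 0.0) - C[a][b];
  return I;
}

// Six equal masses on a regular hexagon of radius r in the xy plane.
std::vector<PointMass> hexagonRing(double r, double mass) {
  const double pi = std::acos(-1.0);
  std::vector<PointMass> pts;
  for (int k = 0; k < 6; ++k) {
    const double phi = k * pi / 3.0;
    pts.push_back({r * std::cos(phi), r * std::sin(phi), 0.0, mass});
  }
  return pts;
}
```

For `hexagonRing(1.39, 12.0)` both matrices come out diagonal up to round-off, so the diagonal entries can be read as eigenvalues without an eigen-solver. C is about diag(69.6, 69.6, 0) and I is about diag(69.6, 69.6, 139.1). The second-moment matrix has a zero eigenvalue along the ring normal, which fits the near-zero PMI1 (2.37457e-10) that Chris reported from the covariance-based code for benzene, whereas the inertia tensor has its largest value on that axis, equal to the sum of the two in-plane moments.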
Two things that would be extremely helpful:

1) An explanation of the disconnect here from someone who knows this stuff. I would guess that it's pretty simple.
2) The results of running the files github1262_1.mol, github1262_2.mol, and github1262_3.mol from here:
   https://github.com/greglandrum/rdkit/tree/fix/github1262/Code/GraphMol/MolTransforms/test_data
   through Dragon and calculating the radius of gyration, inertial shape factor, eccentricity, molecular asphericity, and spherocity index.

Best,
-greg