From: <chr...@ba...> - 2004-01-12 16:03:46
Hi all,

I have a question regarding the descriptors LogP and MolarRefractivity.

If I calculate these descriptors for the molecule Oc1ccccc1OC (which is also used in the class GroupContributionTest) from a 3D structure (see the file "tst.sdf" attached at the end of this mail) with the command

convert.sh +d tst.sdf

I get the following wrong results:

> <MolarRefractivity>
4.3751E1

> <LogP>
2.2822

But if I delete all H atoms by using

convert.sh -h +d tst.sdf

I get the following results, which are correct according to the information in GroupContributionTest:

> <MolarRefractivity>
3.46588E1

> <LogP>
1.4007999999999998

My question is: why do I get the wrong results if I do not delete the H atoms? Am I on the safe side if I always delete the H atoms when using a group contribution method? (For "PolarSurfaceArea" it does not seem to matter whether I use "-h" or not.)

(By the way, I used version 2003-08-04.)

All the best and thank you very much in advance,
Christoph Niederalt
_________________________________________
Bayer Technology Services GmbH
PT-AS-CS
Leverkusen, K 9
Tel.: +49 (0)214 30 75414
Fax: +49 (0)214 30 64801
E-Mail: chr...@ba...
Internet : http://www.bayertechnology.com

tst.sdf:
===snip===
Model1
  Cerius2 01120415373D 1   1.00000
Structure written by MSI Cerius2 SD Exporter
 17 17  0  0  0  0  0  0  0  0999 V2000
    1.0524   -1.7053   -0.0293 C   0  0  0  0  0  0
    2.1287   -1.6085   -0.0417 H   0  0  0  0  0  0
    0.2023   -0.5743    0.0099 C   0  0  0  0  0  0
    0.6723    0.7567    0.0358 O   0  0  0  0  0  0
   -1.1895   -0.7709    0.0242 C   0  0  0  0  0  0
   -2.0332    0.3430    0.0629 O   0  0  0  0  0  0
   -1.7189   -2.0726   -0.0001 C   0  0  0  0  0  0
   -2.7901   -2.2293    0.0108 H   0  0  0  0  0  0
   -0.8683   -3.1781   -0.0387 C   0  0  0  0  0  0
   -1.2814   -4.1784   -0.0573 H   0  0  0  0  0  0
    0.5140   -2.9945   -0.0532 C   0  0  0  0  0  0
    1.1714   -3.8540   -0.0831 H   0  0  0  0  0  0
    2.0823    0.9796    0.0217 C   0  0  0  0  0  0
   -3.0188    0.1062    0.0707 H   0  0  0  0  0  0
    2.2712    2.0723    0.0457 H   0  0  0  0  0  0
    2.5572    0.5287    0.9194 H   0  0  0  0  0  0
    2.5320    0.5731   -0.9095 H   0  0  0  0  0  0
  1  2  1  0  0  0
  1  3  1  0  0  0
  1 11  2  0  0  0
  3  4  1  0  0  0
  3  5  2  0  0  0
  4 13  1  0  0  0
  5  6  1  0  0  0
  5  7  1  0  0  0
  6 14  1  0  0  0
  7  8  1  0  0  0
  7  9  2  0  0  0
  9 10  1  0  0  0
  9 11  1  0  0  0
 11 12  1  0  0  0
 13 15  1  0  0  0
 13 16  1  0  0  0
 13 17  1  0  0  0
M  END
$$$$
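
To make concrete what deleting the H atoms does to the attached structure, here is a minimal Python sketch. It is not JOELib code and makes no claim about how "convert.sh -h" is implemented: the function name strip_hydrogens is hypothetical, and the parser simply splits each line on whitespace instead of honoring the fixed-width columns of the molfile format. It drops every explicit H atom from the tst.sdf block, renumbers the heavy atoms, and records how many hydrogens each heavy atom carried, which is the number an implicit-hydrogen model would have to reproduce on its own.

```python
# The molfile block from tst.sdf as posted in the message.
TST_SDF = """\
Model1
  Cerius2 01120415373D 1   1.00000
Structure written by MSI Cerius2 SD Exporter
 17 17  0  0  0  0  0  0  0  0999 V2000
    1.0524   -1.7053   -0.0293 C   0  0  0  0  0  0
    2.1287   -1.6085   -0.0417 H   0  0  0  0  0  0
    0.2023   -0.5743    0.0099 C   0  0  0  0  0  0
    0.6723    0.7567    0.0358 O   0  0  0  0  0  0
   -1.1895   -0.7709    0.0242 C   0  0  0  0  0  0
   -2.0332    0.3430    0.0629 O   0  0  0  0  0  0
   -1.7189   -2.0726   -0.0001 C   0  0  0  0  0  0
   -2.7901   -2.2293    0.0108 H   0  0  0  0  0  0
   -0.8683   -3.1781   -0.0387 C   0  0  0  0  0  0
   -1.2814   -4.1784   -0.0573 H   0  0  0  0  0  0
    0.5140   -2.9945   -0.0532 C   0  0  0  0  0  0
    1.1714   -3.8540   -0.0831 H   0  0  0  0  0  0
    2.0823    0.9796    0.0217 C   0  0  0  0  0  0
   -3.0188    0.1062    0.0707 H   0  0  0  0  0  0
    2.2712    2.0723    0.0457 H   0  0  0  0  0  0
    2.5572    0.5287    0.9194 H   0  0  0  0  0  0
    2.5320    0.5731   -0.9095 H   0  0  0  0  0  0
  1  2  1  0  0  0
  1  3  1  0  0  0
  1 11  2  0  0  0
  3  4  1  0  0  0
  3  5  2  0  0  0
  4 13  1  0  0  0
  5  6  1  0  0  0
  5  7  1  0  0  0
  6 14  1  0  0  0
  7  8  1  0  0  0
  7  9  2  0  0  0
  9 10  1  0  0  0
  9 11  1  0  0  0
 11 12  1  0  0  0
 13 15  1  0  0  0
 13 16  1  0  0  0
 13 17  1  0  0  0
M  END
"""


def strip_hydrogens(molblock):
    """Hypothetical helper: remove explicit H atoms from a V2000 block.

    Returns (heavy_atoms, heavy_bonds, h_count), all using the new
    1-based numbering of the heavy atoms.
    """
    lines = molblock.splitlines()
    # Line 4 is the counts line; its first two fields are atom and bond counts.
    n_atoms, n_bonds = (int(x) for x in lines[3].split()[:2])
    atom_lines = lines[4:4 + n_atoms]
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]

    # Assumption: whitespace splitting; the element symbol is the 4th field.
    symbols = [ln.split()[3] for ln in atom_lines]

    # Map old 1-based indices of heavy atoms to new consecutive indices.
    new_index = {}
    for old, sym in enumerate(symbols, start=1):
        if sym != "H":
            new_index[old] = len(new_index) + 1

    heavy_atoms = [symbols[old - 1] for old in new_index]
    heavy_bonds = []
    h_count = {new: 0 for new in new_index.values()}
    for ln in bond_lines:
        a, b, order = (int(x) for x in ln.split()[:3])
        if a in new_index and b in new_index:
            heavy_bonds.append((new_index[a], new_index[b], order))
        elif a in new_index:      # b is a hydrogen attached to a
            h_count[new_index[a]] += 1
        elif b in new_index:      # a is a hydrogen attached to b
            h_count[new_index[b]] += 1
    return heavy_atoms, heavy_bonds, h_count


heavy_atoms, heavy_bonds, h_count = strip_hydrogens(TST_SDF)
print(len(heavy_atoms), "heavy atoms,", len(heavy_bonds), "bonds")
print("hydrogens per heavy atom:", h_count)
```

The heavy-atom graph that remains corresponds to the SMILES Oc1ccccc1OC given above; whether a program then reconstructs the same hydrogen counts depends on its own valence model.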
From: Joerg K. W. <we...@in...> - 2004-01-12 16:33:47
Hi Christoph,

"wrong" ... hmm ... that depends on the standpoint of the observer. As you surely know, this is still a 'small' current problem for JOELib and OpenBabel, and a 'big' problem for all non-open-source programs.

What I mean is: the descriptor calculation process depends on four different expert systems (aromaticity, hybridization, implicit valence and finally atom type). See:
http://www-ra.informatik.uni-tuebingen.de/software/joelib/tutorial/atomtyper.html
and the current discussions:
http://sourceforge.net/mailarchive/message.php?msg_id=6905792
http://sourceforge.net/mailarchive/forum.php?thread_id=3732947&forum_id=3042

Nearly ALL descriptors depend on the assigned atom type, so the result for each descriptor will change from program to program, because most programs have their own atom typer. JOELib and OpenBabel use the same one and are open source, but there is still room for improvement to make it more general.

The group contribution definitions are in:
joelib\src\joelib\data\plain\LogP.contributions
joelib\src\joelib\data\plain\MR.contributions
joelib\src\joelib\data\plain\PSA.contributions

If you keep hydrogens that come from other programs, they will be used when matching the SMARTS patterns; if you remove the hydrogens, JOELib/OpenBabel will calculate the implicit hydrogen count on its own, which, in my descriptor calculation experience, works really well.

For the definition of the SMARTS group parts, see the original literature reference for these definitions, or ask Stephen Jelfs (Gillet/Willett group) for more details, because he implemented this algorithm: s....@sh...

Furthermore, as you surely know, this LogP is not really good. Some time ago I published a paper on LogP/LogS prediction, and this LogP was one of the worst, although my models were overfitted (this can be seen in recently accepted papers, available in 2-3 weeks).
See the publication section:
http://www-ra.informatik.uni-tuebingen.de/mitarb/wegner

So yes, I recommend removing all hydrogens as long as no general public definition FOR ALL PROGRAMS is available ... see the mailing list discussion above.

People often believe that descriptors are program independent. Given that descriptors depend on a complex atom assignment process, this IS JUST WRONG !!! I criticize this in my two current papers, and it is actually one of the big problems. So I believe that nearly all descriptor calculation programs produce different results (most use their own atom typer), even if the descriptor calculation algorithm is exactly the same, which is also not always the case.

Regards, Joerg

--
Dipl. Chem. Joerg K. Wegner
Center of Bioinformatics Tuebingen (ZBIT)
Department of Computer Architecture
Univ. Tuebingen, Sand 1, D-72076 Tuebingen, Germany
Phone: (+49/0) 7071 29 78970
Fax: (+49/0) 7071 29 5091
E-Mail: mailto:we...@in...
WWW: http://www-ra.informatik.uni-tuebingen.de
--
Never mistake motion for action.
E. Hemingway
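
The point that identical group contribution tables can still give different descriptor values when the atom typer differs can be illustrated with a toy Python sketch. Everything in it is an assumption made for illustration: the type names, the two typers and the contribution values are hypothetical and are NOT taken from LogP.contributions or from any JOELib/OpenBabel code.

```python
# Hypothetical contribution table keyed by atom type.
# These numbers are invented for illustration only.
CONTRIB = {"C.ar": 0.30, "C.2": 0.10, "C.3": 0.15, "O.3": -0.40}

# Heavy atoms of Oc1ccccc1OC as (element, in_ring) pairs.
MOLECULE = (
    [("O", False)]
    + [("C", True)] * 6
    + [("O", False), ("C", False)]
)


def typer_a(element, in_ring):
    """Hypothetical typer that perceives the ring carbons as aromatic."""
    if element == "C":
        return "C.ar" if in_ring else "C.3"
    return "O.3"


def typer_b(element, in_ring):
    """Hypothetical typer that keeps the Kekule view: ring carbons are sp2."""
    if element == "C":
        return "C.2" if in_ring else "C.3"
    return "O.3"


def descriptor(molecule, typer):
    """Group contribution sum: one table lookup per typed atom."""
    return sum(CONTRIB[typer(element, in_ring)] for element, in_ring in molecule)


value_a = descriptor(MOLECULE, typer_a)
value_b = descriptor(MOLECULE, typer_b)
print("typer A:", round(value_a, 4))
print("typer B:", round(value_b, 4))
```

The summation routine and the table are the same in both calls; only the type assignment changes, and with it the result. Explicit hydrogens supplied by another program are one more input that can shift which pattern an atom matches.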