From: Egon W. <ego...@gm...> - 2012-05-05 11:47:18
Hi all,

On Fri, May 4, 2012 at 1:44 PM, Egon Willighagen <ego...@gm...> wrote:
> I'm moving forward with this, and write a patch against CDK 1.4.x,
> because we'll need it for Bioclipse 2.6 (though Bioclipse can always
> 'fork' in case in mainstream CDK we decide to put it only in
> master)... if you have comments, suggestions, this would be a good
> time.

A set of seven patches is available here:

https://github.com/egonw/cdk/commits/396-14x-moreExplicitAtomTypeInfo

The following patch introduces an additional convenience method:

https://github.com/egonw/cdk/commit/deed1201cc2a62784a9446f4c10b7f73dc39be33

These patches introduce the new, more explicit atom type information:

https://github.com/egonw/cdk/commit/b2c052680f4ea88bc61989070d12190cbbca648a
https://github.com/egonw/cdk/commit/affcc7abd3111a24e3cb73c4bd7bb1db45f21243

These patches add new tests for the new functionality (again, unless I
made transcription errors, the old functionality (neighbor count and pi
bond count) is unaltered; this was already tested in depth and showed no
problems):

https://github.com/egonw/cdk/commit/28bcb6ea1f0ac756e4248eefb6f7fe114f78e6c1
https://github.com/egonw/cdk/commit/a46bf290be6e07e033dd72172bd60170471ec34b

One important change that can cause downstream problems is the split of
one atom type. It causes no regression in the CDK library itself, but
may cause trouble if your code relies strongly on the C.sp atom type for
=C= like structures. These were previously perceived as C.sp and now get
the newly introduced C.allene atom type:

https://github.com/egonw/cdk/commit/0208a93530d89bec7bbd6733bed69e858870aba8

The only regressions that occurred are in a number of descriptors that
do atom type perception. This caused unit tests to fail that check
whether the input structure for which the descriptor was calculated was
changed. Of course it was, as the CDK atom types now also set the
IAtomType max bond order and bond order sum fields. So, this patch
updates the affected descriptors to restore the original values for
those fields:

https://github.com/egonw/cdk/commit/affcc7abd3111a24e3cb73c4bd7bb1db45f21243

These patches are now up for review for inclusion in the CDK 1.4.x
series. Your careful analysis is very much appreciated!

Egon

--
Dr E.L. Willighagen
Postdoctoral Researcher
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers