From: Egon W. <ego...@gm...> - 2010-02-25 09:07:49
Hi all,

On Thu, Feb 25, 2010 at 9:11 AM, <ma...@eb...> wrote:
>> Is it the SMARTS parsing that is the bottleneck? Or the actual
>> isomorphism step? if the latter then switch from the UIT to one Syed's
>> isomorphism implementations would likely speed things up
>
> It's been a little while since I last looked at it. The UIT plays a part
> in the performance, and VF2 should speed things up. Other than that, for
> matching against a relatively large and complex compound, the ring and
> aromaticity perception done for such a target can also be quite expensive,
> considering it's done run-time as part of a (database) query.

Syed's code is getting closer to being ready for review, but there is
still quite a lot to do... Syed and I have been getting the code 'CDK
stable', and this patch on top of CDK master is available from:

http://github.com/asad/cdk-smsd

The Nightly running for it can be found at:

http://pele.farmbio.uu.se/nightly-smsd/

JUnit
-------
Testsuite: org.openscience.cdk.modulesuites.MsmsdTests
Tests run: 55, Failures: 3, Errors: 5, Time elapsed: 3.617 sec

and a lot of missing unit tests...

JavaDoc
............
Summary:
ERROR: 473
WARNING: 1
MINOR_ERROR: 62

PMD
-------
Syed has fixed most of that... the remaining things require more
non-trivial refactoring...

Files  Total  Priority 1  Priority 2  Priority 3  Priority 4  Priority 5
8      39     0           0           39          0           0

For the rest, there is still some application code here and there, and
at least one outstanding patch ticket in SF...

Summarizing, there is some work left to do, but it would be great to see
this merged into CDK master, and all your help (small or big) would be
very much appreciated. Please send your patches to Asad.

Egon

-- 
Post-doc @ Uppsala University
Proteochemometrics / Bioclipse Group of Prof. Jarl Wikberg
Homepage: http://egonw.github.com/
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers