From: SourceForge.net <no...@so...> - 2012-07-29 18:35:21
|
Patches item #3546768, was opened at 2012-07-21 10:58 Message generated for change (Comment added) made by rajarshi You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3546768&group_id=20024 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: cdk-1.4.x Group: Needs Review Status: Open Resolution: None Priority: 5 Private: No Submitted By: John May (jwmay) Assigned to: Nobody/Anonymous (nobody) Summary: Configurable UniversalIsomorphismTester [part one] Initial Comment: Apologies for the size of this one. Whilst writing an anual project review I came across something I never really looked into fixing/implementing. Programming being the most fun way to procrastinate I knocked up a working version last week but have now redone the code to be easier to review. In short I wanted to make the UniversalIsomorphismTester (UIT) configurale to be specific to certain user parameters for non-query atoms. You can do this currently with the IQueryAtoms and SMARTS but the default behavior of the UIT is to only match the symbols when not dealing with query atoms. As we're dealing mainly with metabolism we need to the test to be more specific (i.e. not consider mannose and glucose isomorphs). I quick hack is to create a method that converts the 'normal' atom containers to a query atom container that for example matches symbol and charge. This will work perfectly fine but I wanted to be able to configure the isomorphism tester without duplicating the molecules each time I wanted to query. You may also then need to convert your molecule back if you want to do the use the mappings you got from the matcher with the original molecule. The principles used are similar to the query atom by having the matching decoupled from the graph isomorphism. Like the 'matches(IAtom)' method (IQueryAtom) we have a stateless IAtomMatcher with a 'matches(IAtom query, IAtom subject)'. You then provide the IAtomMatcher as a parameter when you invoke any of the isomorphism methods. If no matcher is provided a default matcher (for symbol) is used and provides the same functionality as the existing UIT. This means no changes to any existing usages is need and is patched against 1.4.x :-). As with SMARTS you can connect different matchers with conjunction (and) and disjunction (or) to make more complex matchers - although these are global rather then atom specific. Small example for matching charge and symbol: https://gist.github.com/3156215 As with the query atoms the decoupling allows users to create customer matchers (e.g. a custom property): https://gist.github.com/3156221 This patch is just part one and represents the main changes in UIT and provides the basic Symbol matcher. I have the other matchers (and logical connectives) on another branch which I'll post at the end but it should make it easier to review by splitting the changes to existing code from the later patch which will include all new classes simply adding functionality. Breakdown of this patches commits 1. created the IAtomMatcher, AbstractAtomMatcher and SymbolMatcher (with SymbolMatcherTest) - all new classes. https://github.com/johnmay/cdk/commit/45e979c3739867b79383bd1bdcef058806f3ee30 2. small commit showing the main changes to core of UIT algorithm showing where the matchers are plugged in (node/arc construction) https://github.com/johnmay/cdk/commit/aec3a541e5e7e7a47bc006d167d641e9c2d91693 3. API changes to UIT providing alternative methods for all matching method, single atom cases and Javadoc. This one is quite large as it effectively duplicates every method to be configurable. https://github.com/johnmay/cdk/commit/649e949f6c1466601b29fef947e075ce2c7188be Notes: 1. I also found an unknown bug with the single atom cases where it was matching the g1 against the g1 and would always return true as it was a self match (see comment on third commit). 2. If any symbols were null you'd get a NPE when using the isomorphism tester which doesn't happen with the SymbolMatcher. 3. Ran full test-all pre/post and the only differences were the variables ones (e.g. timeouts/out of heap) i'll attach the summaries. 4. There's already an AtomMatcher interface for SMSD but it's similar to the query atoms that it has a state of a 'query' and couldn't be used. 5. It should be possible to integrate the same matchers into SMSD also but I'll leave that for now. The Patch Branch. https://github.com/johnmay/cdk/commits/feature/configurable-uit Prototype branch with logical connectives and more matchers (not for this patch). https://github.com/johnmay/cdk/commits/feature/variable-isomorphism-prototype ---------------------------------------------------------------------- >Comment By: Rajarshi Guha (rajarshi) Date: 2012-07-29 11:35 Message: Great functinality. The patches look ok to me. One question: what is the need for the AbstractAtomMatcher? More specifically why is it useful to separate the getAttribute method from the matches method? Couldn't one simply implement different classes (each implementing IAtomMatcher) in which the match mehtod access the appropriate attribute? ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-07-29 11:00 Message: Hehe, was already very long :-). ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2012-07-29 10:38 Message: Ah, then cdk-1.4.x is OK! It does not hurt to stress no API changes are made in the description :) So that it is clear to idiots like me too :) ---------------------------------------------------------------------- Comment By: John May (jwmay) Date: 2012-07-29 10:31 Message: Yep it's possible to go in master but there is no change behaviour just new methods. My understanding was change in behavior/method deletions (i.e. stuff which could break users current code) was master. Will also add the other bug (Nina pointed out one with SMILES was a duplicate) . ---------------------------------------------------------------------- Comment By: Egon Willighagen (egonw) Date: 2012-07-29 06:23 Message: No review yet, but a quick note... looking at the description, I think this should go to master, not? Can you also make sure to file reports for bugs you find... fixes for that, preferably, do go into cdk-1.4.x, but should not depend on larger chances like this. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=320024&aid=3546768&group_id=20024 |