#529 Configurable UniversalIsomorphismTester [part one]

Needs_Revision
closed
nobody
cdk-1.4.x (181)
master
5
2013-09-23
2012-07-21
John May
No

Apologies for the size of this one.

Whilst writing an anual project review I came across something I never really looked into fixing/implementing. Programming being the most fun way to procrastinate I knocked up a working version last week but have now redone the code to be easier to review.

In short I wanted to make the UniversalIsomorphismTester (UIT) configurale to be specific to certain user parameters for non-query atoms. You can do this currently with the IQueryAtoms and SMARTS but the default behavior of the UIT is to only match the symbols when not dealing with query atoms. As we're dealing mainly with metabolism we need to the test to be more specific (i.e. not consider mannose and glucose isomorphs). I quick hack is to create a method that converts the 'normal' atom containers to a query atom container that for example matches symbol and charge. This will work perfectly fine but I wanted to be able to configure the isomorphism tester without duplicating the molecules each time I wanted to query. You may also then need to convert your molecule back if you want to do the use the mappings you got from the matcher with the original molecule.

The principles used are similar to the query atom by having the matching decoupled from the graph isomorphism. Like the 'matches(IAtom)' method (IQueryAtom) we have a stateless IAtomMatcher with a 'matches(IAtom query, IAtom subject)'. You then provide the IAtomMatcher as a parameter when you invoke any of the isomorphism methods. If no matcher is provided a default matcher (for symbol) is used and provides the same functionality as the existing UIT. This means no changes to any existing usages is need and is patched against 1.4.x :-). As with SMARTS you can connect different matchers with conjunction (and) and disjunction (or) to make more complex matchers - although these are global rather then atom specific.

Small example for matching charge and symbol: https://gist.github.com/3156215

As with the query atoms the decoupling allows users to create customer matchers (e.g. a custom property): https://gist.github.com/3156221

This patch is just part one and represents the main changes in UIT and provides the basic Symbol matcher. I have the other matchers (and logical connectives) on another branch which I'll post at the end but it should make it easier to review by splitting the changes to existing code from the later patch which will include all new classes simply adding functionality.

Breakdown of this patches commits

  1. created the IAtomMatcher, AbstractAtomMatcher and SymbolMatcher (with SymbolMatcherTest) - all new classes.
    https://github.com/johnmay/cdk/commit/45e979c3739867b79383bd1bdcef058806f3ee30

  2. small commit showing the main changes to core of UIT algorithm showing where the matchers are plugged in (node/arc construction)
    https://github.com/johnmay/cdk/commit/aec3a541e5e7e7a47bc006d167d641e9c2d91693

  3. API changes to UIT providing alternative methods for all matching method, single atom cases and Javadoc. This one is quite large as it effectively duplicates
    every method to be configurable.
    https://github.com/johnmay/cdk/commit/649e949f6c1466601b29fef947e075ce2c7188be

Notes:
1. I also found an unknown bug with the single atom cases where it was matching the g1 against the g1 and would always return true as it was a self match (see comment on third commit).
2. If any symbols were null you'd get a NPE when using the isomorphism tester which doesn't happen with the SymbolMatcher.
3. Ran full test-all pre/post and the only differences were the variables ones (e.g. timeouts/out of heap) i'll attach the summaries.
4. There's already an AtomMatcher interface for SMSD but it's similar to the query atoms that it has a state of a 'query' and couldn't be used.
5. It should be possible to integrate the same matchers into SMSD also but I'll leave that for now.

The Patch Branch.
https://github.com/johnmay/cdk/commits/feature/configurable-uit

Prototype branch with logical connectives and more matchers (not for this patch).
https://github.com/johnmay/cdk/commits/feature/variable-isomorphism-prototype

Discussion

<< < 1 2 (Page 2 of 2)
  • John May
    John May
    2012-07-29

    To minimise code duplication and make it easier to write custom matchers. The abstract matcher provides a null safe exact match of the attributes and the only part that changes in the match is the actually attribute.

    Consider: https://gist.github.com/3201564
    Compare to: https://gist.github.com/3156221

    However maybe the name should be different (e.g. ExactAttributeMatcher) as you would absolutely want a custom match implementation for special cases (e.g. pseudo atom symbols).

     
  • John, I have a week off this week, so that explains why I could work on the CDK so much this weekend :)

    But as I have a lot to do at home, so not sure when I get around to looking at this in enough detail... so maybe this is a stupid point... but I like your input on IQueryAtom and IQueryBond, which had this matching too...

    So, rather than "configuring" the search query, your patch "configures" the search engine... I like that idea, and removes the symmetry break down in the UIT...

    Thus, should we remove IQueryAtom and IQueryBond in master then, in favor of your approach?

     
  • Actually... I just realize that this new framework cannot replace the SMARTS query atoms...

     
  • John May
    John May
    2012-07-30

    Yep the query atoms/bonds are still needed for per atom queries. I guess it could replace the QueryAtomContainerCreator or at least some of it's methods. The main purpose was that I didn't want to build per atom queries for globally specific matching (e.g. engine configuration).

    I've also got a week (and a bit) off. I'll be mainly be looking at some OpenGL books but I should be able to get some of junior tasks crossed off. If there's anything specific that needs attention let me know.

     
  • John May
    John May
    2012-10-28

    • Milestone: Accepted --> Needs_Review
     
  • John May
    John May
    2012-11-14

    • branch: --> master
    • milestone: Needs_Review --> Needs_Revision
     
  • John May
    John May
    2012-11-14

    Need to patch for master

     
  • John May
    John May
    2013-02-08

    Update on this, I have written a new one in VF2 which I may patch for instead. The VF2 in SMSD is for subgraph isomorphism but reading the original article it's actually slightly different for isomorphism (you can prune more of the search tree).

     
  • John May
    John May
    2013-09-23

    • status: open --> closed
     
  • John May
    John May
    2013-09-23

    Closed - don't think this functionality is worth it for the UIT. It really just delays the issue of defining a good API for structure matching.

     
<< < 1 2 (Page 2 of 2)