#529 Configurable UniversalIsomorphismTester [part one]

cdk-1.4.x (181)
John May

Apologies for the size of this one.

Whilst writing an anual project review I came across something I never really looked into fixing/implementing. Programming being the most fun way to procrastinate I knocked up a working version last week but have now redone the code to be easier to review.

In short I wanted to make the UniversalIsomorphismTester (UIT) configurale to be specific to certain user parameters for non-query atoms. You can do this currently with the IQueryAtoms and SMARTS but the default behavior of the UIT is to only match the symbols when not dealing with query atoms. As we're dealing mainly with metabolism we need to the test to be more specific (i.e. not consider mannose and glucose isomorphs). I quick hack is to create a method that converts the 'normal' atom containers to a query atom container that for example matches symbol and charge. This will work perfectly fine but I wanted to be able to configure the isomorphism tester without duplicating the molecules each time I wanted to query. You may also then need to convert your molecule back if you want to do the use the mappings you got from the matcher with the original molecule.

The principles used are similar to the query atom by having the matching decoupled from the graph isomorphism. Like the 'matches(IAtom)' method (IQueryAtom) we have a stateless IAtomMatcher with a 'matches(IAtom query, IAtom subject)'. You then provide the IAtomMatcher as a parameter when you invoke any of the isomorphism methods. If no matcher is provided a default matcher (for symbol) is used and provides the same functionality as the existing UIT. This means no changes to any existing usages is need and is patched against 1.4.x :-). As with SMARTS you can connect different matchers with conjunction (and) and disjunction (or) to make more complex matchers - although these are global rather then atom specific.

Small example for matching charge and symbol: https://gist.github.com/3156215

As with the query atoms the decoupling allows users to create customer matchers (e.g. a custom property): https://gist.github.com/3156221

This patch is just part one and represents the main changes in UIT and provides the basic Symbol matcher. I have the other matchers (and logical connectives) on another branch which I'll post at the end but it should make it easier to review by splitting the changes to existing code from the later patch which will include all new classes simply adding functionality.

Breakdown of this patches commits

  1. created the IAtomMatcher, AbstractAtomMatcher and SymbolMatcher (with SymbolMatcherTest) - all new classes.

  2. small commit showing the main changes to core of UIT algorithm showing where the matchers are plugged in (node/arc construction)

  3. API changes to UIT providing alternative methods for all matching method, single atom cases and Javadoc. This one is quite large as it effectively duplicates
    every method to be configurable.

1. I also found an unknown bug with the single atom cases where it was matching the g1 against the g1 and would always return true as it was a self match (see comment on third commit).
2. If any symbols were null you'd get a NPE when using the isomorphism tester which doesn't happen with the SymbolMatcher.
3. Ran full test-all pre/post and the only differences were the variables ones (e.g. timeouts/out of heap) i'll attach the summaries.
4. There's already an AtomMatcher interface for SMSD but it's similar to the query atoms that it has a state of a 'query' and couldn't be used.
5. It should be possible to integrate the same matchers into SMSD also but I'll leave that for now.

The Patch Branch.

Prototype branch with logical connectives and more matchers (not for this patch).


  • John May

    John May - 2012-07-21
  • John May

    John May - 2012-07-21
  • Egon Willighagen

    No review yet, but a quick note... looking at the description, I think this should go to master, not?

    Can you also make sure to file reports for bugs you find... fixes for that, preferably, do go into cdk-1.4.x, but should not depend on larger chances like this.

  • John May

    John May - 2012-07-29

    Yep it's possible to go in master but there is no change behaviour just new methods. My understanding was change in behavior/method deletions (i.e. stuff which could break users current code) was master.

    Will also add the other bug (Nina pointed out one with SMILES was a duplicate) .

  • Egon Willighagen

    Ah, then cdk-1.4.x is OK! It does not hurt to stress no API changes are made in the description :) So that it is clear to idiots like me too :)

  • John May

    John May - 2012-07-29

    Hehe, was already very long :-).

  • Rajarshi Guha

    Rajarshi Guha - 2012-07-29

    Great functinality. The patches look ok to me. One question:

    what is the need for the AbstractAtomMatcher? More specifically why is it useful to separate the getAttribute method from the matches method? Couldn't one simply implement different classes (each implementing IAtomMatcher) in which the match mehtod access the appropriate attribute?

  • John May

    John May - 2012-07-29

    To minimise code duplication and make it easier to write custom matchers. The abstract matcher provides a null safe exact match of the attributes and the only part that changes in the match is the actually attribute.

    Consider: https://gist.github.com/3201564
    Compare to: https://gist.github.com/3156221

    However maybe the name should be different (e.g. ExactAttributeMatcher) as you would absolutely want a custom match implementation for special cases (e.g. pseudo atom symbols).

  • Egon Willighagen

    John, I have a week off this week, so that explains why I could work on the CDK so much this weekend :)

    But as I have a lot to do at home, so not sure when I get around to looking at this in enough detail... so maybe this is a stupid point... but I like your input on IQueryAtom and IQueryBond, which had this matching too...

    So, rather than "configuring" the search query, your patch "configures" the search engine... I like that idea, and removes the symmetry break down in the UIT...

    Thus, should we remove IQueryAtom and IQueryBond in master then, in favor of your approach?

  • Egon Willighagen

    Actually... I just realize that this new framework cannot replace the SMARTS query atoms...

  • John May

    John May - 2012-07-30

    Yep the query atoms/bonds are still needed for per atom queries. I guess it could replace the QueryAtomContainerCreator or at least some of it's methods. The main purpose was that I didn't want to build per atom queries for globally specific matching (e.g. engine configuration).

    I've also got a week (and a bit) off. I'll be mainly be looking at some OpenGL books but I should be able to get some of junior tasks crossed off. If there's anything specific that needs attention let me know.

  • John May

    John May - 2012-10-28
    • Milestone: Accepted --> Needs_Review
  • John May

    John May - 2012-11-14
    • branch: --> master
    • milestone: Needs_Review --> Needs_Revision
  • John May

    John May - 2012-11-14

    Need to patch for master

  • John May

    John May - 2013-02-08

    Update on this, I have written a new one in VF2 which I may patch for instead. The VF2 in SMSD is for subgraph isomorphism but reading the original article it's actually slightly different for isomorphism (you can prune more of the search tree).

  • John May

    John May - 2013-09-23
    • status: open --> closed
  • John May

    John May - 2013-09-23

    Closed - don't think this functionality is worth it for the UIT. It really just delays the issue of defining a good API for structure matching.


Get latest updates about Open Source Projects, Conferences and News.

Sign up for the SourceForge newsletter:

No, thanks