Apologies for the size of this one.
Whilst writing an annual project review I came across something I had never really looked into fixing/implementing. Programming being the most fun way to procrastinate, I knocked up a working version last week and have now redone the code so it is easier to review.
In short, I wanted to make the UniversalIsomorphismTester (UIT) configurable so that matching of non-query atoms can be specific to user-chosen parameters. You can do this currently with IQueryAtoms and SMARTS, but the default behavior of the UIT is to match only the symbols when not dealing with query atoms. As we're dealing mainly with metabolism we need the test to be more specific (i.e. not consider mannose and glucose isomorphs). A quick hack is to create a method that converts the 'normal' atom containers to a query atom container that, for example, matches symbol and charge. This works perfectly fine, but I wanted to be able to configure the isomorphism tester without duplicating the molecules each time I wanted to query. You may also then need to convert your molecule back if you want to use the mappings you got from the matcher with the original molecule.
The principle is similar to the query atoms in that the matching is decoupled from the graph isomorphism. Like the 'matches(IAtom)' method (IQueryAtom), we have a stateless IAtomMatcher with a 'matches(IAtom query, IAtom subject)' method. You then provide the IAtomMatcher as a parameter when you invoke any of the isomorphism methods. If no matcher is provided, a default matcher (for symbol) is used, which provides the same functionality as the existing UIT. This means no changes to any existing usages are needed, and it is patched against 1.4.x :-). As with SMARTS you can connect different matchers with conjunction (and) and disjunction (or) to make more complex matchers, although these are global rather than atom specific.
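To make the idea concrete, here is a minimal, self-contained sketch of a stateless matcher with and/or connectives. It is not the code from the patch: SimpleAtom is a hypothetical stand-in for IAtom, and the AtomMatcher interface, its and()/or() default methods and the Matchers constants are names I've made up for illustration.

```java
import java.util.Objects;

// Hypothetical stand-in for CDK's IAtom, reduced to the two properties used here.
final class SimpleAtom {
    final String symbol;
    final Integer formalCharge;

    SimpleAtom(String symbol, Integer formalCharge) {
        this.symbol = symbol;
        this.formalCharge = formalCharge;
    }
}

// Stateless matcher: both atoms are passed in on every call, nothing is stored.
// (Illustrative interface, not the IAtomMatcher from the patch.)
interface AtomMatcher {
    boolean matches(SimpleAtom query, SimpleAtom subject);

    // Conjunction: both matchers must accept the pair.
    default AtomMatcher and(AtomMatcher other) {
        return (q, s) -> matches(q, s) && other.matches(q, s);
    }

    // Disjunction: either matcher may accept the pair.
    default AtomMatcher or(AtomMatcher other) {
        return (q, s) -> matches(q, s) || other.matches(q, s);
    }
}

final class Matchers {
    // Objects.equals is null-safe, so a missing symbol or charge does not throw.
    static final AtomMatcher SYMBOL = (q, s) -> Objects.equals(q.symbol, s.symbol);
    static final AtomMatcher CHARGE = (q, s) -> Objects.equals(q.formalCharge, s.formalCharge);

    private Matchers() {}
}

class MatcherDemo {
    public static void main(String[] args) {
        AtomMatcher symbolAndCharge = Matchers.SYMBOL.and(Matchers.CHARGE);

        SimpleAtom n = new SimpleAtom("N", 0);
        SimpleAtom nPlus = new SimpleAtom("N", 1);

        System.out.println(Matchers.SYMBOL.matches(n, nPlus)); // prints true
        System.out.println(symbolAndCharge.matches(n, nPlus)); // prints false
    }
}
```

Because the matcher holds no query state, the same instance can be reused for every pair of atoms and every pair of molecules, which is what lets it be passed as a plain parameter to the isomorphism methods.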
Small example for matching charge and symbol: https://gist.github.com/3156215
As with the query atoms, the decoupling allows users to create custom matchers (e.g. for a custom property): https://gist.github.com/3156221
This patch is just part one: it contains the main changes to the UIT and provides the basic symbol matcher. I have the other matchers (and logical connectives) on another branch, which I'll post at the end. Splitting the changes to existing code from the later patch, which consists only of new classes adding functionality, should make it easier to review.
Breakdown of this patch's commits:
Created the IAtomMatcher, AbstractAtomMatcher and SymbolMatcher (with SymbolMatcherTest) - all new classes.
Small commit showing the main changes to the core of the UIT algorithm, i.e. where the matchers are plugged in (node/arc construction).
API changes to UIT providing alternative methods for all matching methods, single atom cases and Javadoc. This one is quite large as it effectively duplicates every method to be configurable.
1. I also found a previously unknown bug in the single atom cases where it was matching g1 against g1 and would always return true as it was a self match (see comment on third commit).
2. If any symbols were null you'd get an NPE when using the isomorphism tester, which doesn't happen with the SymbolMatcher.
3. Ran the full test-all pre/post and the only differences were the variable ones (e.g. timeouts/out of heap). I'll attach the summaries.
4. There's already an AtomMatcher interface for SMSD, but like the query atoms it holds the 'query' as state and so couldn't be used.
5. It should be possible to integrate the same matchers into SMSD also but I'll leave that for now.
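On point 2, the difference comes down to how the two symbols are compared. The sketch below is only an illustration of the general pattern, not the code in UIT or SymbolMatcher: NullSafeSymbols and both method names are hypothetical, and treating two null symbols as a match is an assumption on my part.

```java
import java.util.Objects;

// Hypothetical helper contrasting a naive symbol comparison with a null-safe one.
final class NullSafeSymbols {

    // Naive comparison: throws NullPointerException when 'a' is null.
    static boolean naiveSameSymbol(String a, String b) {
        return a.equals(b);
    }

    // Null-safe comparison: never throws. Two nulls compare as equal here,
    // which is an assumption and not necessarily what SymbolMatcher does.
    static boolean sameSymbol(String a, String b) {
        return Objects.equals(a, b);
    }

    private NullSafeSymbols() {}
}

class NullSafeSymbolsDemo {
    public static void main(String[] args) {
        System.out.println(NullSafeSymbols.sameSymbol(null, "C")); // prints false
        System.out.println(NullSafeSymbols.sameSymbol("C", "C"));  // prints true
        try {
            NullSafeSymbols.naiveSameSymbol(null, "C");
        } catch (NullPointerException e) {
            System.out.println("naive comparison threw an NPE");
        }
    }
}
```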
The Patch Branch.
Prototype branch with logical connectives and more matchers (not for this patch).