[Cdk-devel] Re: [Cdk-user] Possible solution to graph matching problem? From: Egon Willighagen - 2005-01-27 10:24 ```Irilenia/All, =A0 On Wednesday 26 January 2005 12:20, Irilenia Nobeli wrote: > I think I may have found the cause of the graph matching problem I > mentioned recently. > The getOverlaps() method of UniversalIsomorphismTester works as follows: > a)It does a search for all possible mappings > b)It projects the solutions on the first molecule > c)It reduces the solutions by checking for possible subgraphs or > isomorphic graphs. > > The problem is I think in part (c). The series of calls to methods is > as follows: > > getOverlaps() -> getMaximum() -> isIsomorph() -> getIsomorphMap() > ->search() > > I think the problem is the way the search method is called: > List rMapsList =3D search(g1, g2, getBitSet(g1), > getBitSet(g1), false, false); > (line 140 in UniversalIsomorphismTester.java) > > Shouldn't that be: > List rMapsList =3D search(g1, g2, getBitSet(g1), > getBitSet(g2), false, false); > > so that all bonds in BOTH graphs must be matched? Isn't this the > definition of isomorphic graphs? > > Hope this is of some help towards solving the problem for the next > release. =A0 about your possible solution, I've another occurence of this way of=20 calling the search() method (rev 1.29):=20 =2D on line 140 is the line you mentioned=20 =2D on line 172 is another call like thus=20 =A0 But, the JavaDoc of the search method says:=20 =A0 /**=20 =A0 =A0* =A0General Rgraph parsing method (usually not used directly)=20 =A0 =A0* =A0This method is the entry point for the recursive search=20 =A0 =A0* =A0adapted to the atom container input.=20 =A0 =A0*=20 =A0 =A0* @param =A0g1 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0first molecule=20 =A0 =A0* @param =A0g2 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0second molecule=20 =A0 =A0* @param =A0c1 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0initial condition ( bo= nds from g1 that=20 =A0 =A0* =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 must be contai= ns in the solution )=20 =A0 =A0* @param =A0c2 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0initial condition ( bo= nds from g2 that=20 =A0 =A0* =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 must be contai= ns in the solution )=20 =A0 =A0* @param =A0findAllStructure =A0if false stop at the first structure= found=20 =A0 =A0* @param =A0findAllMap =A0 =A0 =A0 =A0if true search all the 'mappin= gs' for one=20 same=20 =A0 =A0* =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 structure=20 =A0 =A0* @return =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 a list of rMapList tha= t represent the search=20 solutions=20 =A0 =A0*/=20 =A0 public static List search(AtomContainer g1, AtomContainer g2, BitSet=20 c1,=20 =A0 =A0 =A0 BitSet c2, boolean findAllStructure, boolean findAllMap) {=20 =A0 =2E.. suggesting that you might be right.=20 =A0 So, question: is the code broken at two places, or is the JavaDoc=20 not clear on the subject?=20 =A0 Modifying the two methods given above does not break any JUnit test=20 but does not fix this bug either...=20 =A0 See:=20 http://sourceforge.net/tracker/index.php?func=3Ddetail&aid=3D1110537&group_= id=3D20024&atid=3D120024 thoughts?=20 =A0 Egon=20 > > Date: Tue, 25 Jan 2005 18:05:40 +0000 (GMT) > > From: Irilenia Nobeli > > To: cdk-user@... > > Subject: [Cdk-user] Graph matching problem > > > > Hi, > > > > I have noticed that there are cases where the graph matching algorithm > > seems to return different results for a pair of molecules depending on > > the order these molecules are passed onto the method getOverlaps of the > > class UniversalIsomorphismTester. I don't mean that the mappings are > > different (which they should be as they are projections always on the > > first molecule) but that the maximum common structure returned has a > > different size. I may be wrong, but this looks like a bug. > > > > For example, take the two molecules attached as MDL files. If you run > > UniversalIsomorphismTester.getOverlaps(mol1, mol2) on these two, in > > one case > > you get a clique of 2 atoms, in the other a clique of 11 atoms. These t= wo > > molecules both have hydrogens in the original file. If the hydrogens are > > removed, you get mappings with the same number of atoms, regardless of > > which molecule is first and which is second, i.e. the problem is not > > there anymore (actually there could be other examples where even after > > removal of hydrogens the problem remains - I just haven't come across > > them yet). > > > > Well, if anyone has any ideas why this happens, I could do with some > > help. > > > > Thanks, > > > > Irilenia =2D-=20 egonw@... GPG: 1024D/D6336BA6 ```