From: Nina J. <jel...@gm...> - 2012-03-30 06:45:48
|
Hello All,

We've found a case where the UniversalIsomorphismTester gives unexpected results - in the example it considers C**C as a subgraph of SCCS. The test below is intentionally stripped of any SMARTS-related code.

Is this UIT behaviour known?

public void testUIT() throws Exception
{
    //Testing CDK isomorphism of C**C against SCCS --> it gives wrong result

    //Setting search query 'C**C'
    QueryAtomContainer q = new QueryAtomContainer();
    //setting atoms
    IQueryAtom a0 = new AliphaticSymbolAtom("C");
    q.addAtom(a0);
    IQueryAtom a1 = new AnyAtom();
    q.addAtom(a1);
    IQueryAtom a2 = new AnyAtom();
    q.addAtom(a2);
    IQueryAtom a3 = new AliphaticSymbolAtom("C");
    q.addAtom(a3);
    //setting bonds
    OrderQueryBond b0 = new OrderQueryBond(IBond.Order.SINGLE);
    b0.setAtoms(new IAtom[] {a0,a1});
    q.addBond(b0);
    OrderQueryBond b1 = new OrderQueryBond(IBond.Order.SINGLE);
    b1.setAtoms(new IAtom[] {a1,a2});
    q.addBond(b1);
    OrderQueryBond b2 = new OrderQueryBond(IBond.Order.SINGLE);
    b2.setAtoms(new IAtom[] {a2,a3});
    q.addBond(b2);

    //Creating 'SCCS' target molecule
    AtomContainer target = new AtomContainer();
    //atoms
    IAtom ta0 = new Atom("S");
    target.addAtom(ta0);
    IAtom ta1 = new Atom("C");
    target.addAtom(ta1);
    IAtom ta2 = new Atom("C");
    target.addAtom(ta2);
    IAtom ta3 = new Atom("S");
    target.addAtom(ta3);
    //bonds
    IBond tb0 = new Bond();
    tb0.setAtoms(new IAtom[] {ta0,ta1});
    tb0.setOrder(IBond.Order.SINGLE);
    target.addBond(tb0);
    IBond tb1 = new Bond();
    tb1.setAtoms(new IAtom[] {ta1,ta2});
    tb1.setOrder(IBond.Order.SINGLE);
    target.addBond(tb1);
    IBond tb2 = new Bond();
    tb2.setAtoms(new IAtom[] {ta2,ta3});
    tb2.setOrder(IBond.Order.SINGLE);
    target.addBond(tb2);

    //Isomorphism check
    boolean res = UniversalIsomorphismTester.isSubgraph(target, q);
    System.out.println("Mapping C**C against SCCS = " + res);
}

>Mapping C**C against SCCS = true

Best regards,
Nina |
From: Rajarshi G. <raj...@gm...> - 2012-03-30 18:53:54
|
Interesting bug, I haven't seen this before. Every bond in the query can match every bond in the target, in terms of order and atoms. I assume the problem is in traversal of the RGraph, but haven't been able to work out the problem.

On Fri, Mar 30, 2012 at 2:45 AM, Nina Jeliazkova <jel...@gm...> wrote:
> Hello All,
>
> We've found a case where the UniversalIsomorphismTester gives unexpected
> results - in the example it considers C**C as a subgraph of SCCS.
> The test below is intentionally stripped out of any SMARTS related code.
>
> Is this UIT behaviour known?

--
Rajarshi Guha | http://blog.rguha.net
NIH Center for Advancing Translational Science |
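The observation that every query bond can match every target bond can be checked by hand with a small toy model. The sketch below is not CDK code: the class `BondCompatibilityDemo`, its `Bond` type and the helper names are hypothetical, atoms are reduced to symbol strings with "*" as a wildcard, and bond order is ignored because all bonds here are single.

```java
import java.util.List;

// Toy model (assumption: not the CDK implementation). Atoms are symbol
// strings, "*" matches any atom, and every bond is a single bond.
class BondCompatibilityDemo {

    // A bond is represented only by the symbols at its two ends.
    static final class Bond {
        final String a;
        final String b;
        Bond(String a, String b) { this.a = a; this.b = b; }
    }

    // Query C**C and target SCCS, written as lists of bonds.
    static final List<Bond> QUERY =
        List.of(new Bond("C", "*"), new Bond("*", "*"), new Bond("*", "C"));
    static final List<Bond> TARGET =
        List.of(new Bond("S", "C"), new Bond("C", "C"), new Bond("C", "S"));

    // A query symbol matches a target symbol if it is the wildcard or equal.
    static boolean atomMatches(String query, String target) {
        return query.equals("*") || query.equals(target);
    }

    // A query bond matches a target bond if the end atoms match
    // in at least one of the two possible orientations.
    static boolean bondMatches(Bond q, Bond t) {
        return (atomMatches(q.a, t.a) && atomMatches(q.b, t.b))
            || (atomMatches(q.a, t.b) && atomMatches(q.b, t.a));
    }

    // Count the (query bond, target bond) pairs that are compatible.
    static int countCompatiblePairs(List<Bond> query, List<Bond> target) {
        int count = 0;
        for (Bond q : query) {
            for (Bond t : target) {
                if (bondMatches(q, t)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int total = QUERY.size() * TARGET.size();
        // prints: 9 of 9 bond pairs are compatible
        System.out.println(countCompatiblePairs(QUERY, TARGET)
            + " of " + total + " bond pairs are compatible");
    }
}
```

Because the orientation of each bond is chosen independently, the C-* query bond can pair with the S-C target bond by mapping C to C and * to S, so a per-bond check alone cannot rule out the SCCS target.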
From: Egon W. <ego...@gm...> - 2012-04-07 13:44:01
|
On Fri, Mar 30, 2012 at 8:53 PM, Rajarshi Guha <raj...@gm...> wrote:
> Interesting bug, I haven't seen this before. Every bond in the query
> can match every bond in the target, in terms of order and atoms. I
> assume the problem is in traversal of the RGraph, but haven't been
> able to work out the problem

Is this then the same problem that is intrinsic to the algorithm that uses bond matching? The same one that means the algorithm cannot distinguish isobutane from cyclopropane?

Egon

--
Dr E.L. Willighagen
Postdoctoral Researcher
Department of Bioinformatics - BiGCaT
Maastricht University (http://www.bigcat.unimaas.nl/)
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers |
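The isobutane/cyclopropane remark can be illustrated with the carbon skeletons alone. In the sketch below (a toy model with hypothetical names, not CDK code) each skeleton is a list of bonds given as pairs of atom indices, and two bonds count as adjacent when they share an atom. Both skeletons produce the same bond-adjacency matrix, which is why a matcher that only looks at bonds and their adjacency cannot tell them apart.

```java
import java.util.Arrays;

// Toy model (assumption: not CDK code). Bonds are pairs of atom indices.
class LineGraphDemo {

    // Isobutane carbon skeleton: central atom 0 bonded to atoms 1, 2 and 3.
    static final int[][] ISOBUTANE = {{0, 1}, {0, 2}, {0, 3}};

    // Cyclopropane carbon skeleton: a three-membered ring.
    static final int[][] CYCLOPROPANE = {{0, 1}, {1, 2}, {0, 2}};

    // Two bonds are adjacent if they have at least one atom in common.
    static boolean sharesAtom(int[] x, int[] y) {
        return x[0] == y[0] || x[0] == y[1] || x[1] == y[0] || x[1] == y[1];
    }

    // Adjacency matrix over bonds (the line graph of the skeleton).
    static boolean[][] bondAdjacency(int[][] bonds) {
        int n = bonds.length;
        boolean[][] adj = new boolean[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (i != j) {
                    adj[i][j] = sharesAtom(bonds[i], bonds[j]);
                }
            }
        }
        return adj;
    }

    public static void main(String[] args) {
        boolean same = Arrays.deepEquals(
            bondAdjacency(ISOBUTANE), bondAdjacency(CYCLOPROPANE));
        // prints: identical bond adjacency: true
        System.out.println("identical bond adjacency: " + same);
    }
}
```

In both cases every bond is adjacent to the other two, so the bond-level picture is a triangle either way, even though one skeleton has four atoms and the other three.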
From: Rajarshi G. <raj...@gm...> - 2012-04-07 13:52:59
|
On Sat, Apr 7, 2012 at 9:43 AM, Egon Willighagen <ego...@gm...> wrote:
> On Fri, Mar 30, 2012 at 8:53 PM, Rajarshi Guha <raj...@gm...> wrote:
>> Interesting bug, I haven't seen this before. Every bond in the query
>> can match every bond in the target, in terms of order and atoms. I
>> assume the problem is in traversal of the RGraph, but haven't been
>> able to work out the problem
>
> Is this the same problem intrinsic to the algorithm that uses bond
> matching then? The same that the algorithm cannot distinguish
> isobutane from cyclopropane?

That's what I thought initially. But at one point the atoms in the bonds are checked during traversal of the RGraph and so should work correctly. In this case, it might be a unique edge case due to the ** component of the query.

--
Rajarshi Guha |
From: Egon W. <ego...@gm...> - 2012-04-07 14:10:42
|
On Sat, Apr 7, 2012 at 3:52 PM, Rajarshi Guha <raj...@gm...> wrote:
> That's what I thought initially. But at one point the atoms in the
> bonds are checked during traversal of the RGraph and so should work
> correctly. In this case, it might be a unique edge case due to the **
> component of the query.

It would indeed be interesting to dump the actual bond matches...

Egon |
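What such a dump might show can be worked out on paper for this small case. The sketch below is a toy model and not the UniversalIsomorphismTester: the class `BondMatchDump` and all of its methods are hypothetical, and it assumes query and target have the same number of atoms and bonds so that plain permutations suffice. It lists every bond-to-bond assignment for C**C against SCCS that passes the per-bond check and preserves bond adjacency, then tests whether any single atom-to-atom mapping is consistent with it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model (assumption: not CDK code). Query C**C, target SCCS.
class BondMatchDump {
    static final String[] Q_ATOMS = {"C", "*", "*", "C"};
    static final String[] T_ATOMS = {"S", "C", "C", "S"};
    static final int[][] Q_BONDS = {{0, 1}, {1, 2}, {2, 3}};
    static final int[][] T_BONDS = {{0, 1}, {1, 2}, {2, 3}};

    static boolean atomMatches(int q, int t) {
        return Q_ATOMS[q].equals("*") || Q_ATOMS[q].equals(T_ATOMS[t]);
    }

    // Per-bond check: end atoms match in either orientation.
    static boolean bondMatches(int[] q, int[] t) {
        return (atomMatches(q[0], t[0]) && atomMatches(q[1], t[1]))
            || (atomMatches(q[0], t[1]) && atomMatches(q[1], t[0]));
    }

    static boolean sharesAtom(int[] x, int[] y) {
        return x[0] == y[0] || x[0] == y[1] || x[1] == y[0] || x[1] == y[1];
    }

    // All permutations of 0..n-1, in lexicographic order.
    static List<int[]> permutations(int n) {
        List<int[]> out = new ArrayList<>();
        permute(new int[n], new boolean[n], 0, out);
        return out;
    }

    static void permute(int[] cur, boolean[] used, int pos, List<int[]> out) {
        if (pos == cur.length) { out.add(cur.clone()); return; }
        for (int v = 0; v < cur.length; v++) {
            if (used[v]) continue;
            used[v] = true;
            cur[pos] = v;
            permute(cur, used, pos + 1, out);
            used[v] = false;
        }
    }

    // Bond assignments p (query bond i -> target bond p[i]) that pass the
    // per-bond check and keep bond adjacency the same on both sides.
    static List<int[]> bondLevelMappings() {
        List<int[]> ok = new ArrayList<>();
        for (int[] p : permutations(Q_BONDS.length)) {
            boolean valid = true;
            for (int i = 0; i < p.length && valid; i++) {
                valid = bondMatches(Q_BONDS[i], T_BONDS[p[i]]);
                for (int j = 0; j < i && valid; j++) {
                    valid = sharesAtom(Q_BONDS[i], Q_BONDS[j])
                         == sharesAtom(T_BONDS[p[i]], T_BONDS[p[j]]);
                }
            }
            if (valid) ok.add(p);
        }
        return ok;
    }

    // True if one atom mapping m exists that matches all atoms and sends
    // the ends of every query bond i onto the ends of target bond p[i].
    static boolean hasAtomConsistentMapping(int[] p) {
        for (int[] m : permutations(Q_ATOMS.length)) {
            boolean valid = true;
            for (int a = 0; a < m.length && valid; a++) {
                valid = atomMatches(a, m[a]);
            }
            for (int i = 0; i < p.length && valid; i++) {
                int x = m[Q_BONDS[i][0]];
                int y = m[Q_BONDS[i][1]];
                int[] t = T_BONDS[p[i]];
                valid = (x == t[0] && y == t[1]) || (x == t[1] && y == t[0]);
            }
            if (valid) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        for (int[] p : bondLevelMappings()) {
            System.out.println("bond mapping " + Arrays.toString(p)
                + " atom-consistent: " + hasAtomConsistentMapping(p));
        }
    }
}
```

In this toy model two bond assignments survive the bond-level checks (the path mapped forwards and backwards), and neither is consistent with a single atom mapping, because the atom shared by the first two query bonds is forced onto a target C and the terminal query C then lands on S. Whether this is the mechanism behind the reported CDK result is not established in the thread; the sketch only shows that bond-level agreement and atom-level agreement can come apart for this query.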
From: Nina J. <jel...@gm...> - 2012-04-08 04:48:53
|
On 7 April 2012 16:52, Rajarshi Guha <raj...@gm...> wrote:
> That's what I thought initially. But at one point the atoms in the
> bonds are checked during traversal of the RGraph and so should work
> correctly. In this case, it might be a unique edge case due to the **
> component of the query.

A little bit of context may help. It happens not only with *. There are cases with "A", and perhaps a case could be constructed where one atom can be matched by several OR-ed conditions instead of *.

The original reason we started to debug this issue was that the query below was not working [1]. The SMARTS

[O,o,OH,N,n,$(P=O),$(C=S),$(S=O),$(C=O)]~[A,a]~[A,a]~[O,o,OH,N,n,$(P=O),$(C=S),$(S=O),$(C=O)]

was matching [S-]C(=S)N, essentially finding a path of length 4 where it should not. This happens when the target structure has explicit hydrogens, not otherwise. However, the problem is not caused by the explicit H but by the topology, as illustrated in the test in my first mail.

Best regards,
Nina

[1] http://sourceforge.net/tracker/?func=detail&aid=3472325&group_id=152702&atid=785126 |