Menu

#25 VFLib integration

Accepted
closed
None
5
2012-10-28
2008-12-02
No

The new VFLib isomorphism algorithms for unique matching are ready to go into the CDK.

Attachment contains ./main and ./test folders with Java source files. Two algorithms have been implemented, Ullman and VF2. The JUnit tests are done for both.

After some consideration I've put the classes in the same package as the UIT.

Discussion

  • Mark Rijnbeek

    Mark Rijnbeek - 2008-12-02

    Java source files

     
  • Mark Rijnbeek

    Mark Rijnbeek - 2008-12-02

    Detail:

    org/openscience/cdk/isomorphism/UniversalIsomorphismTester.java is lingering around in the tar file, but can be removed. I just had it there to copy JUnit tests over.

    Mark

     
  • Rajarshi Guha

    Rajarshi Guha - 2008-12-02

    I took a quick look at the code. It seems that the entry point for a user would be VF2SubgraphIsomorphism - the user must create an instance of this class and then pass it to SubgraphIsomorphism.

    I think it might be better to have SubgraphIsomorphism as the entry point and let it create the actual isomorrphism class based on a parameter. In the long run, one could switch between UIT and VF2 via the SubgraphIsomorphism class.

    Also, the VF2 class says that the second arg must be a IQueryAtomContainer - but UIT allows the second arg to be IAtomContainer - it would be useful to update the VF2 code (and basd on the above, the SubgraphIsomorphism code as well) to let the second arg to the constructor be an IAtomContainer

     
  • Mark Rijnbeek

    Mark Rijnbeek - 2008-12-02

    I made a new version (see attachment) with the SubgraphIsomorphism now as entry point and the IQueryAtomContainer moved out of sight.
    The SubgraphIsomorphism's constructor now creates the QueryAtomContainerCreator.

    There's only one issue with that - the JUnit test testAnyAtomAnyBondCase() now fails, because of the constructor now running createBasicQueryContainer.

    I'm not sure how to fix that quickly. Rajarshi, did you mean get rid of the QueryAtomContainer alltogether? Please update the code as you think appropriate, probably the easiest way forward.

    Mark

    File Added: vflib.2.tar.gz

     
  • Mark Rijnbeek

    Mark Rijnbeek - 2008-12-02

    Version 2, some fixes made

     
  • Rajarshi Guha

    Rajarshi Guha - 2008-12-02

    No,not get rid of the IQueryAtomContainer - for example the SMARTS search will create a query atom container and pass that to the UIT. But in many scenarios we just pas 2 IAtomContainers to check for isomorphism. I don't immediately recall how the UIT handles this, but I don't think it's a big issue. The main reason I raised this is that if the entry point requires a IQueryAtomContainr as input the user must create one by hand, which makes for more complexity.

    I will be able to look at the code in more detail after the 9th

     
  • Rajarshi Guha

    Rajarshi Guha - 2008-12-02

    latest patch for VFLib SS

     
  • Rajarshi Guha

    Rajarshi Guha - 2008-12-02

    I went ahead and modified SubgraphIsomorphism to check whether the query was an IQueryAtomContainer or not. Also I think the arguments to VF2 and Ullman were switched, so I think it's OK now

     
  • Egon Willighagen

    I am going to set up a branch for this, so that I can run Nightly on it.

     
  • Egon Willighagen

    OK, a branch is now available from:

    http://pele.farmbio.uu.se/nightly-vflib/

    Right now, compile fails... getting back on the gzip error... I think it failed to copy in Rajarshi's new UniversalIsomorphismTester... that would need to be manually added again...

    I also already spotted missing license/copyright statements in the new files...

     
  • Mark Rijnbeek

    Mark Rijnbeek - 2008-12-05

    Regarding copyright and license info, see:
    http://amalfi.dis.unina.it/graph/db/vflib-2.0/doc/vflib-2.html
    VFlib states "The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software, together with the associated disclaimers"

    So perhaps the bit below should be added to the new isomorphism classes. After all, they are pretty much a verbatim translation of the C++ classes, so in effect it is a copy to Java.

    Copyright (c) 2001 by
    Dipartimento di Informatica e Sistemistica
    Università degli studi di Napoli ``Federico II''
    http://amalfi.dis.unina.it
    

    Mark

     
  • Egon Willighagen

    Mark, can you please add those copyright notices to the new files in the branch?

    Please make the header look like:

    / $Revision$ $Author$ $Date$

    * Copyright (C) 2001 Dipartimento di Informatica e Sistemistica, Università degli studi di Napoli ``Federico II' http://amalfi.dis.unina.it
    * 2008 Mark Rynbeek mark_rynbeek@users.sf.net
    *
    * Contact: cdk-devel@lists.sourceforge.net
    *
    * Permission is hereby granted, free of charge, to any person obtaining a copy of this software and
    * associated documentation files (the "Software"), to deal in the Software without restriction, including
    * without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
    * of the Software, and to permit persons to whom the Software is furnished to do so, subject to the
    * following conditions:

    * 1. The above copyright notice and this permission notice shall be included in all copies or substantial
    * portions of the Software, together with the associated disclaimers.
    * 2. Any modification to the standard distribution of the Software shall be mentioned in a prominent notice
    * in the documentation provided with the modified distribution, stating clearly how, when and by
    * whom the Software has been modified.
    * 3. Either the modified distribution shall contain the entire source code of the standard distribution of the
    * Software, or the documentation shall provide instructions on where the source code of the standard
    * distribution of the Software can be obtained.
    *
    * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
    * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
    * PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE
    * FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
    * OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
    * DEALINGS IN THE SOFTWARE.
    /

    And add this to the class JavaDoc:

    @cdk.license MIT-like

     
  • Rajarshi Guha

    Rajarshi Guha - 2008-12-11

    I've taken a look at the code and made a number of edits. The main chnage is that both args to SubgraphIsomorphism are now IAtomContainer. This means that when VF2 or Ullman looks for a bond match we determine if the bond i an IQueryBond - if so, use matches(). If not then the query was a plain IAtomContainer and then we match a bond based on bond order and equality of atom symbols of the two bonds. I think this is the correct behavior. I also updated tests to JUnit 4 and everything passes.

    At this point the code is good to go, pending three possibly minor issues

    1. do we want to keep everything under isomorphism? (I think so)

    2. Currently SubgraphIsomorphism takes the query as first arg and target as second arg. This is opposite to UIT - it might be useful to chnage the code to swap the order of the args, just to make closer to the UIT

    3. Right now if one wants to do a subgr isomorpism, you have to create an instance of the class for each query-target combo. It might be useeful to have a getter and setter for the query and target molecules and associated reset functions for the VF2 and Ullman implementation. This would allow one to create a single instance of the SubgraphIsomorphism class and use it for different combos of target and query. Could lead to improved performance when repeatedly doing matches (such as in the substructure fp)

    Also, it'd be useful to have more tests, but that can be added later on on a continuous basis

     
  • Mark Rijnbeek

    Mark Rijnbeek - 2008-12-11

    Rajarshi,

    If I'm not mistaken aromaticity is not taken into consideration in the Ullmann and VF2 classes?
    Which means you'd possibly get too few results if you just test for bond order.
    UIT does test for it ( ac2.getBond(j).getFlag(CDKConstants.ISAROMATIC) etc)

    I suppose we just need to add a few lines.

    Mark

     
  • Rajarshi Guha

    Rajarshi Guha - 2008-12-11

    Mark, thanks for the pointer. I updated State to include aromaticity as a condition for bond equality and everything seems to work fine. So everything is good. What do you hink about the last two points in my previous comment?

     
  • Rajarshi Guha

    Rajarshi Guha - 2008-12-11

    I just added a test to check whether CCC matches in C1CC1 - both VF2 and Ullman fail this test

     
  • Rajarshi Guha

    Rajarshi Guha - 2008-12-11

    Too hasty - test failure was due to typo

     
  • Mark Rijnbeek

    Mark Rijnbeek - 2008-12-12

    Yep I agree on your last two points

     
  • Egon Willighagen

    I have updated the Nightly view on the branch at pele:

    http://pele.farmbio.uu.se/nightly-vflib/

     
  • Rajarshi Guha

    Rajarshi Guha - 2008-12-13

    I have gone ahead and changed the signature of the SubgraphIsomorphism constructor to match UIT. I have also updated the code to allow one to start with en 'empty' class - no target or query molecules. Also provided getter/setter for target and query, so that you test multiple queries against a single target or vice versa.

    At this point I think it's good to go

     
  • Egon Willighagen

    Available in the git world on http://pele.farmbio.uu.se/git/cdk.git/ in the 22-vflib branch.

     
  • SourceForge Robot

    This Tracker item was closed automatically by the system. It was
    previously set to a Pending status, and the original submitter
    did not respond within 14 days (the time period specified by
    the administrator of this Tracker).

     
  • Egon Willighagen

    Reopened... was just automatically and incorrectly closed, due to set as pending, after getting no reply.

    Status: this patch is a direct translation, and a wish was expressed to rewrite things based on MX.

    Meanwhile, a EBI scientist outside Chris' group has done so.

     
  • Egon Willighagen

    The other EBI scientist involved, Syed, will work with Chris on getting his patch in. This patch will not be used.

     

Log in to post a comment.