#634 Ring Perception Pt 2

Accepted
closed
nobody
master
1
2013-06-10
2013-05-11
John May
No

Implementation of Vismara's 97 algorithm to find the relevant cycles in a graph. This is the union of all minimum cycle bases and is the smallest unique set of the cycles in a graph. The algorithm is much faster then the existing (SSSRFinder.findRelevantRings) and is intended to fit together with the new RingSearch. I also added a MinimumCycleBasis (SSSRFinder.findSSRR) which is also faster but not as much. Both of these use an intermediate (InitialCycles, describes in Vismara 97) which is quick to compute and can be used by both the RelevantRings and MinimumCycleBasis. Unfortunately one of the slowest steps is still getting from the IAtomContainer to the graph and back.

Code is on this branch (bd2af30083 onwards): https://github.com/johnmay/cdk/tree/ncycles

I wasn't sure which packages to put it. There's lots of package-private class as these are more tools to build other algorithms with then a tool in their own right. It needs the ShortestPaths so I put it in graph and module core, but could go elsewhere.

I've verified the output of RelevantRings against the existing implementation on ~46,000,000 structures from PubChem-Compound and found no mistakes. Unfortunately the output is close to a GB but I've attached the first ~2,000,000, not that it shows anything useful. The MCB is harder to verify as it is not unique and I could only verify the size of the MCB rather then actual paths.

Will also likely add the essential cycles and all cycles (rings) so all the ring code fits well together.

1 Attachments

Discussion

  • John May
    John May
    2013-05-13

    Just to add to this, the MCB is optional here. The relevant+essential cycles is definitely good but MCB is asymptotically worse. In practise it is actually faster for a set of molecules but with ridiculous graphs (e.g. Fulleren720) it is slower.

    Best to probably be adaptive and switch depending on the size of the input. As this implementation uses composition it should be easy to do and will see if using the alternate minimisation method from Ulrich's implementation gives comparable speeds.

     
    Last edit: John May 2013-05-13
  • John May
    John May
    2013-05-13

    Nvm made it faster :-).... haha

     
  • John May
    John May
    2013-05-14

    Added EssentialCycles to the branch and tweaked some other methods/doc.

     
    • Group: Needs_Review --> Needs_Revision
     
  • John May
    John May
    2013-06-08

    • Group: Needs_Revision --> Needs_Review
     
  • John May
    John May
    2013-06-08

    Okay corrected all the comments with an additional commit (except the guava one but that can wait). Also, we got told off the other month but all code we write has to have EMBL-EBI copyright. It's complex but in this case the reasoning is the code is visually part of the work going on at EBI - and they want to be seen to be supporting projects.

    I'm not going to change existing classes but have added it here and will do for other bits in future.

     
  • OK, thanx. Signing off and pushing.

     
    • status: open --> closed
    • Group: Needs_Review --> Accepted
     
  • Applied and pushed.