Hi All,

As some of you may know I've been working on improving (and extending) the ring detection in CDK. One core principle is to have optimised thread-safe(r) algorithms to detect cycles on the graph rather then the 'AtomContainer' molecule. We will then have have adapters which handle the RingSets whilst still providing efficient building blocks. As an example in SMARTS, we don't actually need the rings objects, just the counts, size and memberships. I'll blog about this when it's complete but I have some decisions on the API and would like some opinions.

Path Storage

The paths are currently stored as fixed size array of vertices (int[]). Traditionally cycles are stored with the start and end vertex the same giving a closed walk or we could not keep the repeated vertex (and have just a path).

The advantage of the path is the length of the array is the length of the cycle and we can easily transform it to lowest lexicographic rank (equality operations). The edge iteration of a cyclic path is a little more complicated (see below).
The advantage of the closed-walk is the edge iteration is a little simple and the data stored explicitly defines the path forms a cycle. The other operations of course are not difficult but a little more code.

// 6 member cycles (i.e. benzene)
int[] c1 = new int[]{0, 1, 2, 3, 4}; // path
int[] c2 = new int[]{0, 1, 2, 3, 4, 1}; // closed-walk
 
// all edges of c1
for (int i = 0; i < c1.length; i++) {
int u = c1[i];
int v = c1[(i + 1) % c1.length];
}
 
// all edges of c2 (closed-walk)
for (int i = 1; i < c2.length; i++) {
int u = c2[i - 1];
int v = c2[i];
}

Some algorithms give you one, whilst others give you the other. Which do you think we should return? 

Timeout Exception

There are currently two implementations of Hanser's path-graph reduction to find all cycles. These are AllRingsFinder (cdk-standard) and HanserRingFinder (cdk-smsd). The AllRingsFinder has a timeout and throws an exception if the limit (default=5 s) is reached, HanserRingFinder will just keep running. I've managed to tweak the data structures so that now anything longer then 2 ms probably it's worth bothering (i.e. generates too many to be useful). Now the question is, should we really throw an exception? It's not really a timeout, rather YourRingSystemIsTooBigException it just happens to be currently done by timing. 

Checked exceptions are good sometimes but can be overused. In the case of the AllRingsFinder, if the exception is thrown there is no way to get what was done. To me, I think a cleaner
way could be to indicate whether the algorithm actually finished. Most will probably ignore it, so maybe it's not a good idea but the code is much simpler.

Thoughts? 

AllCycles ac = ...;
 
// optionally check if it actually finished
if(!ac.finished())
throw MyOwnException();

// even if the algorithm didn't complete we have some cycles
for (int[] p : ac.paths()) {
}



Many thanks,
John