An algorithm for computing the extended smallest set of smallest rings (ESSSR), which PubChem uses in their fingerprints. Currently we're using the SSSR, which isn't correct. Note that this ring set depends on the canonical labelling (the JavaDoc explains this in detail). I also added an option to turn this dependence off, which makes more sense to me and in my opinion gives a better fingerprint (examples in the JavaDoc). Actually, read the JavaDoc before looking at the code as it will make more sense :-).
Tested on PubChem Compound CID 1 - CID 25000 by reversing the fingerprint to get the ring sizes. The numbers below are counts of bits set.
MCB/SSSR  [37037/44660]  82.93 %
ESSSR     [44603/44660]  99.87 %
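
For anyone wanting to repeat the comparison, a tally like the one above could be produced roughly as follows. This is only a minimal sketch, not the code used for the test: it assumes "bits set" means ring bits set in the PubChem fingerprint that are also set in ours, and the class and method names (RingBitTally, matchedBits, percent) are made up for illustration and are not part of the CDK.

```java
import java.util.BitSet;

public class RingBitTally {

    // Number of bits set in 'expected' that are also set in 'computed'.
    // 'expected' is cloned first so the caller's BitSet is not modified.
    public static int matchedBits(BitSet expected, BitSet computed) {
        BitSet both = (BitSet) expected.clone();
        both.and(computed);
        return both.cardinality();
    }

    // Agreement as a percentage of the expected bits.
    public static double percent(long matched, long total) {
        return 100.0 * matched / total;
    }

    public static void main(String[] args) {
        // Toy data (assumption): ring bits from the reference fingerprint
        BitSet expected = new BitSet();
        expected.set(3);
        expected.set(7);
        expected.set(12);

        // Toy data (assumption): ring bits from our own ring set
        BitSet computed = new BitSet();
        computed.set(3);
        computed.set(12);
        computed.set(20);

        int matched = matchedBits(expected, computed);
        int total = expected.cardinality();
        System.out.printf("[%d/%d] %.2f %%%n", matched, total, percent(matched, total));
    }
}
```

Summing matched and total over all compounds, rather than averaging per-compound percentages, gives a single [matched/total] pair in the same form as the figures reported here.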
We don't get 100% for a couple of reasons.
1. PubChem does not export complex bonds to their SDF format, so molecules like CID 21081 are read as acyclic (which in theory they are).
2. There are also a handful of cases where it gets it wrong due to the labelling. This can be fixed by changing the order of the vertices, but to be 100% correct we would need the PubChem canonical labelling. Technically you get this in the download, but that doesn't always seem to be the case; maybe it changed. I'm contacting Evan about that.
Last two commits on this branch: johnmay/cdk/esssr.
The algorithm was actually really simple: see EsssrCycles.java