Yes, the hash code should help with this.

Also, it's not exactly clear to me what Murcko provides, but to get the ring systems you can use, This will give you the biconnected components in linear time and separate the isolated cycles from the fused systems.
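
To make the idea concrete, here is a rough sketch only, not the CDK implementation. It finds ring systems as the biconnected components of the bond graph with Tarjan's edge-stack depth-first search, which runs in time linear in the number of atoms plus bonds. The class name RingSystemSketch and its methods are made up for this example; atoms are plain integer indices and bonds are pairs of indices. A component with as many bonds as atoms is treated as an isolated (simple) cycle, and one with more bonds than atoms as a fused system.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper (not a CDK class): ring systems as biconnected components. */
final class RingSystemSketch {
    private final List<List<int[]>> adj = new ArrayList<>(); // entries are {neighbour, bondIndex}
    private final int[][] bonds;
    private final int[] disc, low;                            // DFS discovery time and low-link
    private final Deque<Integer> bondStack = new ArrayDeque<>();
    private final List<TreeSet<Integer>> isolated = new ArrayList<>();
    private final List<TreeSet<Integer>> fused = new ArrayList<>();
    private int clock = 0;

    RingSystemSketch(int nAtoms, int[][] bonds) {
        this.bonds = bonds;
        disc = new int[nAtoms];
        low = new int[nAtoms];
        for (int i = 0; i < nAtoms; i++) adj.add(new ArrayList<>());
        for (int b = 0; b < bonds.length; b++) {
            adj.get(bonds[b][0]).add(new int[]{bonds[b][1], b});
            adj.get(bonds[b][1]).add(new int[]{bonds[b][0], b});
        }
        // Recursive DFS: fine for a sketch, but very large graphs would need an explicit stack.
        for (int v = 0; v < nAtoms; v++)
            if (disc[v] == 0) dfs(v, -1);
    }

    private void dfs(int v, int parentBond) {
        disc[v] = low[v] = ++clock;
        for (int[] nb : adj.get(v)) {
            int w = nb[0], b = nb[1];
            if (b == parentBond) continue;          // do not walk straight back along the tree bond
            if (disc[w] == 0) {                     // tree bond to an unvisited atom
                bondStack.push(b);
                dfs(w, b);
                low[v] = Math.min(low[v], low[w]);
                if (low[w] >= disc[v]) popComponent(b); // v closes off one biconnected component
            } else if (disc[w] < disc[v]) {         // back bond to an ancestor (ring closure)
                bondStack.push(b);
                low[v] = Math.min(low[v], disc[w]);
            }
        }
    }

    /** Pops bonds up to and including untilBond; they form one biconnected component. */
    private void popComponent(int untilBond) {
        TreeSet<Integer> atoms = new TreeSet<>();
        int nBonds = 0, b;
        do {
            b = bondStack.pop();
            atoms.add(bonds[b][0]);
            atoms.add(bonds[b][1]);
            nBonds++;
        } while (b != untilBond);
        if (nBonds < 2) return;                     // a single bond is acyclic (a bridge)
        if (nBonds == atoms.size()) isolated.add(atoms); // simple cycle
        else fused.add(atoms);                      // more bonds than atoms: fused/bridged rings
    }

    List<TreeSet<Integer>> isolatedCycles() { return isolated; }
    List<TreeSet<Integer>> fusedSystems() { return fused; }
}
```

For example, new RingSystemSketch(nAtoms, bonds).fusedSystems() returns the atom index sets of the fused systems. Note that spiro-linked rings share only a single atom, so under this definition they come out as separate cycles rather than as one fused system.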


On 19 Jun 2013, at 15:23, Rajarshi Guha wrote:

Hi, I recently received a bug report for rcdk that reported the Murcko fragmentation of was taking more than an hour.

I have a patch for master that reduces this to 88s - by keeping track of fragments generated and avoiding processing of identical fragments in recursive calls. I check for duplicates by generating the SMILES for the fragments (and storing them in a HashSet) - and this causes an expected slowdown.

My question is - is there a faster way to check for the existence of a molecule that does not involve SMILES generation?

(Also, where can I send a patch for review - I'm a little out of date with protocols these days :)

Rajarshi Guha
NIH Center for Advancing Translational Science
Cdk-devel mailing list