Hi, I recently received a bug report for rcdk saying that the Murcko fragmentation of http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=16129878 was taking more than an hour.

I have a patch for master that reduces this to 88s by keeping track of the fragments already generated and skipping identical fragments in the recursive calls. I check for duplicates by generating the SMILES for each fragment and storing it in a HashSet. As expected, the SMILES generation itself adds some overhead.
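
To make the idea concrete, here is a minimal sketch of the duplicate check. It is not the actual patch and does not use the CDK API: plain strings stand in for molecules, removing one character stands in for generating a sub-fragment, and canonicalKey() (a hypothetical helper) stands in for SMILES generation. All class and method names are assumptions for illustration only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class FragmentDedup {
    // Counts how many fragments were actually expanded (illustration only).
    static int expanded = 0;

    // Stand-in for canonical SMILES: sorting the characters gives
    // equivalent toy "fragments" the same key.
    static String canonicalKey(String fragment) {
        char[] chars = fragment.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    // Recursively "fragments" mol, skipping anything already seen.
    static void fragment(String mol, Set<String> seen) {
        // Set.add returns false if the key is already present,
        // so an identical fragment is never processed twice.
        if (!seen.add(canonicalKey(mol))) {
            return;
        }
        expanded++;
        if (mol.length() <= 1) {
            return;
        }
        // Toy fragmentation step: drop one character at a time.
        for (int i = 0; i < mol.length(); i++) {
            String child = mol.substring(0, i) + mol.substring(i + 1);
            fragment(child, seen);
        }
    }

    // Convenience wrapper: resets the counter and returns the unique keys.
    static Set<String> run(String mol) {
        expanded = 0;
        Set<String> seen = new HashSet<>();
        fragment(mol, seen);
        return seen;
    }

    public static void main(String[] args) {
        Set<String> unique = run("abcd");
        System.out.println(unique.size() + " unique fragments, " + expanded + " expanded");
    }
}
```

In this toy version each unique fragment is expanded exactly once, but the key still has to be computed for every candidate fragment before the lookup, which is the cost I am asking about.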

My question is - is there a faster way to check whether a molecule has already been seen that does not involve SMILES generation?

(Also, where can I send a patch for review - I'm a little out of date with protocols these days :)

Rajarshi Guha | http://blog.rguha.net
NIH Center for Advancing Translational Science