Yes, the hash code should help with this.
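
Below is a minimal plain-Java sketch of hash-based duplicate checking. It assumes that "the hash code" means an atom-order-independent molecule hash. CDK's org.openscience.cdk.hash package (e.g. HashGeneratorMaker) is presumably what is meant, but the code does not use it. FragGraph and FragmentHash are hypothetical stand-ins and are not CDK types. The hash refines each atom's value from the sorted values of its neighbours for a fixed number of rounds (a Weisfeiler-Lehman-style scheme), so renumbering the atoms does not change the result.

```java
import java.util.Arrays;
import java.util.Set;

/** Hypothetical minimal fragment graph (not a CDK type). */
final class FragGraph {
    final String[] elements;   // element symbol per atom
    final int[][] neighbours;  // adjacency list per atom

    FragGraph(String[] elements, int[][] neighbours) {
        this.elements = elements;
        this.neighbours = neighbours;
    }
}

final class FragmentHash {
    /** SplitMix64 finaliser: spreads bits so similar inputs give very different outputs. */
    static long mix(long x) {
        x += 0x9E3779B97F4A7C15L;
        x = (x ^ (x >>> 30)) * 0xBF58476D1CE4E5B9L;
        x = (x ^ (x >>> 27)) * 0x94D049BB133111EBL;
        return x ^ (x >>> 31);
    }

    /** Atom-order-independent hash after `depth` rounds of neighbour refinement. */
    static long hash(FragGraph g, int depth) {
        int n = g.elements.length;
        long[] h = new long[n];
        // Initial invariant per atom: element symbol and degree.
        for (int i = 0; i < n; i++) {
            h[i] = mix(g.elements[i].hashCode() * 31L + g.neighbours[i].length);
        }
        for (int round = 0; round < depth; round++) {
            long[] next = new long[n];
            for (int i = 0; i < n; i++) {
                long[] nb = new long[g.neighbours[i].length];
                for (int k = 0; k < nb.length; k++) nb[k] = h[g.neighbours[i][k]];
                Arrays.sort(nb); // sorting removes any dependence on neighbour order
                long acc = h[i];
                for (long v : nb) acc = mix(acc * 31 + v);
                next[i] = acc;
            }
            h = next;
        }
        // Combine the multiset of atom values, again in sorted order.
        long[] sorted = h.clone();
        Arrays.sort(sorted);
        long acc = n;
        for (long v : sorted) acc = mix(acc * 31 + v);
        return acc;
    }

    /**
     * True the first time a fragment with this hash is seen.
     * Caveat: equal hashes do not prove the fragments are identical.
     */
    static boolean firstTime(Set<Long> seen, FragGraph g) {
        return seen.add(hash(g, 8));
    }
}
```

A HashSet<Long> lookup replaces SMILES generation on the common path. The hash can still collide, and this kind of refinement cannot tell apart some non-identical graphs. If silently dropping a fragment is unacceptable, generate SMILES only when two hashes match and compare those.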

Also, I'm not exactly clear on what Murcko provides, but to get the ring systems you can use RingSearch: http://pele.farmbio.uu.se/nightly/api/org/openscience/cdk/ringsearch/RingSearch.html. It gives you the biconnected components in linear time and separates isolated cycles from fused systems.
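
The idea behind RingSearch's isolated/fused split can be shown with a plain-Java sketch, but this is not the CDK implementation. RingSystems is a hypothetical class that runs Tarjan's linear-time biconnected-component search over an adjacency list. It then classifies each component as follows:

- A single edge is a bridge, i.e. an acyclic bond.
- A component with as many edges as atoms is an isolated cycle.
- A component with more edges than atoms is a fused system.

In this sketch, spiro-linked rings come out as separate isolated cycles, because they share only an atom.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: ring systems via biconnected components (Tarjan's edge-stack DFS). */
final class RingSystems {
    final List<Set<Integer>> isolated = new ArrayList<>();
    final List<Set<Integer>> fused = new ArrayList<>();

    private final int[][] adj;
    private final int[] disc, low;            // discovery time and low-link per atom (0 = unvisited)
    private final Deque<int[]> edges = new ArrayDeque<>();
    private int time = 0;

    RingSystems(int[][] adj) {
        this.adj = adj;
        disc = new int[adj.length];
        low = new int[adj.length];
        for (int v = 0; v < adj.length; v++) if (disc[v] == 0) dfs(v, -1);
    }

    private void dfs(int u, int parent) {
        disc[u] = low[u] = ++time;
        for (int v : adj[u]) {
            if (disc[v] == 0) {               // tree edge
                edges.push(new int[]{u, v});
                dfs(v, u);
                low[u] = Math.min(low[u], low[v]);
                if (low[v] >= disc[u]) popComponent(u, v); // u separates v's subtree
            } else if (v != parent && disc[v] < disc[u]) { // back edge to an ancestor
                edges.push(new int[]{u, v});
                low[u] = Math.min(low[u], disc[v]);
            }
        }
    }

    /** Pops one biconnected component (ending at tree edge u-v) and classifies it. */
    private void popComponent(int u, int v) {
        Set<Integer> atoms = new HashSet<>();
        int nEdges = 0;
        int[] e;
        do {
            e = edges.pop();
            atoms.add(e[0]);
            atoms.add(e[1]);
            nEdges++;
        } while (!(e[0] == u && e[1] == v));
        if (nEdges == 1) return;                          // bridge: not part of a ring
        if (nEdges == atoms.size()) isolated.add(atoms);  // simple cycle
        else fused.add(atoms);                            // extra bonds: fused/bridged system
    }
}
```

For example, a naphthalene-like bicycle with a cyclopropyl substituent on a bridge bond yields one fused system of 10 atoms and one isolated 3-membered ring.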

J

On 19 Jun 2013, at 15:23, Rajarshi Guha <rajarshi.guha@gmail.com> wrote:

Hi, I recently received a bug report for rcdk reporting that the Murcko fragmentation of http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=16129878 was taking more than an hour.

I have a patch for master that reduces this to 88s by keeping track of the fragments generated and skipping identical fragments in recursive calls. I check for duplicates by generating SMILES for the fragments (and storing them in a HashSet), and this, as expected, causes a slowdown.
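
The deduplication described above can be sketched generically in plain Java. This is not the rcdk/CDK patch itself. MemoFragmenter is a hypothetical class, and the split function stands in for one round of Murcko-style splitting. The key function stands in for SMILES generation, which is what the patch uses; any cheaper key, such as a hash, plugs in the same way. A fragment is only split if its key has not been seen before.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical sketch: recursive fragmentation that skips already-seen fragments. */
final class MemoFragmenter<F, K> {
    private final Function<F, List<F>> split; // stand-in for one fragmentation step
    private final Function<F, K> key;         // dedup key: SMILES in the patch, or a hash
    private final Set<K> seen = new HashSet<>();
    private final List<F> results = new ArrayList<>();

    MemoFragmenter(Function<F, List<F>> split, Function<F, K> key) {
        this.split = split;
        this.key = key;
    }

    List<F> run(F root) {
        fragment(root);
        return results;
    }

    private void fragment(F f) {
        if (!seen.add(key.apply(f))) return;   // identical fragment already processed
        results.add(f);
        for (F child : split.apply(f)) fragment(child);
    }
}
```

As a toy check, strings serve as "fragments" and splitting drops the first or last character. Starting from "abcd", each of the 10 distinct substrings is processed once, instead of the 15 calls the unmemoised recursion would make. On a heavily fused molecule the gap grows much faster, which fits the reported drop from over an hour to 88s.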

My question is: is there a faster way to check for the existence of a molecule that does not involve SMILES generation?

(Also, where can I send a patch for review - I'm a little out of date with protocols these days :)

--
Rajarshi Guha | http://blog.rguha.net
NIH Center for Advancing Translational Science
_______________________________________________
Cdk-devel mailing list
Cdk-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/cdk-devel