#646 ESSSR - Extended Smallest Set of Smallest Rings

Accepted
closed
nobody
master
1
2013-09-18
2013-06-11
John May
No

An algorithm for computing the extended smallest set of smallest rings (ESSSR). This is used by PubChem in their fingerprints. Currently we're using the SSSR, which isn't correct. Now, this ring set depends on canonical labelling (the JavaDoc explains this in detail). I also added an option to turn this off, which makes more sense to me and in my opinion gives a better fingerprint (examples in the JavaDoc). Actually, read the JavaDoc before looking at the code as it will make more sense :-).

Tested on PubChem Compound CID 1 to CID 25000 by reversing the fingerprint to get the ring sizes. The numbers are bits set.

   MCB/SSSR [37037/44660] 82.93 %
   ESSSR    [44603/44660] 99.87 %
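
For anyone wanting to reproduce this kind of tally, here is a minimal sketch in plain Java. It assumes the comparison counts ring bits set in the reference fingerprint that are also set in ours; the class name `RingBitTally`, both helper methods, and the example bit indices are hypothetical and not part of the patch or of CDK.

```java
import java.util.BitSet;
import java.util.Locale;

public class RingBitTally {

    /** Counts bits set in the reference that are also set in the computed fingerprint. */
    static int matchedBits(BitSet reference, BitSet computed) {
        BitSet both = (BitSet) reference.clone(); // copy so the reference is not modified
        both.and(computed);
        return both.cardinality();
    }

    /** Formats a tally in the style of the table above: [matched/total] percent. */
    static String summary(long matched, long total) {
        return String.format(Locale.ROOT, "[%d/%d] %.2f %%",
                             matched, total, 100.0 * matched / total);
    }

    public static void main(String[] args) {
        // Made-up bit indices, standing in for the ring bits of one fingerprint.
        BitSet reference = new BitSet();
        reference.set(3);
        reference.set(7);
        reference.set(11);

        BitSet computed = new BitSet();
        computed.set(3);
        computed.set(11);
        computed.set(20);

        System.out.println(matchedBits(reference, computed)
                           + " of " + reference.cardinality() + " reference bits matched");

        // Totals reported in this ticket.
        System.out.println("MCB/SSSR " + summary(37037, 44660));
        System.out.println("ESSSR    " + summary(44603, 44660));
    }
}
```

Summing `matchedBits` and `reference.cardinality()` over all compounds would give the two numbers in each bracket.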

We don't get 100% for a couple of reasons.
1. PubChem does not export complex bonds to their SDF format - so molecules like CID 21081 are read as acyclic (which in theory they are).
2. There are also a handful of cases where it gets it wrong due to the labelling. This can be fixed by changing the order of the vertices, but getting to 100% would need the PubChem canonical labelling. Technically you get this in the download, but that doesn't always seem to be the case - maybe it changed. Contacting Evan about that.

Last two commits on this branch: johnmay/cdk/esssr.

Algorithm was actually really simple - EsssrCycles.java

J

Related

Patches: #646

Discussion

  • John May
    2013-09-18

    Fixed up now - once applied we can fix that PubChemFingerprint bug.

     
  • Where 'now' refers to the Sept 5 commits?

     
  • I just tried to rebase your esssr branch on master, but I get conflicts... I'm a bit distracted with too many things... please help me get the right patches to apply...

     
  • John May
    John May
    2013-09-18

    Yeah, there were some problems. While waiting for Wolf to respond I made some changes to the other cycles code.

    Rebased here: https://github.com/johnmay/cdk/compare/feature;triple-cycles

    On 18 Sep 2013, at 13:36, Egon Willighagen egonw@users.sf.net wrote:

    Where 'now' refers to the Sept 5 commits?

  • OK, so four patches. They look good. Should I apply them now, or wait?

     
  • John May
    2013-09-18

    Yep, okay to apply. All that's left is a utility API for the ring sets, but that can be a separate patch - will then use that for results for the paper :-).

    On 18 Sep 2013, at 13:57, Egon Willighagen egonw@users.sf.net wrote:

    OK, so four patches. They look good. Should I apply them now, or wait?

    • status: open --> closed
    • Group: Needs_Review --> Accepted
     
  • OK, applied and pushed.

     