#582 [bug:1198] mdl valency unification

Accepted
closed
nobody
master
1
2013-09-17
2012-11-17
John May
No

This patch unifies behaviour as per [bug:1198].

The bug incorrectly reported that the MDLReader ignores valency information for non-pseudo atoms. This was incorrect and the valency information was only read for non-pseudo atoms. To avoid the potential confusion the conditional for the pseudo-atom was removed - it should not matter if the atom is a pseudo atom we can still set the valency. The MDLV2000Writer however did ignore the valency unless an IO option was set. Reading the CTFile spec shows the valency field is not for query molecules. The IO option was deprecated and valency is now always written.

As Rich describes in the blog post the valence field shows the number of bonds including implicit hydrogens. I've added the setting of implicit hydrogens once the MDL file is read.

I put it on master as I deprecated an IO option on the MDLWriter - plus the implicit hydrogen inference is a bit more then a bug fix. Plus I think I have a couple of MDL patches waiting on 1.4 which could cause merge conflicts.

https://github.com/johnmay/cdk/tree/feature/mdl-valence

Related

Patches: #582

Discussion

  • John May
    John May
    2013-02-03

    Needs another pair of eyes, I've reverted the commits for now but this would definitely be a positive for library.

     
  • John May
    John May
    2013-02-03

    • status: open --> pending
    • milestone: Needs_Review --> Needs_Revision
     
  • John May
    John May
    2013-03-01

    Relevant post from Roger Sayle - Explicit and implicit hydrogens taking liberties with valence

    Actually the CDK atom typer doesn't do too bad on the benchmark... (95.11% recall 95.11% precision) but of course closer to 100% would be great and if this information can be loaded/saved in the connection table all the better

     
  • Oh, very interesting! Yeah, 95% is not that good... on PubChem we did better... what did you test it on? I like to see it...

     
  • John May
    John May
    2013-08-30

    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -6,5 +6,4 @@
    
     I put it on master as I deprecated an IO option on the MDLWriter - plus the implicit hydrogen inference is a bit more then a bug fix. Plus I think I have a couple of MDL patches waiting on 1.4 which could cause merge conflicts.
    
    -https://github.com/johnmay/cdk/tree/master+
    -
    +https://github.com/johnmay/cdk/tree/feature/mdl-valence
    
    • status: pending --> open
    • Group: Needs_Revision --> Needs_Review
     
  • John May
    John May
    2013-09-05

    I should add this does cause some regressions in 'cdk-smiles' and 'cdk-fingerprints'. The SMILES one is something really odd 'C(C=)' (WTF) but not too bothered by that as these errors are fixed the new SMILES code. The fingerprint issues are deeper. They're all to do with SMARTS matching - it seems now some patterns will 'correctly' match as the hydrogen counts add up. This however contradicts the test assertion that one molecule is a substructure of another. Looking at the fails I thought perhaps the MACCS SMARTS were wrong but they do seem to be correct. Anyways bit hard without drawing it out.

    J

    On 30 Aug 2013, at 20:49, John May jwmay@users.sf.net wrote:

    Corrected this now thanks to Roger's blog post. To add some extra links here's the switch statement tabulated: http://www.ics.uci.edu/~dock/manuals/oechem/cplusprog/mdlvalence.html

    [patches:#582] [bug:1198] mdl valency unification

    Status: open
    Labels: io mdl valence implicit hydrogens bug
    Created: Sat Nov 17, 2012 03:06 PM UTC by John May
    Last Updated: Sun Mar 03, 2013 09:31 PM UTC
    Owner: nobody

    This patch unifies behaviour as per [bug:1198].

    The bug incorrectly reported that the MDLReader ignores valency information for non-pseudo atoms. This was incorrect and the valency information was only read for non-pseudo atoms. To avoid the potential confusion the conditional for the pseudo-atom was removed - it should not matter if the atom is a pseudo atom we can still set the valency. The MDLV2000Writer however did ignore the valency unless an IO option was set. Reading the CTFile spec shows the valency field is not for query molecules. The IO option was deprecated and valency is now always written.

    As Rich describes in the blog post the valence field shows the number of bonds including implicit hydrogens. I've added the setting of implicit hydrogens once the MDL file is read.

    I put it on master as I deprecated an IO option on the MDLWriter - plus the implicit hydrogen inference is a bit more then a bug fix. Plus I think I have a couple of MDL patches waiting on 1.4 which could cause merge conflicts.

    https://github.com/johnmay/cdk/tree/feature/mdl-valence

    Sent from sourceforge.net because cdk-patches@lists.sourceforge.net is subscribed to https://sourceforge.net/p/cdk/patches/

    To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/cdk/admin/patches/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.


    Learn the latest--Visual Studio 2012, SharePoint 2013, SQL 2012, more!
    Discover the easy way to master current and previous Microsoft technologies
    and advance your career. Get an incredible 1,500+ hours of step-by-step
    tutorial videos with LearnDevNow. Subscribe today and save!
    http://pubads.g.doubleclick.net/gampad/clk?id=58040911&iu=/4140/ostg.clktrk_______
    Cdk-patches mailing list
    Cdk-patches@lists.sourceforge.net
    https://lists.sourceforge.net/lists/listinfo/cdk-patches

     

    Related

    Patches: #582

    • status: open --> closed
    • Group: Needs_Review --> Accepted
     
  • Applied and pushed.