#1277 Bond order 4 in MDLV2000 'breaks' AtomTyping

master
open
nobody
None
1
2012-11-13
2012-11-07
SBeisken
No

Currently the MDLV2000Reader uses IBond.Order.UNSET for bonds of order 4. This renders atom typing for the atoms of the UNSET bonds impossible because the maximum bond order will always exceed the theoretical pi bond count of the atom type (CDKAtomTypeMatcher 2544). This is a result of the ordinal() method used (container.getMaximumBondOrder(atom).ordinal()). The ordinal of UNSET is 4, greater than most pi bond counts of most common atom types.

In turn that means that all subsequent methods which rely on typed atoms fail, such as the FixBondOrderTool (which could be used to get around the issue of bond order 4).

One way around the problem is not to use ordinal. John has posted an alternative approach for 1.4.x (https://sourceforge.net/p/cdk/patches/569/). Adding a custom connections method to the enum would replace ordinal method calls and could return a lower value.

Discussion

  • Yes, .ordinal() must be phased out and indeed the atom typer should be updated.

    That's the thing with master... because of API changes, these kind of problems happen... just look at the number of regressions... getting all these things right, will take time, I'm afraid, and bug reports like these help a lot!

     
  • This issue probably also explains this regression in core:

    ------------- Standard Output ---------------
    Could not match atom: 4 in file bug1208740_1.mol
    Could not match atom: 5 in file bug1208740_1.mol
    Could not match atom: 6 in file bug1208740_1.mol
    Could not match atom: 8 in file bug1208740_1.mol
    Could not match atom: 9 in file bug1208740_1.mol
    Could not match atom: 11 in file bug1208740_1.mol
    Could not match atom: 12 in file bug1208740_1.mol
    Could not match atom: 13 in file bug1208740_1.mol
    Could not match atom: 14 in file bug1208740_1.mol
    Could not match atom: 15 in file bug1208740_1.mol
    Could not match atom: 16 in file bug1208740_1.mol
    Could not match atom: 17 in file bug1208740_1.mol
    Could not match atom: 18 in file bug1208740_1.mol
    Could not match atom: 19 in file bug1208740_1.mol
    Could not match atom: 20 in file bug1208740_1.mol
    Could not match atom: 1 in file bug771485-2.mol
    Could not match atom: 2 in file bug771485-2.mol
    Could not match atom: 3 in file bug771485-2.mol
    Could not match atom: 4 in file bug771485-2.mol
    Could not match atom: 5 in file bug771485-2.mol
    Could not match atom: 6 in file bug771485-2.mol
    Could not match atom: 7 in file bug771485-2.mol
    Could not match atom: 8 in file bug771485-2.mol
    Could not match atom: 9 in file bug771485-2.mol
    Could not match atom: 10 in file bug771485-2.mol
    Could not match atom: 1 in file ring_03419.mol
    Could not match atom: 2 in file ring_03419.mol
    Could not match atom: 3 in file ring_03419.mol
    Could not match atom: 4 in file ring_03419.mol
    Could not match atom: 5 in file ring_03419.mol
    Could not match atom: 6 in file ring_03419.mol
    Could not match atom: 7 in file ring_03419.mol
    Could not match atom: 8 in file ring_03419.mol
    Could not match atom: 9 in file ring_03419.mol
    Could not match atom: 10 in file ring_03419.mol
    Could not match atom: 11 in file ring_03419.mol
    Could not match atom: 12 in file ring_03419.mol
    Could not match atom: 13 in file ring_03419.mol
    Could not match atom: 14 in file ring_03419.mol
    Could not match atom: 15 in file ring_03419.mol
    Could not match atom: 16 in file ring_03419.mol
    Could not match atom: 17 in file ring_03419.mol
    Could not match atom: 18 in file ring_03419.mol
    Could not match atom: 19 in file ring_03419.mol
    Could not match atom: 20 in file ring_03419.mol
    Could not match atom: 21 in file ring_03419.mol
    Could not match atom: 22 in file ring_03419.mol
    Could not match atom: 23 in file ring_03419.mol
    Could not match atom: 24 in file ring_03419.mol
    Could not match atom: 25 in file ring_03419.mol
    Could not match atom: 27 in file ring_03419.mol
    Could not match atom: 28 in file ring_03419.mol
    Could not match atom: 29 in file ring_03419.mol
    Could not match atom: 31 in file ring_03419.mol
    Could not match atom: 32 in file ring_03419.mol
    ------------- ---------------- ---------------
    ------------- Standard Error -----------------
    [Fatal Error] :1:1: Premature end of file.
    ------------- ---------------- ---------------
    Testcase: testMDLMolfiles(org.openscience.cdk.atomtype.CDKAtomTypeMatcherTestFileReposTest): FAILED
    Could not match all atom types! expected:<2011> but was:<1956>
    junit.framework.AssertionFailedError: Could not match all atom types! expected:<2011> but was:<1956>
    at org.openscience.cdk.atomtype.CDKAtomTypeMatcherTestFileReposTest.testMDLMolfiles(CDKAtomTypeMatcherTestFileReposTest.java:220)

     
  • John May
    John May
    2012-11-07

    Aha!

    That's one I couldn't track down!

    J

    On 7 Nov 2012, at 20:10, Egon Willighagen egonw@users.sf.net wrote:

    This issue probably also explains this regression in core:

    ------------- Standard Output ---------------
    Could not match atom: 4 in file bug1208740_1.mol
    Could not match atom: 5 in file bug1208740_1.mol
    Could not match atom: 6 in file bug1208740_1.mol
    Could not match atom: 8 in file bug1208740_1.mol
    Could not match atom: 9 in file bug1208740_1.mol
    Could not match atom: 11 in file bug1208740_1.mol
    Could not match atom: 12 in file bug1208740_1.mol
    Could not match atom: 13 in file bug1208740_1.mol
    Could not match atom: 14 in file bug1208740_1.mol
    Could not match atom: 15 in file bug1208740_1.mol
    Could not match atom: 16 in file bug1208740_1.mol
    Could not match atom: 17 in file bug1208740_1.mol
    Could not match atom: 18 in file bug1208740_1.mol
    Could not match atom: 19 in file bug1208740_1.mol
    Could not match atom: 20 in file bug1208740_1.mol
    Could not match atom: 1 in file bug771485-2.mol
    Could not match atom: 2 in file bug771485-2.mol
    Could not match atom: 3 in file bug771485-2.mol
    Could not match atom: 4 in file bug771485-2.mol
    Could not match atom: 5 in file bug771485-2.mol
    Could not match atom: 6 in file bug771485-2.mol
    Could not match atom: 7 in file bug771485-2.mol
    Could not match atom: 8 in file bug771485-2.mol
    Could not match atom: 9 in file bug771485-2.mol
    Could not match atom: 10 in file bug771485-2.mol
    Could not match atom: 1 in file ring_03419.mol
    Could not match atom: 2 in file ring_03419.mol
    Could not match atom: 3 in file ring_03419.mol
    Could not match atom: 4 in file ring_03419.mol
    Could not match atom: 5 in file ring_03419.mol
    Could not match atom: 6 in file ring_03419.mol
    Could not match atom: 7 in file ring_03419.mol
    Could not match atom: 8 in file ring_03419.mol
    Could not match atom: 9 in file ring_03419.mol
    Could not match atom: 10 in file ring_03419.mol
    Could not match atom: 11 in file ring_03419.mol
    Could not match atom: 12 in file ring_03419.mol
    Could not match atom: 13 in file ring_03419.mol
    Could not match atom: 14 in file ring_03419.mol
    Could not match atom: 15 in file ring_03419.mol
    Could not match atom: 16 in file ring_03419.mol
    Could not match atom: 17 in file ring_03419.mol
    Could not match atom: 18 in file ring_03419.mol
    Could not match atom: 19 in file ring_03419.mol
    Could not match atom: 20 in file ring_03419.mol
    Could not match atom: 21 in file ring_03419.mol
    Could not match atom: 22 in file ring_03419.mol
    Could not match atom: 23 in file ring_03419.mol
    Could not match atom: 24 in file ring_03419.mol
    Could not match atom: 25 in file ring_03419.mol
    Could not match atom: 27 in file ring_03419.mol
    Could not match atom: 28 in file ring_03419.mol
    Could not match atom: 29 in file ring_03419.mol
    Could not match atom: 31 in file ring_03419.mol
    Could not match atom: 32 in file ring_03419.mol

    ------------- Standard Error -----------------
    Fatal Error :1:1: Premature end of file.

    Testcase: testMDLMolfiles(org.openscience.cdk.atomtype.CDKAtomTypeMatcherTestFileReposTest): FAILED
    Could not match all atom types! expected:<2011> but was:<1956>
    junit.framework.AssertionFailedError: Could not match all atom types! expected:<2011> but was:<1956>
    at org.openscience.cdk.atomtype.CDKAtomTypeMatcherTestFileReposTest.testMDLMolfiles(CDKAtomTypeMatcherTestFileReposTest.java:220)

    bugs:1277 Bond order 4 in MDLV2000 'breaks' AtomTyping

    Status: open Created: Wed Nov 07, 2012 09:15 AM UTC by SBeisken Last Updated: Wed Nov 07, 2012 10:16 AM UTC Owner: nobody

    Currently the MDLV2000Reader uses IBond.Order.UNSET for bonds of order 4. This renders atom typing for the atoms of the UNSET bonds impossible because the maximum bond order will always exceed the theoretical pi bond count of the atom type (CDKAtomTypeMatcher 2544). This is a result of the ordinal() method used (container.getMaximumBondOrder(atom).ordinal()). The ordinal of UNSET is 4, greater than most pi bond counts of most common atom types.

    In turn that means that all subsequent methods which rely on typed atoms fail, such as the FixBondOrderTool (which could be used to get around the issue of bond order 4).

    One way around the problem is not to use ordinal. John has posted an alternative approach for 1.4.x (https://sourceforge.net/p/cdk/patches/569/). Adding a custom connections method to the enum would replace ordinal method calls and could return a lower value.

    Sent from sourceforge.net because you indicated interest in https://sourceforge.net/p/cdk/bugs/1277/

    To unsubscribe from further messages, please visit https://sourceforge.net/auth/prefs/


    LogMeIn Central: Instant, anywhere, Remote PC access and management.
    Stay in control, update software, and manage PCs from one command center
    Diagnose problems and improve visibility into emerging IT issues
    Automate, monitor and manage. Do more in less time with Central
    http://p.sf.net/sfu/logmein12331_d2d_______
    Cdk-bugs mailing list
    Cdk-bugs@lists.sourceforge.net
    https://lists.sourceforge.net/lists/listinfo/cdk-bugs

     
  • SBeisken
    SBeisken
    2012-11-13

    Some follow up thoughts on the BondOrder.UNSET: Naturally the change will cause several challenges downstream. The readers (e.g., CML, SMILES) read aromaticity as BondOrder.SINGLE and set the corresponding flag (SINGLE_OR_DOUBLE). The readers/writers would have to be adapted to account for the special case of UNSET, otherwise IO would produce a variety of similar but not identical molecules. However, since UNSET can mean anything, we would have to check what exactly this UNSET bond is, i.e., if it is part of an aromatic system.

    My point is that if bondOrder.UNSET gets introduced, the whole project needs to be carefully adjusted for this otherwise it will be inconsistent. Those inconsistencies are difficult to detect in the unit tests provided but become apparent when the CDK library is used in complex pipelines for large datasets (I am using master in KNIME nightly).

    I believe UNSET would cause more issues than it solves at this point in time and wouldn't really recommend its implementation. That's my two pence on the situation. :)

     
  • Yeah, I guess IBond.Order.UNSET should be removed...

    I had discussion but it seems we never reached a decision... but we already have CDKConstants.UNSET which we can set to IBond.Order ...

    Then we can just track the NullPointerExceptions...

     
  • BTW, those errors I listed were unrelated...

     
  • John May
    John May
    2012-11-13

    I dunno, it would still be good to have though. Perhaps it should go in 'master's master' if that makes sense (obviously doesn't exist at the moment). It's nice that master is for API changes but unfortunately this Order.UNSET does break a lot.

    I understand (could be wrong) master was created for the great IMolecule purge. I'd say this change is probably even bigger as it changes the fundamentals of some of the algorithms. As you've mentioned we need to bring down the bugs in master before it's ~stable. Once master is stable and main development/bug fixing becomes cdk-1.5.x we could attempt the UNSET in a new master again?

    Thoughts, J?

     
  • No, 'master' is just where any API breaking happens. Anything more unstable should go into development branches. The regressions we see now is a mixture of fewer regressions caused by several separate things.

    I'm preparing a patch to remove IBond.Order.UNSET and running the tests now...

    When we freeze master (we stop changing the API) we branch cdk-1.5.x and make that stable. When that is stable, then we branch cdk-1.6.x ... that is, we follow the previous Linux kernel scheme 1.even is stable, 1.odd is unstable...