#586 CanonicalLabeller Comparator

Accepted
closed
nobody
master
9
2012-12-06
2012-11-24
John May
No

A patch to fix a bug with the CanonicalLabeller on JRE 1.7. The current implementation was in violation of the comparison contract (i.e. didn't work). The patch fixes this and means the SMILESGenerator functions as expected on 1.7.

Discussion

  • John May
    John May
    2012-11-24

    • priority: 1 --> 9
     
  • ... and the patch?

     
  • John May
    John May
    2012-11-24

    Hmm... didn't attach

     
    Attachments
  • Okay, the signed-off patch is pushed to https://github.com/gilleain/cdk/tree/patch-586 or can provide it as a file attachment if easier.

    It doesn't introduce any regressions in cdk.standard or cdk.smiles - but it occurs to me that I am testing on an older JDK (I can't install java-7 sadly).

     
  • John May
    John May
    2012-11-24

    Cool, thanks.

    Why can't you install Java 7? I thought it was available for all platforms now.

    J

    On 24 Nov 2012, at 18:38, gilleain maclean torrance gilleain@users.sf.net wrote:

    Okay, the signed-off patch is pushed to https://github.com/gilleain/cdk/tree/patch-586 or can provide it as a file attachment if easier.

    It doesn't introduce any regressions in cdk.standard or cdk.smiles - but it occurs to me that I am testing on an older JDK (I can't install java-7 sadly).

    patches:586 CanonicalLabeller Comparator

    Status: open
    Labels: io canonicalisation
    Created: Sat Nov 24, 2012 12:41 PM UTC by John May
    Last Updated: Sat Nov 24, 2012 05:24 PM UTC
    Owner: nobody

    A patch to fix a bug with the CanonicalLabeller on JRE 1.7. The current implementation was in violation of the comparison contract (i.e. didn't work). The patch fixes this and means the SMILESGenerator functions as expected on 1.7.

    Sent from sourceforge.net because you indicated interest in https://sourceforge.net/p/cdk/patches/586/

    To unsubscribe from further messages, please visit https://sourceforge.net/auth/prefs/

     
  • John May
    John May
    2012-11-26

    • milestone: Needs_Review --> Accepted
     
    • labels: io, canonicalisation --> io, canonicalisation, smiles
     
  • John, can you explain this a bit more? I cannot oversee the implications... why is it a problem that it violates the API contract? When was the exception thrown and why? Because the comparing itself does not seem to do anything different...

    Is there really no chance of a unit test to simulate the exception? There may be no regressions, but the SMILES parsing is not extensively tested, and this patch is rather invasive... does this change the ordering of atoms in canonical SMILES?

     
  • John May
    John May
    2012-12-02

    Stephan added a unit test for the HOSECodeOne so you can test that on a Java 7 VM.

    You can use the 'subtraction' hack in comparators if you can guarantee two things. 1) you don't have negative value and 2) you don't have overflow. The first one in this case holds however the second one doesn't. As an example... what do you get if you do the follow

    Interger.MAX_VALUE - Interger.MIN_VALUE // A number twice the max value of integer
    

    Okay but we don't have negatives... but we do have long values. Hence we the difference between the two is sufficiently large the cast to an int will overflow to a negative value. This then can mean that 'a < b' and 'b < c' but 'a > c'. The code was always wrong just the sort didn't complain before.

    I can write a full example which breaks if you like...

     
  • I can write a full example which breaks if you like...

    Please do! It really helps people understand the code (and so did the above example)...

    In fact, I think we do have overflows, and I think this may explain in stability I sometimes feel we have in canonical SMILES... maybe this is due to this issue? Never had time to look into it in detail; all I know, that canonical SMILES are currently CDK release specific.

    I watched the two Java Puzzler episodes you sent me, and maybe this should use BigInteger instead of long... the numbers get really large very quickly... I have been seriously wondering if we have overflows for larger molecules...

     
  • John May
    John May
    2012-12-02

    Okay will write one now and attach it.

    The trouble with BigInterger is it's quite slow (compared to primitives). There is actually a better way to set up the initial invariants so they are a constant number of bits.

     
  • John May
    John May
    2012-12-02

    Okay here is an example - I used TreeSet as that doesn't use sort but still uses a comparator. I added two correct comparators as when you have primitives - you don't want to create objects just for comparison.

     
    Attachments
    • John May
      John May
      2012-12-02

      Oh - it's self contained so you can compile it with 'javac' and run it on the command line.

       
  • John, thanx for the further detail, in particular that Java Puzzler 65. BTW, there is absolutely no objection to add in references to literature in comments (or JavaDoc), such as that puzzle.

    Applied and pushed.

     
    • status: open --> closed
     
  • John May
    John May
    2012-12-06

    Heh, I actually found that after the patch :-). I knew the issue as with the molecule hash code you also have compare invariants and I had a similar bug.