#1237 SMILES are not canonical

Milestone: cdk-1.4.x
Status: closed
Priority: 5
Updated: 2013-06-03
Created: 2012-06-14
Creator: Till Schäfer
Private: No

Bug 3414473 is reproducible on 64-bit Linux with IcedTea 6 and 7 and the current 1.4.x git branch. I opened a new bug because commenting on the other bug gives me errors.

We got different SMILES for identical scaffolds in the Scaffold Hunter software, which is how I found this bug.

The structures are:

from junit test: C1CCC2C[CC=]CC2(C1)


10 11 0 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
2 3 1 0 0 0 0
3 4 1 0 0 0 0
4 5 1 0 0 0 0
5 6 1 0 0 0 0
6 7 1 0 0 0 0
7 8 2 3 0 0 0
8 9 1 0 0 0 0
9 4 1 0 0 0 0
9 10 1 0 0 0 0
10 1 1 0 0 0 0
M END

from junit test: C1CCC2C[=CC]CC2(C1)


10 11 0 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
2 3 1 0 0 0 0
3 4 1 0 0 0 0
4 5 1 0 0 0 0
5 6 1 0 0 0 0
6 1 1 0 0 0 0
6 7 1 0 0 0 0
7 8 1 0 0 0 0
8 9 1 0 0 0 0
10 5 1 0 0 0 0
10 9 2 0 0 0 0
M END
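
One way to check the claim that both tables describe the same structure is to search for an atom mapping between them that preserves every bond and its order. The following is a minimal sketch in plain Python, not CDK code. The names `MOL_A`, `MOL_B`, `bond_map` and `find_mapping` are made up for illustration, and the bond lists are transcribed by hand from the two bond blocks above (the stereo flag on bond 7-8 of the first table is ignored).

```python
# Bond lists as (atom, atom, order), transcribed from the two connection
# tables in the report. All atoms are carbon; stereo flags are ignored.
MOL_A = [(1, 2, 1), (2, 3, 1), (3, 4, 1), (4, 5, 1), (5, 6, 1), (6, 7, 1),
         (7, 8, 2), (8, 9, 1), (9, 4, 1), (9, 10, 1), (10, 1, 1)]
MOL_B = [(1, 2, 1), (2, 3, 1), (3, 4, 1), (4, 5, 1), (5, 6, 1), (6, 1, 1),
         (6, 7, 1), (7, 8, 1), (8, 9, 1), (10, 5, 1), (10, 9, 2)]


def bond_map(bonds):
    """Map each unordered atom pair to its bond order."""
    return {frozenset((a, b)): order for a, b, order in bonds}


def find_mapping(bonds_a, bonds_b, n_atoms=10):
    """Backtracking search for an atom mapping A -> B that preserves
    both the presence and the order of every bond. Returns None if
    no such mapping exists."""
    map_a, map_b = bond_map(bonds_a), bond_map(bonds_b)
    if sorted(map_a.values()) != sorted(map_b.values()):
        return None  # different bond counts or orders: cannot match

    mapping, used = {}, set()

    def extend(atom):
        if atom > n_atoms:
            return True
        for cand in range(1, n_atoms + 1):
            if cand in used:
                continue
            # The candidate must agree with every atom mapped so far:
            # bonded pairs stay bonded with the same order, and
            # non-bonded pairs stay non-bonded.
            consistent = all(
                map_a.get(frozenset((atom, prev)))
                == map_b.get(frozenset((cand, mapping[prev])))
                for prev in mapping
            )
            if consistent:
                mapping[atom] = cand
                used.add(cand)
                if extend(atom + 1):
                    return True
                del mapping[atom]
                used.discard(cand)
        return False

    return dict(mapping) if extend(1) else None


if __name__ == "__main__":
    print(find_mapping(MOL_A, MOL_B))
```

If `find_mapping(MOL_A, MOL_B)` returns a mapping rather than `None`, the two inputs differ only in atom order, so a canonical SMILES generator would be expected to produce the same string for both.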

Discussion

  • John May
    2012-12-03

    Probably a mix of two problems: the canonical labeller needs more initial invariants, and the comparators were overflowing (see patch:593).

     
  • John May
    2013-05-23

    formatted molecules:

      Mrv0541 05231309172D          
    
     10 11  0  0  0  0            999 V2000
       0.0000     0.0000    0.0000 C  0  0  0  0  0  0  0  0  0  0  0  0
       0.0000     0.0000    0.0000 C  0  0  0  0  0  0  0  0  0  0  0  0 
       0.0000     0.0000    0.0000 C  0  0  0  0  0  0  0  0  0  0  0  0  
       0.0000     0.0000    0.0000 C  0  0  0  0  0  0  0  0  0  0  0  0 
       0.0000     0.0000    0.0000 C  0  0  0  0  0  0  0  0  0  0  0  0 
       0.0000     0.0000    0.0000 C  0  0  0  0  0  0  0  0  0  0  0  0 
       0.0000     0.0000    0.0000 C  0  0  0  0  0  0  0  0  0  0  0  0 
       0.0000     0.0000    0.0000 C  0  0  0  0  0  0  0  0  0  0  0  0 
       0.0000     0.0000    0.0000 C  0  0  0  0  0  0  0  0  0  0  0  0 
       0.0000     0.0000    0.0000 C  0  0  0  0  0  0  0  0  0  0  0  0 
      1  2  1  0  0  0  0 
      2  3  1  0  0  0  0 
      3  4  1  0  0  0  0 
      4  5  1  0  0  0  0 
      5  6  1  0  0  0  0 
      6  7  1  0  0  0  0 
      7  8  2  3  0  0  0 
      8  9  1  0  0  0  0 
      9  4  1  0  0  0  0 
      9 10  1  0  0  0  0 
     10  1  1  0  0  0  0 
    M  END
    $$$$
    
      Mrv0541 05231309172D          
    
     10 11  0  0  0  0            999 V2000
       0.0000     0.0000    0.0000 C  0  0  0  0  0  0  0  0  0  0  0  0
       0.0000     0.0000    0.0000 C  0  0  0  0  0  0  0  0  0  0  0  0 
       0.0000     0.0000    0.0000 C  0  0  0  0  0  0  0  0  0  0  0  0  
       0.0000     0.0000    0.0000 C  0  0  0  0  0  0  0  0  0  0  0  0 
       0.0000     0.0000    0.0000 C  0  0  0  0  0  0  0  0  0  0  0  0 
       0.0000     0.0000    0.0000 C  0  0  0  0  0  0  0  0  0  0  0  0 
       0.0000     0.0000    0.0000 C  0  0  0  0  0  0  0  0  0  0  0  0 
       0.0000     0.0000    0.0000 C  0  0  0  0  0  0  0  0  0  0  0  0 
       0.0000     0.0000    0.0000 C  0  0  0  0  0  0  0  0  0  0  0  0 
       0.0000     0.0000    0.0000 C  0  0  0  0  0  0  0  0  0  0  0  0 
      1  2  1  0  0  0  0 
      2  3  1  0  0  0  0 
      3  4  1  0  0  0  0 
      4  5  1  0  0  0  0 
      5  6  1  0  0  0  0 
      6  1  1  0  0  0  0 
      6  7  1  0  0  0  0 
      7  8  1  0  0  0  0 
      8  9  1  0  0  0  0 
     10  5  1  0  0  0  0 
     10  9  2  0  0  0  0 
    M  END
    
     
  • John May
    2013-05-23

    Perhaps the 1.7 comparator patches fixed this a bit - running without implicit hydrogens added:

    C1CCC2CCC=CC2(C1)
    C1CCC2C=CCCC2(C1)
    1 2 6 10 8 4 3 7 9 5 
    5 1 2 6 10 9 7 3 4 8
    

    Running with implicit hydrogens added

    C1=CC2CCCCC2(CC1)
    C1=CC2CCCCC2(CC1)
    4 5 8 10 6 3 1 2 9 7 
    8 5 4 7 9 10 6 3 1 2
    

    However, the molecules differ only in their atom order, and regardless of the hydrogens the canonical forms should be the same.

     
    Last edit: John May 2013-05-23
  • John May
    2013-05-23

    Okay, I remember now - the issue is that the labeller does not consider bond order. The difference is implied by the number of hydrogens each atom has. Similar principle to the hybridization fingerprinter. Not sure whether to close this or not. It would be good to add a unit test showing it works, but that might have been done already.

     
  • John May
    2013-06-03

    • status: open --> closed
     
  • John May
    2013-06-03

    Closing - not a bug - however hydrogens will soon be configured with atom type in the manipulator.
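
The explanation given in the thread (the labeller ignores bond order, so the double bond is only visible through hydrogen counts) can be illustrated with a small Morgan-style refinement. This is a sketch under stated assumptions, not CDK's canonical labeller: the function names are hypothetical, the bond lists are transcribed by hand from the two connection tables in the report, every atom is assumed to be a neutral carbon with valence 4, and bond orders are deliberately left out of the refinement step.

```python
# Bond lists as (atom, atom, order), transcribed from the report.
MOL_A = [(1, 2, 1), (2, 3, 1), (3, 4, 1), (4, 5, 1), (5, 6, 1), (6, 7, 1),
         (7, 8, 2), (8, 9, 1), (9, 4, 1), (9, 10, 1), (10, 1, 1)]
MOL_B = [(1, 2, 1), (2, 3, 1), (3, 4, 1), (4, 5, 1), (5, 6, 1), (6, 1, 1),
         (6, 7, 1), (7, 8, 1), (8, 9, 1), (10, 5, 1), (10, 9, 2)]


def neighbours(bonds):
    """Adjacency lists, ignoring bond order."""
    adj = {}
    for a, b, _ in bonds:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def implicit_h(bonds):
    """Assumption: neutral carbon, so H count = 4 - sum of bond orders."""
    used = {}
    for a, b, order in bonds:
        used[a] = used.get(a, 0) + order
        used[b] = used.get(b, 0) + order
    return {atom: 4 - total for atom, total in used.items()}


def to_ranks(values):
    """Replace each invariant by its position among the sorted distinct values."""
    order = {v: i for i, v in enumerate(sorted(set(values.values())))}
    return {atom: order[v] for atom, v in values.items()}


def refine(bonds, use_hydrogens):
    """Refine atom classes until the number of classes stops growing.
    The initial invariant is the degree, optionally with the H count."""
    adj = neighbours(bonds)
    h = implicit_h(bonds)
    ranks = to_ranks({a: (len(adj[a]), h[a] if use_hydrogens else 0)
                      for a in adj})
    while True:
        # New signature: own rank plus the sorted ranks of the neighbours.
        sigs = {a: (ranks[a], tuple(sorted(ranks[b] for b in adj[a])))
                for a in adj}
        new = to_ranks(sigs)
        if len(set(new.values())) == len(set(ranks.values())):
            return ranks
        ranks = new


def canonical_bonds(bonds, ranks):
    """Bond list rewritten in terms of ranks; only meaningful when
    every atom ended up with a distinct rank."""
    return sorted((min(ranks[a], ranks[b]), max(ranks[a], ranks[b]), order)
                  for a, b, order in bonds)


if __name__ == "__main__":
    for name, mol in (("A", MOL_A), ("B", MOL_B)):
        for use_h in (False, True):
            classes = len(set(refine(mol, use_h).values()))
            print(name, "hydrogens" if use_h else "degree only", classes)
```

With degree as the only starting invariant, the refinement stalls at three classes for both inputs, so any remaining ties would have to be broken by something like input order. With hydrogen counts included, all ten atoms receive distinct ranks and the rank-relabelled bond lists of the two inputs coincide, which matches the identical SMILES reported above for the run with implicit hydrogens added.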