Bug 3414473 is reproducible on 64-bit Linux with IcedTea 6 and 7 and the current 1.4.x git branch. I opened a new bug because commenting on the other bug gives me errors.
We got different SMILES for identical scaffolds in the software Scaffold Hunter, which is how I found this bug.
The structures are:
from junit test: C1CCC2C[CC=]CC2(C1)
10 11 0 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
2 3 1 0 0 0 0
3 4 1 0 0 0 0
4 5 1 0 0 0 0
5 6 1 0 0 0 0
6 7 1 0 0 0 0
7 8 2 3 0 0 0
8 9 1 0 0 0 0
9 4 1 0 0 0 0
9 10 1 0 0 0 0
10 1 1 0 0 0 0
M END
from junit test: C1CCC2C[=CC]CC2(C1)
10 11 0 0 0 0 0 0 0 0999 V2000
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1 2 1 0 0 0 0
2 3 1 0 0 0 0
3 4 1 0 0 0 0
4 5 1 0 0 0 0
5 6 1 0 0 0 0
6 1 1 0 0 0 0
6 7 1 0 0 0 0
7 8 1 0 0 0 0
8 9 1 0 0 0 0
10 5 1 0 0 0 0
10 9 2 0 0 0 0
M END
Probably a mix of two problems: the canonical labeller needs more initial invariants, and the comparators were overflowing (see patch:593).
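The comparator code itself is not shown in this thread, so the following is only a generic sketch of how a subtraction-based comparison can overflow and return the wrong sign. The class and method names are hypothetical and are not taken from CDK or from patch:593.

```java
// Hypothetical illustration only; not the CDK comparator.
final class ComparatorOverflowSketch {

    // Subtraction-based comparison: the difference wraps around when it
    // does not fit in an int, so the sign of the result can be wrong.
    static int compareBySubtraction(int a, int b) {
        return a - b;
    }

    // Overflow-safe comparison (Integer.compare is available since Java 7).
    static int compareSafely(int a, int b) {
        return Integer.compare(a, b);
    }

    public static void main(String[] args) {
        int small = Integer.MIN_VALUE;
        int large = 1;
        // MIN_VALUE - 1 wraps to MAX_VALUE, so "small" is reported as greater.
        System.out.println("subtraction says small < large: "
                + (compareBySubtraction(small, large) < 0));
        System.out.println("safe compare says small < large: "
                + (compareSafely(small, large) < 0));
    }
}
```

A sort that relies on the first variant can order large invariant values inconsistently, which is one way two atom orderings of the same molecule could end up with different labels.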
formatted molecules:
Perhaps the 1.7 comparator patches fixed this a bit - running without implicit hydrogens added:
Running with implicit hydrogens added:
However, the molecules differ only in their atom order, and regardless of the hydrogens the canonical forms should be the same.
Last edit: John May 2013-05-23
Okay, I remember now: the issue is that the labeller does not consider bond order. The difference is implied by the number of hydrogens each atom has. Similar principle to the hybridization fingerprinter. Not sure whether to close this or not. It would be good to add a unit test showing it works, but that might have been done already.
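To illustrate the point about hydrogens, here is a minimal sketch in plain Java (no CDK classes) using the bond table of the first molfile above. It assumes every atom is a neutral carbon with valence 4, and the class and method names are hypothetical. It is not the labeller's actual invariant calculation; it only shows that connectivity alone cannot tell a CH2 atom from a double-bonded CH atom, while the implicit hydrogen count can.

```java
// Hypothetical illustration only; not the CDK canonical labeller.
final class HydrogenInvariantSketch {

    // Bonds of the first molfile in this report: {begin, end, order},
    // with 1-based atom numbers. Bond 7-8 is the double bond.
    static final int[][] BONDS = {
        {1, 2, 1}, {2, 3, 1}, {3, 4, 1}, {4, 5, 1}, {5, 6, 1}, {6, 7, 1},
        {7, 8, 2}, {8, 9, 1}, {9, 4, 1}, {9, 10, 1}, {10, 1, 1}
    };

    // Number of neighbours per atom; bond order is ignored.
    static int[] degrees(int atomCount, int[][] bonds) {
        int[] degree = new int[atomCount + 1];
        for (int[] bond : bonds) {
            degree[bond[0]]++;
            degree[bond[1]]++;
        }
        return degree;
    }

    // Implicit hydrogens per atom, assuming neutral carbon (valence 4).
    static int[] implicitHydrogens(int atomCount, int[][] bonds) {
        int[] orderSum = new int[atomCount + 1];
        for (int[] bond : bonds) {
            orderSum[bond[0]] += bond[2];
            orderSum[bond[1]] += bond[2];
        }
        int[] hydrogens = new int[atomCount + 1];
        for (int atom = 1; atom <= atomCount; atom++) {
            hydrogens[atom] = 4 - orderSum[atom];
        }
        return hydrogens;
    }

    public static void main(String[] args) {
        int[] degree = degrees(10, BONDS);
        int[] hydrogens = implicitHydrogens(10, BONDS);
        // Atom 6 is a CH2 and atom 7 is part of the double bond.
        System.out.println("degree:    atom 6 = " + degree[6] + ", atom 7 = " + degree[7]);
        System.out.println("hydrogens: atom 6 = " + hydrogens[6] + ", atom 7 = " + hydrogens[7]);
    }
}
```

Atoms 6 and 7 have the same degree but different hydrogen counts, so an invariant that includes the hydrogen count carries the bond order information indirectly, provided the hydrogens have been set before labelling.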
Closing - not a bug - however hydrogens will soon be configured with atom type in the manipulator.