Thanks for posting this. Do you think it is possible to do a systemic testing for the contract violation error ? 

On 29 November 2012 12:50, John May <> wrote:
Hi all,

I'm going to post these on the bug tracker but I thought it would be good to share as the problems (could) affect a fair bit of code and uses.

1. The HOSECodeGenerator comparator is in contract violation [src]. This is a lot more subtle then the last one in the CanonicalLabeller - if either value is null the comparator returns '0' which is incorrect. This means if we have a=5, b=10 and c=null then a < b, b < c but a == c and b == c. Stephan and I managed to track it down and have written a patch.
2. HOSECodeGenerator calculates the canonical labels as many times as there atoms - it only needs to do it once. This means the algorithm is closer to O(n^2) - you need to determine 'n' invpairs 'n' times.
3. The canonical labeller doesn't handle connected atom count/exp/impl hydrogen count correctly. The code tries to separate the counts into 'total count', 'non-hydrogen-count' and 'hydrogen-count' but it does not check for explicit hydrogens [src]. This is even more subtle and was actually how we found the contract violation (i.e. if only happened when we added implicit hydrogens).

These should be quite simple to fix but I have a feeling the canonical labeller and hose codes are used a lot.

Canonical labeler as part of SMILES parsing is indeed used a lot.

Best regards,

Many Thanks,
me John May | Predoctoral Student – Chemoinformatics and Metabolism
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK | +44–(0) 1223 49 2603

Keep yourself connected to Go Parallel:
VERIFY Test and improve your parallel project with help from experts
and peers.
Cdk-devel mailing list