#351 suggested fix for bug 3310138


The Tanimoto calculation for raw fingerprints was broken. This patch implements the continuous Tanimoto score used in DOI: 10.1021/ci800326z.
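A minimal Java sketch of a continuous Tanimoto score follows. It assumes the widely used form T(A,B) = A.B / (|A|^2 + |B|^2 - A.B). The thread does not quote the equation from the cited paper, so treat any exact match with it as an assumption. `ContinuousTanimoto` and `score` are hypothetical names, not CDK API.

```java
/**
 * Hypothetical sketch of a continuous Tanimoto score for raw (real-valued
 * or count) fingerprints, assuming the common form
 *   T(A,B) = dot(A,B) / (|A|^2 + |B|^2 - dot(A,B)).
 * For non-negative inputs the result lies in [0, 1].
 */
public final class ContinuousTanimoto {

    private ContinuousTanimoto() {}

    public static double score(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Fingerprints must have the same length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];   // dot(A,B)
            normA += a[i] * a[i]; // |A|^2
            normB += b[i] * b[i]; // |B|^2
        }
        double denominator = normA + normB - dot;
        // Zero only when both vectors are all zero: treated as identical (assumed convention)
        if (denominator == 0.0) {
            return 1.0;
        }
        return dot / denominator;
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 0, 3};
        double[] b = {1, 0, 2, 3};
        // dot = 10, |A|^2 = |B|^2 = 14, so T = 10 / (14 + 14 - 10) = 10/18
        System.out.println(score(a, b));
    }
}
```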


  • Egon Willighagen

    A few quick comments:

    1. (minor) Please use separate commits for separate fixes. That lets you give more detail, e.g. on what was wrong with the @TestMethod annotation.
    2. Please use {@cdk.cite} in the method JavaDoc rather than just listing the DOI. See, for example, the CMLReader class.

    As to the change of algorithm, the original author should comment on that.

  • Jonathan Alvarsson

    I sort of see your point, but when choosing between something that is published and something found on a webpage, I must say I am leaning towards the published one...

    The poster on Shapado mentions using Tanimoto in another answer on that page, but it is unclear whether it is the very same one or another. Also, a quick skim of that publication did not reveal any Tanimoto equation (maybe I didn't look closely enough?). Since I do not have enough points to comment on Shapado, I cannot ask the poster... :(

    However, I have now collected three different ways of doing this, and I am beginning to wonder if maybe CDK should implement all of them? I mean, who are we to make the choice for the user? What's the general CDK philosophy in these cases?
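One way to let several variants coexist, as suggested here, is a shared interface with one implementation per equation. The sketch below assumes two variants for count fingerprints: the continuous form and the min/max form (sum of element-wise minima over sum of element-wise maxima). The thread does not list the three equations that were collected, so these two are illustrative choices only. `TanimotoVariants` and `CountSimilarity` are hypothetical names, not CDK API, and counts are assumed to be non-negative.

```java
/**
 * Hypothetical sketch: several Tanimoto variants for count fingerprints
 * behind one interface, so callers can choose the one they want.
 * Counts are assumed to be non-negative.
 */
public final class TanimotoVariants {

    private TanimotoVariants() {}

    /** Common contract for a similarity between two count fingerprints. */
    public interface CountSimilarity {
        double similarity(int[] a, int[] b);
    }

    /** Continuous form: dot(A,B) / (|A|^2 + |B|^2 - dot(A,B)). */
    public static final CountSimilarity CONTINUOUS = (a, b) -> {
        checkLengths(a, b);
        long dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (long) a[i] * b[i];
            normA += (long) a[i] * a[i];
            normB += (long) b[i] * b[i];
        }
        long denominator = normA + normB - dot;
        // Both vectors all zero: treated as identical (assumed convention)
        return denominator == 0 ? 1.0 : (double) dot / denominator;
    };

    /** Min/max form: sum of element-wise minima over sum of element-wise maxima. */
    public static final CountSimilarity MIN_MAX = (a, b) -> {
        checkLengths(a, b);
        long sumMin = 0, sumMax = 0;
        for (int i = 0; i < a.length; i++) {
            sumMin += Math.min(a[i], b[i]);
            sumMax += Math.max(a[i], b[i]);
        }
        return sumMax == 0 ? 1.0 : (double) sumMin / sumMax;
    };

    private static void checkLengths(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Fingerprints must have the same length");
        }
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 0, 3};
        int[] b = {1, 0, 2, 3};
        System.out.println("continuous: " + CONTINUOUS.similarity(a, b));
        System.out.println("min/max:    " + MIN_MAX.similarity(a, b));
    }
}
```

For the vectors {1, 2, 0, 3} and {1, 0, 2, 3}, `MIN_MAX` gives 4/8 = 0.5 while `CONTINUOUS` gives 10/18, about 0.56. The same pair already scores differently under the two variants.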

  • Egon Willighagen

    I am beginning to wonder if maybe CDK should implement all of them?

    Yes, please! The CDK philosophy is in fact to provide alternative algorithms, unless it makes absolutely no sense (e.g. when the algorithm is of zero interest, like having two 2D layout engines).

    An example where it does make sense is where alternative implementations have educational value, e.g. with SSSR algorithms: the CDK has two now, and a third is being developed.

    I am looking forward to your analysis of the three math equations / algorithms for calculating the Tanimoto distance! And to their matching CDK implementations too!

  • Egon Willighagen

    An interesting plot would compare the Tanimoto distances from one algorithm with those from another. Are they about the same? Does either algorithm consistently predict larger or smaller distances? Is any difference linear over the full range [0,1]? If two algorithms do give different values (which I assume you mean; if they do not, only the fastest is interesting, although the same equation with a slow and a fast algorithm has educational value too), this is important to know, because it matters for selecting the more appropriate statistical method (as explained in my thesis).
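The comparison proposed here can be sketched as follows. The sketch assumes the continuous and min/max forms as the two algorithms, with random count vectors as test data; a real analysis would use fingerprints of actual molecules. It reports the mean signed difference, the mean absolute difference, and how often the continuous form scores higher. Printing the per-pair scores as CSV instead would produce the scatter plot. Because distance is 1 - similarity, a similarity difference d is a distance difference of -d. `TanimotoComparison` and its methods are hypothetical names.

```java
import java.util.Random;

/**
 * Hypothetical sketch: compares two Tanimoto variants for count fingerprints
 * on random data. Distance = 1 - similarity, so a similarity difference d
 * corresponds to a distance difference of -d.
 */
public final class TanimotoComparison {

    private TanimotoComparison() {}

    /** Continuous form: dot(A,B) / (|A|^2 + |B|^2 - dot(A,B)). */
    static double continuous(int[] a, int[] b) {
        long dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (long) a[i] * b[i];
            normA += (long) a[i] * a[i];
            normB += (long) b[i] * b[i];
        }
        long denominator = normA + normB - dot;
        return denominator == 0 ? 1.0 : (double) dot / denominator;
    }

    /** Min/max form: sum of minima over sum of maxima (non-negative counts assumed). */
    static double minMax(int[] a, int[] b) {
        long sumMin = 0, sumMax = 0;
        for (int i = 0; i < a.length; i++) {
            sumMin += Math.min(a[i], b[i]);
            sumMax += Math.max(a[i], b[i]);
        }
        return sumMax == 0 ? 1.0 : (double) sumMin / sumMax;
    }

    /** Random count vector with entries in [0, maxCount]. */
    static int[] randomCounts(Random random, int length, int maxCount) {
        int[] counts = new int[length];
        for (int i = 0; i < length; i++) {
            counts[i] = random.nextInt(maxCount + 1);
        }
        return counts;
    }

    /**
     * Returns {mean(continuous - minMax), mean |continuous - minMax|,
     * fraction of pairs where continuous > minMax}.
     */
    static double[] compare(int pairs, int length, int maxCount, long seed) {
        Random random = new Random(seed);
        double sumDiff = 0.0, sumAbsDiff = 0.0;
        int continuousLarger = 0;
        for (int p = 0; p < pairs; p++) {
            int[] a = randomCounts(random, length, maxCount);
            int[] b = randomCounts(random, length, maxCount);
            double diff = continuous(a, b) - minMax(a, b);
            sumDiff += diff;
            sumAbsDiff += Math.abs(diff);
            if (diff > 0) {
                continuousLarger++;
            }
        }
        return new double[] {
            sumDiff / pairs, sumAbsDiff / pairs, (double) continuousLarger / pairs
        };
    }

    public static void main(String[] args) {
        double[] s = compare(10_000, 64, 5, 42L);
        System.out.printf("mean diff = %.4f, mean |diff| = %.4f, continuous > minMax in %.1f%% of pairs%n",
                s[0], s[1], 100 * s[2]);
    }
}
```

On 0/1 vectors both forms reduce to the bit Tanimoto and agree exactly. Any difference the comparison finds is therefore due to counts greater than one.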

  • Rajarshi Guha

    Rajarshi Guha - 2011-06-13

    OK, then this patch should go in.

  • Jonathan Alvarsson

    I had another look at the equation given on that Shapado site. I don't see how to apply it to count fingerprints; it looks to me like it is meant for bit fingerprints.
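For reference, the standard bit-fingerprint Tanimoto is T = c / (a + b - c), where a and b are the bits set in each fingerprint and c the bits set in both. The sketch below implements it with `java.util.BitSet`. It is the textbook binary form, not a reconstruction of the Shapado equation, which the thread does not reproduce. `BitTanimoto` is a hypothetical class name. On 0/1 vectors the continuous form reduces to this one, since A.B = c, |A|^2 = a and |B|^2 = b. This is why an equation written only for bits does not say how to treat counts.

```java
import java.util.BitSet;

/**
 * Hypothetical sketch of the standard Tanimoto coefficient for bit
 * fingerprints: T = c / (a + b - c), with a and b the number of bits set
 * in each fingerprint and c the number set in both.
 */
public final class BitTanimoto {

    private BitTanimoto() {}

    public static double score(BitSet fpA, BitSet fpB) {
        BitSet common = (BitSet) fpA.clone(); // copy so fpA is not modified
        common.and(fpB);
        int c = common.cardinality();
        int a = fpA.cardinality();
        int b = fpB.cardinality();
        int denominator = a + b - c;
        // Two empty fingerprints: treated as identical (assumed convention)
        return denominator == 0 ? 1.0 : (double) c / denominator;
    }

    public static void main(String[] args) {
        BitSet x = new BitSet();
        x.set(0, 3); // bits 0, 1, 2
        BitSet y = new BitSet();
        y.set(1, 4); // bits 1, 2, 3
        // c = 2, a = b = 3, so T = 2 / (3 + 3 - 2) = 0.5
        System.out.println(score(x, y)); // prints 0.5
    }
}
```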

  • Egon Willighagen

    Jonathan, what should happen with this patch?

  • Jonathan Alvarsson

    The attached patch is broken, and the fix exists in another location. Closing this one.
