Re: [Rdkit-discuss] BulkTanimotoSimilarity
From: Greg L. <gre...@gm...> - 2018-06-07 03:49:05
Hi Jennifer,

The sample code in the documentation is there to generate a distance matrix (which is what one needs for clustering). So it calculates the similarities with this line:

    sims = DataStructs.BulkTanimotoSimilarity(fps[i],fps[:i])

and then converts them to distances by subtracting from 1:

    dists.extend([1-x for x in sims])

If you want the similarity matrix, you'd just do:

    dists.extend(sims)

Here's a little demo to show that BulkTanimotoSimilarity is actually returning similarities:

    In [3]: fp1 = Chem.RDKFingerprint(Chem.MolFromSmiles('CCc1ccccc1'))

    In [4]: fp2 = Chem.RDKFingerprint(Chem.MolFromSmiles('CCCc1ccccc1'))

    In [5]: DataStructs.TanimotoSimilarity(fp1,fp2)
    Out[5]: 0.7590361445783133

    In [6]: DataStructs.BulkTanimotoSimilarity(fp1,(fp1,fp2))
    Out[6]: [1.0, 0.7590361445783133]

You can see that the values are what you'd expect.

I hope this helps,
-greg

On Wed, Jun 6, 2018 at 12:26 PM Jennifer Hemmerich <jen...@gm...> wrote:

> I was trying to calculate a similarity matrix with Morgan fingerprints and
> TanimotoSimilarity.
>
> If I use DataStructs.TanimotoSimilarity(fp,fp) I get a similarity of 1,
> which I would expect. If I do the same with
> DataStructs.BulkTanimotoSimilarity(fps[i], fps[i]) I get a similarity of 0,
> which actually is not a similarity but a distance. I took this from the
> cookbook (http://www.rdkit.org/docs/Cookbook.html), which states in the
> clustering example:
>
>     # first generate the distance matrix:
>     dists = []
>     nfps = len(fps)
>     for i in range(1,nfps):
>         sims = DataStructs.BulkTanimotoSimilarity(fps[i],fps[:i])
>         dists.extend([1-x for x in sims])
>
> Did I misunderstand something, or is the dists list in the example
> actually a list of similarities, and BulkTanimotoSimilarity actually
> calculates the Tanimoto distance?
>
> It would be great to get some clarification.
>
> Thank you in advance,
>
> Jennifer
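
For readers who want to try the similarity-versus-distance point from the reply without RDKit installed, here is a minimal sketch in plain Python. It is not RDKit code: fingerprints are modelled as sets of on-bit indices, and `tanimoto`, `bulk_tanimoto`, and `lower_triangle` are hypothetical helper names that only mimic the shape of the cookbook loop. The value returned for two empty fingerprints is a convention chosen for this sketch and is not claimed to match RDKit.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # assumption for this sketch only; not taken from RDKit
    return len(a & b) / len(a | b)


def bulk_tanimoto(query, targets):
    """Hypothetical stand-in for a bulk call: one similarity per target."""
    return [tanimoto(query, t) for t in targets]


def lower_triangle(fps, as_distance=True):
    """Flattened lower triangle, built with the same loop as the cookbook example.

    With as_distance=True each similarity s is stored as 1 - s (a distance);
    with as_distance=False the similarities are stored unchanged.
    """
    values = []
    for i in range(1, len(fps)):
        sims = bulk_tanimoto(fps[i], fps[:i])
        if as_distance:
            values.extend([1 - s for s in sims])
        else:
            values.extend(sims)
    return values


# Toy fingerprints (made-up bit sets, not derived from real molecules).
fps = [{1, 2, 3, 4}, {1, 2, 3}, {3, 4, 5, 6}]

sims = lower_triangle(fps, as_distance=False)
dists = lower_triangle(fps, as_distance=True)

print(sims)
print(dists)
```

The only difference between the two lists is the `1 - s` step, which is the conversion the cookbook applies before clustering. Comparing a fingerprint with itself through `tanimoto` gives 1.0, in line with the demo in the reply.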