Re: [Rdkit-discuss] How do rdFingerprintGenerator.GetMorganGenerator and AllChem.GetMorganFingerpr
From: Greg L. <gre...@gm...> - 2019-07-10 04:39:15
Hi,

By default the new fingerprint generators do "count simulation": adding extra bits to a bit vector fingerprint in order to get bit-vector similarities that are more similar to count-vector similarities. You can turn this off by passing the useCountSimulation=False argument to GetMorganGenerator().

Two comments about your sample code:

1) 256 bits is really not very many for a Morgan fingerprint. Maybe you were just using the small number for this question, but if you are really using fingerprints that short you should be aware that you are going to have a lot of collisions (blog post on this here: http://rdkit.blogspot.com/2016/02/colliding-bits-iii.html)

2) In case you aren't aware of it: you can calculate similarities and do fingerprint stats a lot more simply with built-in code like the GetNumOnBits() method on bit vectors and the similarity calculation code in rdkit.DataStructs. Take a look at DataStructs.DiceSimilarity()

Hope this helps,
-greg

On Wed, Jul 10, 2019 at 3:53 AM Lewis Martin <lew...@gm...> wrote:

> Hi all,
> Quick question on truncated fingerprints, any help is really appreciated.
>
> I think I've missed a trick on how the new fingerprint generator works. I
> thought the below should produce equivalent fingerprints but they are
> totally different. Has the implementation changed, or maybe I'm getting the
> kwargs incorrect? See below code or this link for a quick visual:
> https://github.com/ljmartin/snippets/blob/master/truncated_fingerprints.ipynb
> Thanks!
>
> import rdkit
> from rdkit import Chem
> from rdkit.Chem import Draw, AllChem
> from rdkit.Chem import rdFingerprintGenerator
> from rdkit.Chem.Draw import IPythonConsole
> import numpy as np
> from scipy.spatial import distance
>
> mol = Chem.MolFromSmiles('CN1C(=O)CN=C(C2=C1C=CC(=C2)Cl)C3=CC=CC=C3')
> # diazepam
>
> gen_mo = rdFingerprintGenerator.GetMorganGenerator(fpSize=256, radius=2)
> a = gen_mo.GetFingerprint(mol)
> b = AllChem.GetMorganFingerprintAsBitVect(mol,2,256,useFeatures=False)
> a_f = [int(i) for i in a.ToBitString()]
> b_f = [int(i) for i in b.ToBitString()]
> print('NumBits a: %s, NumBits b: %s' % (np.sum(a_f), np.sum(b_f)))
> print('Dice Distance %s' % distance.dice(a_f,b_f))
>
>
> NumBits a: 47, NumBits b: 38
> Dice Distance 0.9058823529411765
>
> _______________________________________________
> Rdkit-discuss mailing list
> Rdk...@li...
> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
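
For anyone finding this thread later, here is a minimal sketch that puts the two suggestions from the reply together: the sample code from the question with count simulation switched off, and with the bit counting and comparison done by GetNumOnBits() and DataStructs.DiceSimilarity() instead of numpy and scipy. The helper name below is made up for illustration, and the keyword spelling is an assumption that depends on your RDKit version: this thread uses useCountSimulation, while more recent releases appear to call it countSimulation, so the helper tries both.

```python
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem, rdFingerprintGenerator

RADIUS, FP_SIZE = 2, 256  # values taken from the original question


def morgan_generator_without_count_simulation(radius, fp_size):
    """Hypothetical helper: build a Morgan generator with count simulation off."""
    # Assumption: the keyword is spelled countSimulation in newer RDKit
    # releases and useCountSimulation in older ones (as in this thread).
    try:
        return rdFingerprintGenerator.GetMorganGenerator(
            radius=radius, fpSize=fp_size, countSimulation=False)
    except Exception:
        return rdFingerprintGenerator.GetMorganGenerator(
            radius=radius, fpSize=fp_size, useCountSimulation=False)


mol = Chem.MolFromSmiles('CN1C(=O)CN=C(C2=C1C=CC(=C2)Cl)C3=CC=CC=C3')  # diazepam

# Fingerprint from the new generator API, without count simulation
new_fp = morgan_generator_without_count_simulation(RADIUS, FP_SIZE).GetFingerprint(mol)
# Fingerprint from the older AllChem function, as in the question
old_fp = AllChem.GetMorganFingerprintAsBitVect(mol, RADIUS, FP_SIZE, useFeatures=False)

# Built-in bit counting and similarity, no conversion to bit strings needed
print('NumBits new: %d, NumBits old: %d' % (new_fp.GetNumOnBits(), old_fp.GetNumOnBits()))
print('Dice similarity: %.3f' % DataStructs.DiceSimilarity(new_fp, old_fp))
```

Note that DataStructs.DiceSimilarity() returns a similarity, whereas scipy's distance.dice() in the question returns a dissimilarity, so identical fingerprints give 1.0 here rather than 0.0.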