From: Marc C. <mar...@gm...> - 2011-03-29 22:46:14
Jurgens,

You should be aware that Tc = 1 does not guarantee that two compounds are identical, only that they could be identical. Because the fingerprints used in the comparison have a finite length, the same bits can be set for non-identical structures, especially if you fold the fingerprints. To put it another way: identical compounds must have the same fingerprints, but compounds with the same fingerprints are not necessarily identical.

Marc

On 29/03/2011, at 10:43 PM, Jurgens de Bruin <deb...@gm...> wrote:

> Hi All,
>
> I do hope some light can be shed on the following...
>
> I have an .sdf file that contains 2483 molecules. When I run the command "babel in.sdf out.sdf --unique", it finds 255 duplicates, which is possible.
>
> When I try to do the same in Python by calculating the Tanimoto coefficient between pairs of compounds (Tc = 1 would indicate a duplicate), I don't find any duplicates. How is this possible?
> Python code below:
>
> import openbabel
> import pybel
> import csv
> from pybel import *
>
>
> def createFPS():
>
>     before = 0
>     Phytochemicals = []
>
>     for phyto in readfile("sdf", "./phyto3000.sdf"):
>         Phytochemical = {}
>         before += 1
>         fps = phyto.calcfp()  # default fingerprint type (FP2)
>         Phytochemical["Name"] = phyto.title
>         Phytochemical["FPS"] = fps
>         Phytochemicals.append(Phytochemical)
>
>     print "Phytochemicals in original sdf:", before
>
>     return Phytochemicals
>
>
> def fDuplicated(Phytochemicals):
>
>     stop = len(Phytochemicals)
>     count = 0
>     for x in range(0, stop):
>         # start at x + 1 so each unordered pair is compared only once
>         for z in range(x + 1, stop):
>             # in pybel, fp1 | fp2 returns the Tanimoto coefficient
>             Tc = Phytochemicals[x]['FPS'] | Phytochemicals[z]['FPS']
>             if Tc == 1:
>                 print "Tc equal to 1"
>                 count += 1
>
>     print "Total Tc equal to 1", count
>
>
> Phytochemicals = createFPS()
> fDuplicated(Phytochemicals)
>
> --
> Regards/Groete/Mit freundlichen Grüßen/recuerdos/meilleures salutations/
> distinti saluti/siong/duì yú/привет
>
> Jurgens de Bruin
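Marc's point about folding can be shown with a small pure-Python sketch. It assumes fingerprints are sets of on-bit indices and uses a simple modulo folding scheme, where bit i maps to bit i % size. That scheme is an assumption made for illustration, and Open Babel's own fingerprints may be hashed and folded differently. The two bit sets are invented values, not real fingerprints of any molecule.

```python
from __future__ import annotations


def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def fold(bits: set, size: int) -> set:
    """Fold a fingerprint to `size` bits by mapping bit i onto bit i % size
    (an assumed, simplified folding scheme for illustration)."""
    return {i % size for i in bits}


# Two hypothetical, different unfolded fingerprints.
fp_a = {3, 70, 600}
fp_b = {3, 134, 1112}

print(tanimoto(fp_a, fp_b))                      # 0.2
print(tanimoto(fold(fp_a, 64), fold(fp_b, 64)))  # 1.0
```

The unfolded sets differ. After folding to 64 bits, both become {3, 6, 24} and give Tc = 1.0. This is the "same fingerprint, not necessarily the same compound" case Marc describes.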
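The fingerprint loop and `babel in.sdf out.sdf --unique` may not be running the same test. As far as I know, `--unique` identifies duplicates by InChI by default rather than by fingerprint similarity. The minimal sketch below deduplicates on an exact identity key. It assumes each record is a (title, key) pair whose key is a canonical string computed beforehand. With pybel, something like `mol.write("inchi").strip()` should produce an InChI, provided Open Babel was built with InChI support. The function names and the placeholder keys are hypothetical.

```python
from collections import defaultdict


def find_duplicates(records):
    """Group titles by an exact identity key.

    `records` is an iterable of (title, key) pairs; `key` is assumed to be a
    canonical identifier (e.g. an InChI string) computed beforehand.
    Returns {key: [titles, ...]} for every key that occurs more than once.
    """
    groups = defaultdict(list)
    for title, key in records:
        groups[key].append(title)
    return {key: titles for key, titles in groups.items() if len(titles) > 1}


def count_removed(duplicates):
    """Number of records a keep-the-first-occurrence filter would drop."""
    return sum(len(titles) - 1 for titles in duplicates.values())


# Hypothetical records; the keys are placeholders standing in for InChI strings.
records = [
    ("phyto_1", "KEY_A"),
    ("phyto_2", "KEY_B"),
    ("phyto_3", "KEY_A"),
    ("phyto_4", "KEY_C"),
    ("phyto_5", "KEY_A"),
]

dups = find_duplicates(records)
print(dups)                 # {'KEY_A': ['phyto_1', 'phyto_3', 'phyto_5']}
print(count_removed(dups))  # 2
```

Grouping by key takes time linear in the number of molecules, whereas the pairwise Tc loop is quadratic. Fingerprint equality can still serve as a cheap pre-filter. Following Marc's point, though, a fingerprint match only marks candidates that still need an exact comparison.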