Re: [Rdkit-discuss] Topological Torsion Fingerprint - GetHashed
From: Greg L. <gre...@gm...> - 2016-01-08 03:59:54
Dear Alex,

It's a good question, but the answer involves a bit of complexity.

If you use the function GetHashedTopologicalTorsionFingerprint(), you get back a count vector of the requested length in which each torsion is hashed by combining its atom codes with a standard hashing algorithm. The RDKit uses an adapted version of boost::hash for this; here's some documentation about boost::hash: http://www.boost.org/doc/libs/1_59_0/doc/html/hash.html. There's also a bit of logic in there to make sure that (C,2,1)-(C,2,1)-(C,3,1)-(C,3,0) and (C,3,0)-(C,3,1)-(C,2,1)-(C,2,1), which are the same torsion read in opposite directions, generate the same hash.

If you use GetHashedTopologicalTorsionFingerprintAsBitVect() with the nBitsPerEntry argument set to 1, more or less the same thing happens, but you get back a bit vector instead of a count vector. Note that nBitsPerEntry is not set to 1 by default (the default is 4). The default behavior for Topological Torsion and Atom Pair bit vector fingerprints is to simulate count-based fingerprints using the approach discussed on slides 21 and 22 of this presentation:
http://rdkit.org/UGM/2012/Landrum_RDKit_UGM.Fingerprints.Final.pptx.pdf

The basic idea is to generate a count-based FP of length nBits/nBitsPerEntry and then use the approach described in the presentation to expand each of its counts into nBitsPerEntry bits in the final fingerprint. For example, assuming nBitsPerEntry=4, bin 12 of the short count-based FP corresponds to bits 48 through 51 of the bit vector fingerprint, and a count of 3 stored in bin 12 would end up setting bits 48 and 49.

A clearer version of this, perhaps with a picture, should probably end up in the documentation. Another entry for my Todo list. :-)

Does that help?
-greg

On Fri, Jan 8, 2016 at 4:20 AM, A neuman <the...@gm...> wrote:
> Hello everyone,
>
> I want to describe how fingerprints are calculated and hashed into a
> bitstring of 1's and 0's.
> My problem is the topological-torsion fingerprint.
>
> Thanks to the fingerprints description in the RDKit.pdf, I know in general how it works.
>
> There is an example, (C,2,1)-(C,2,1)-(C,3,1)-(C,3,0),
> i.e. one C with 2 bonds and one pi electron for (C,2,1), and so on...
>
> I'm using the GetHashedTopologicalTorsionFingerprintAsBitVect function to
> get such a string, but I can't find any documentation on how it gets hashed.
> In general, how are the IDs calculated that set the "1"s in the
> bitstring?
>
> I know there is no GetTopologicalTorsionFingerprintAsBitVect function, due
> to the size, but with GetHashed it works. Does somebody know how? A simple
> description would be really helpful.
>
> I would appreciate your help.
>
> best,
>
> alex
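To make the hashing step in Greg's reply more concrete, here is a minimal Python sketch of the idea: pick one canonical orientation of the torsion so that a path and its reverse look identical, fold the atom codes together with a boost::hash_combine-style mixing step, and reduce the hash modulo the number of bins to get a count vector. This is not the RDKit's implementation. The hypothetical atom_code() packing, the "smaller of forward and reversed" orientation rule, and the 32-bit truncation are all assumptions, so the numbers it produces will not match RDKit output.

```python
# Illustrative sketch only: NOT the RDKit's actual code, and hash values
# will not match what the RDKit produces.
from typing import Dict, List, Sequence, Tuple

MASK32 = 0xFFFFFFFF  # assumption: work with 32-bit unsigned hashes


def atom_code(atomic_num: int, n_neighbors: int, n_pi: int) -> int:
    """Hypothetical packing of an (element, neighbors, pi) triple into one int."""
    return (atomic_num << 8) | (n_neighbors << 4) | n_pi


def hash_combine(seed: int, value: int) -> int:
    """Mixing step in the style of boost::hash_combine, truncated to 32 bits."""
    seed ^= (value + 0x9E3779B9 + ((seed << 6) & MASK32) + (seed >> 2)) & MASK32
    return seed & MASK32


def canonical_torsion(codes: Sequence[int]) -> Tuple[int, ...]:
    """Choose one orientation so that a torsion and its reverse compare equal.

    Assumption: take the lexicographically smaller of the two directions.
    """
    forward = tuple(codes)
    backward = tuple(reversed(codes))
    return min(forward, backward)


def torsion_hash(codes: Sequence[int]) -> int:
    """Fold the canonically ordered atom codes into a single hash value."""
    seed = 0
    for code in canonical_torsion(codes):
        seed = hash_combine(seed, code)
    return seed


def hashed_torsion_counts(torsions: List[Sequence[int]], n_bins: int) -> Dict[int, int]:
    """Count-based hashed fingerprint: bin = hash mod n_bins, value = count."""
    counts: Dict[int, int] = {}
    for codes in torsions:
        bin_idx = torsion_hash(codes) % n_bins
        counts[bin_idx] = counts.get(bin_idx, 0) + 1
    return counts


if __name__ == "__main__":
    # (C,2,1)-(C,2,1)-(C,3,1)-(C,3,0) and the same torsion written backwards
    fwd = [atom_code(6, 2, 1), atom_code(6, 2, 1), atom_code(6, 3, 1), atom_code(6, 3, 0)]
    rev = list(reversed(fwd))
    print(torsion_hash(fwd) == torsion_hash(rev))  # True
    print(hashed_torsion_counts([fwd, rev], n_bins=512))  # a single bin holding a count of 2
```

Because both orientations reduce to the same canonical tuple before hashing, the two spellings of the torsion from the reply land in the same bin instead of being counted as two different features.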
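The nBitsPerEntry expansion Greg describes (a count vector of length nBits/nBitsPerEntry, with each count spread over nBitsPerEntry bits) can also be sketched in Python. The function names are hypothetical and the threshold rule is an assumption: for nBitsPerEntry=4 the sketch uses minimum counts of 1, 2, 4 and 8, chosen because that reproduces the example of a count of 3 in bin 12 setting bits 48 and 49; for other values it assumes bit i of a block is set when the count exceeds i. The linked slides or the RDKit source are the authority on the exact scheme.

```python
# Illustrative sketch only: an approximation of the count-simulation idea,
# not the RDKit's actual code.
from typing import Dict, List


def thresholds_for(n_bits_per_entry: int) -> List[int]:
    """Minimum count needed to switch on each bit of a bin's block.

    Assumption: for 4 bits per entry use 1, 2, 4, 8 (this reproduces the
    "count 3 in bin 12 -> bits 48 and 49" example); otherwise bit i is set
    when the count exceeds i.
    """
    if n_bits_per_entry == 4:
        return [1, 2, 4, 8]
    return [i + 1 for i in range(n_bits_per_entry)]


def expand_counts(counts: Dict[int, int], n_bits: int, n_bits_per_entry: int) -> List[int]:
    """Expand a short count vector (n_bits // n_bits_per_entry bins) into n_bits bits."""
    n_bins = n_bits // n_bits_per_entry
    bits = [0] * n_bits
    limits = thresholds_for(n_bits_per_entry)
    for bin_idx, count in counts.items():
        if not 0 <= bin_idx < n_bins:
            raise ValueError(f"bin {bin_idx} out of range for {n_bins} bins")
        # Bin b owns bits b*n_bits_per_entry up to b*n_bits_per_entry + n_bits_per_entry - 1
        for offset, limit in enumerate(limits):
            if count >= limit:
                bits[bin_idx * n_bits_per_entry + offset] = 1
    return bits


if __name__ == "__main__":
    bits = expand_counts({12: 3}, n_bits=2048, n_bits_per_entry=4)
    print([i for i, b in enumerate(bits) if b])  # [48, 49]
    # With n_bits_per_entry=1 each bin is just present (1) or absent (0).
    bits1 = expand_counts({12: 3}, n_bits=2048, n_bits_per_entry=1)
    print([i for i, b in enumerate(bits1) if b])  # [12]
```

Setting nBitsPerEntry to 1 reduces this to plain presence/absence bits, which corresponds to the nBitsPerEntry=1 case mentioned in the reply.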