Re: [Rdkit-discuss] Topological Torsion Fingerprint - GetHashed


Dear Alex,

It's a good question, but the answer is a little involved.

If you use the function GetHashedTopologicalTorsionFingerprint(), you get
back a count vector of the requested length where the torsions are hashed
by combining their atom codes using a standard hashing algorithm. The RDKit
uses an adapted version of boost::hash for this (documentation for
boost::hash: http://www.boost.org/doc/libs/1_59_0/doc/html/hash.html).
There's also a bit of logic in there to make sure that
(C,2,1)-(C,2,1)-(C,3,1)-(C,3,0) and (C,3,0)-(C,3,1)-(C,2,1)-(C,2,1)
generate the same hash.
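
To make the direction-independence idea concrete, here is a minimal sketch
in plain Python. It is not the RDKit's actual code: the integer atom codes,
the function names hash_combine and torsion_hash, and the choice of "take
the lexicographically smaller ordering" as the canonical direction are all
assumptions made for illustration. Only the general shape (combine the atom
codes with a boost-style hash_combine, after picking one direction) follows
the description above.

```python
def hash_combine(seed, value):
    """One boost-style hash_combine step, truncated to 32 bits."""
    seed ^= (value + 0x9E3779B9 + (seed << 6) + (seed >> 2)) & 0xFFFFFFFF
    return seed & 0xFFFFFFFF


def torsion_hash(atom_codes, n_bits=2048):
    """Hash a path of integer atom codes so that its direction does not matter.

    atom_codes is a hypothetical stand-in for the RDKit's real atom codes.
    """
    codes = list(atom_codes)
    # Canonicalize: of the forward and reversed path, keep the smaller one.
    canonical = min(codes, codes[::-1])
    seed = 0
    for code in canonical:
        seed = hash_combine(seed, code)
    # Fold the hash into the requested fingerprint length.
    return seed % n_bits


# Made-up codes standing in for (C,2,1)-(C,2,1)-(C,3,1)-(C,3,0).
forward = [0x21, 0x21, 0x31, 0x30]
assert torsion_hash(forward) == torsion_hash(forward[::-1])
```

Canonicalizing before hashing keeps the hash function itself simple; the
alternative of hashing both directions and combining the results would also
work but does twice the hashing.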

If you use GetHashedTopologicalTorsionFingerprintAsBitVect() with the
nBitsPerEntry argument set to 1, more or less the same thing happens but
you get back a bit vector instead of a count vector. Note that
nBitsPerEntry is not, by default, set to 1.

The default behavior for Topological Torsion and Atom Pair bit vector
fingerprints is to simulate count-based fingerprints using the approach
discussed on slides 21 and 22 of this presentation:
http://rdkit.org/UGM/2012/Landrum_RDKit_UGM.Fingerprints.Final.pptx.pdf
The basic idea is to generate a count-based FP of length
nBits/nBitsPerEntry and then to use the approach described in the
presentation to expand each of the counts there into nBitsPerEntry bits in
the final fingerprint.

For example, a count of 3 stored in bin 12 of the short count-based FP would
end up setting bits 48 and 49 of the bit vector fingerprint (assuming
nBitsPerEntry=4, so that bin 12 maps to bits 12*4=48 through 51).
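
The expansion step can be sketched in a few lines of plain Python. This is
an illustration, not the RDKit implementation: the function name
expand_counts and the dict-based count vector are hypothetical, and the
count thresholds (1, 2, 4, 8) are an assumption chosen because they
reproduce the example above; the slides are the place to check the actual
values.

```python
def expand_counts(counts, n_bits_per_entry=4, thresholds=(1, 2, 4, 8)):
    """Expand a {bin: count} mapping into the set of bit indices to switch on.

    Bin b owns bits b*n_bits_per_entry .. b*n_bits_per_entry + n_bits_per_entry - 1.
    Bit i of a bin is set when the count reaches thresholds[i] (assumed values),
    so thresholds needs at least n_bits_per_entry entries.
    """
    on_bits = set()
    for bin_idx, count in counts.items():
        for i in range(n_bits_per_entry):
            if count >= thresholds[i]:
                on_bits.add(bin_idx * n_bits_per_entry + i)
    return on_bits


# A count of 3 in bin 12 with 4 bits per entry:
print(sorted(expand_counts({12: 3})))  # [48, 49]
```

With n_bits_per_entry=1 the same function reduces to a plain presence bit
per bin, which matches the nBitsPerEntry=1 case described earlier.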

A clearer version of this, perhaps with a picture, should probably end up in
the documentation. Another entry for my to-do list. :-)

Does that help?
-greg

On Fri, Jan 8, 2016 at 4:20 AM, A neuman <the...@gm...> wrote:

> Hello everyone,
>
> I want to describe how fingerprints are calculated and hashed into a
> bitstring of 1's and 0's.
> My problem is the topological torsion fingerprint.
>
> In general, thanks to the Fingerprints in the RDKit.pdf, I know how it works.
>
> There is an example, (C,2,1)-(C,2,1)-(C,3,1)-(C,3,0),
> where (C,2,1) stands for a C with 2 bonds and one pi electron, and so on...
>
> I'm using the GetHashedTopologicalTorsionFingerprintAsBitVect function to
> get such a string, but I can't find any documentation on how it gets hashed.
> In general, how are the IDs calculated that set the "1"s in the
> bitstring?
>
> I know there is no GetTopologicalTorsionFingerprintAsBitVect function due
> to the size, but with GetHashed it works. Does somebody know how? A simple
> description would be really helpful.
>
> I would appreciate your help
>
> best,
>
> alex
