From: Rajarshi G. <raj...@gm...> - 2010-10-29 17:41:01
Hi,

The CDK has a decent collection of binary fingerprints. However, we are missing other useful ones, such as circular fingerprints. I know that at least one group is working on a circular fingerprint but, more generally, have there been any thoughts on how we can represent arbitrary fingerprints within the API?

Using circular fingerprints as an example, the fingerprint is essentially a list of integer or string values. More generally, one could assume a fingerprint to be a sequence of strings or integers, each with an associated count. Note that even the current hash-based fingerprints can be represented in this format.

While one can always convert the general key,value representation into a fixed-length bit string, it's sometimes useful to work directly with the key,value pairs. Thus, one possible refactoring of IFingerprinter is to have the following methods:

    BitSet calculate(IAtomContainer);
    BitSet calculate(IAtomContainer, int);
    HashMap<T,Integer> calculateRaw(IAtomContainer);

For cases such as the StandardFingerprinter, there is no change to the current behavior. On the other hand, for a circular fingerprint, the first two methods could be used to convert the raw circular fingerprint to a desired bit string, while one can always access the raw fingerprint with the third method.

Comments welcome

--
Rajarshi Guha
NIH Chemical Genomics Center
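
To make the proposal above a little more concrete, here is a minimal sketch of how a raw key,count fingerprint might be folded into a fixed-length bit string. It is only an illustration under stated assumptions, not CDK code: the class name RawFingerprintSketch and the fold method are hypothetical, a plain list of feature strings stands in for an IAtomContainer, and hashing each key with String.hashCode() modulo the length is just one possible folding scheme.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in the CDK.
class RawFingerprintSketch {

    // Stand-in for calculateRaw(IAtomContainer): here the "molecule" is
    // assumed to be already reduced to a list of feature strings, and the
    // raw fingerprint maps each distinct feature to its occurrence count.
    static Map<String, Integer> calculateRaw(List<String> features) {
        Map<String, Integer> raw = new HashMap<>();
        for (String feature : features) {
            raw.merge(feature, 1, Integer::sum);
        }
        return raw;
    }

    // Stand-in for calculate(IAtomContainer, int): folds the raw keys into
    // a bit string of the requested length. Counts are discarded and
    // distinct keys may collide on the same bit.
    static BitSet fold(Map<String, Integer> raw, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        BitSet bits = new BitSet(size);
        for (String key : raw.keySet()) {
            // floorMod keeps the index non-negative for negative hash codes
            bits.set(Math.floorMod(key.hashCode(), size));
        }
        return bits;
    }

    public static void main(String[] args) {
        List<String> features = Arrays.asList("C-C", "C-O", "C-C", "C=O");
        Map<String, Integer> raw = calculateRaw(features);
        BitSet bits = fold(raw, 1024);
        System.out.println("raw fingerprint: " + raw);
        System.out.println("bits set: " + bits.cardinality());
    }
}
```

In this sketch the raw map keeps the counts (for example, "C-C" occurs twice), while the folded BitSet only records presence, which is why access to the raw form can be useful alongside the bit string.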