You might have a look at my (rather old) implementation of atom environments (reference inside the code).



It uses a list of integers and is wrapped as an IMolecularDescriptor. I am fine with moving this code into the CDK, if this is of interest.


On 29 October 2010 23:24, Egon Willighagen <egon.willighagen@gmail.com> wrote:
On Fri, Oct 29, 2010 at 7:40 PM, Rajarshi Guha <rajarshi.guha@gmail.com> wrote:
> Hi, the CDK has a decent collection of binary fingerprints. However we
> are missing other useful ones such as circular etc. I know that at
> least one group is working on a circular fingerprint, but more
> generally, have there been any thoughts on how we can represent arbitrary
> fingerprints within the API?

Just for my sanity... this is the kind of fingerprint I used in this
post too, correct?


I'm not so very good at remembering exact terminology... but the
fingerprint I use in that post in PLS modeling is I think a circular
fingerprint... agreed?

> Using circular fingerprints as an
> example, the fingerprint is essentially a list of integer or string
> values. More generally, one could assume a fingerprint to be a
> sequence of strings or integers, each with an associated count. Note
> that, even the current hash based fingerprints can be represented in
> this format.

One important difference is that the length of this fingerprint is
undefined... and for every next molecule you calculate it for, the
fingerprint will grow. Moreover, the order of the molecules for which
it is calculated indirectly determines the meaning of the bits...
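
To make the order dependence concrete, here is a minimal sketch. It
assumes each atom environment has already been reduced to an integer key;
the class GrowingKeyIndex and its methods are hypothetical and not part of
the CDK. Each key that has not been seen before gets the next free bit
position, so the same key can land on a different bit depending on which
molecule was processed first.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not CDK API: maps environment keys to bit positions
// in first-seen order, so the fingerprint grows with every new key.
class GrowingKeyIndex {
    private final Map<Integer, Integer> positionOfKey = new LinkedHashMap<>();

    // Returns the bit positions set for one molecule, given its environment keys.
    List<Integer> positionsFor(List<Integer> environmentKeys) {
        List<Integer> positions = new ArrayList<>();
        for (Integer key : environmentKeys) {
            Integer position = positionOfKey.get(key);
            if (position == null) {
                // Unseen key: append a new bit at the end of the fingerprint.
                position = positionOfKey.size();
                positionOfKey.put(key, position);
            }
            if (!positions.contains(position)) {
                positions.add(position);
            }
        }
        return positions;
    }

    // Current fingerprint length, i.e. the number of distinct keys seen so far.
    int length() {
        return positionOfKey.size();
    }
}
```

Feeding the key lists {11, 22} and then {22, 33} puts key 22 on bit 1,
while feeding them in the opposite order puts key 22 on bit 0. In both
cases the length ends up as 3.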


> While one can always convert the general key,value representation into
> a fixed length bit string, it's sometimes useful to directly work with
> the key,value pairs.
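
As a rough illustration of the conversion mentioned above, the following
sketch folds key,value pairs into a fixed-length bit string by hashing each
key onto a bit position. The class FingerprintFolding and the choice of
String.hashCode() are assumptions made for the example, not what the CDK
fingerprinters do. Counts are discarded and different keys may collide on
the same bit, which is one reason to keep the raw pairs available.

```java
import java.util.BitSet;
import java.util.Map;

// Hypothetical helper, not CDK API: folds key/count pairs into a
// fixed-length bit string.
class FingerprintFolding {
    static BitSet fold(Map<String, Integer> counts, int size) {
        BitSet bits = new BitSet(size);
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            // Only keys that actually occur contribute a bit.
            if (entry.getValue() > 0) {
                // floorMod keeps the index in [0, size) even for negative hash codes.
                bits.set(Math.floorMod(entry.getKey().hashCode(), size));
            }
        }
        return bits;
    }
}
```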

I had a glance over your patch, but I think I would like the
two types of fingerprints to have different interfaces, rather than to
have unsupported methods, and methods returning a -1 size...

> Thus, one possible refactoring of IFingerprinter
> is to have the following methods
> BitSet calculate(IAtomContainer);
> BitSet calculate(IAtomContainer, int);
> Map<T,Integer> calculateRaw(IAtomContainer);
> For cases such as the StandardFingerprinter, there is no change to the
> current behavior.
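
For comparison, this is roughly what the two-interface alternative could
look like. All names here (BitFingerprinter, CountFingerprinter,
SubstringCountFingerprinter) are hypothetical, and the type parameter M
stands in for IAtomContainer so that the sketch compiles without the CDK.
The toy implementation counts overlapping 4-character substrings of a
SMILES-like string, loosely in the spirit of substring-based fingerprints.
It is not the code on the newfp branch.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface for fixed-length binary fingerprints.
interface BitFingerprinter<M> {
    BitSet calculate(M molecule);
    int getSize();
}

// Hypothetical interface for open-ended key/count fingerprints.
// There is no getSize(), because the number of keys is not known up front.
interface CountFingerprinter<M, K> {
    Map<K, Integer> calculateRaw(M molecule);
}

// Toy implementation, only meant to exercise the second interface.
class SubstringCountFingerprinter implements CountFingerprinter<String, String> {
    private static final int WINDOW = 4;

    @Override
    public Map<String, Integer> calculateRaw(String smiles) {
        Map<String, Integer> counts = new HashMap<>();
        // Slide a window over the string and count each substring.
        for (int i = 0; i + WINDOW <= smiles.length(); i++) {
            counts.merge(smiles.substring(i, i + WINDOW), 1, Integer::sum);
        }
        return counts;
    }
}
```

With this split, a class only implements the interface it can fully
support, so no method has to throw or return a placeholder size.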



From your other email:

On Fri, Oct 29, 2010 at 10:03 PM, Rajarshi Guha <rajarshi.guha@gmail.com> wrote:
> I have implemented the refactoring and an example of a class for
> non-binary fp (LINGO's) at http://github.com/rajarshi/cdk/tree/newfp

That sounds useful. I do not know these fingerprints yet, but the
support for circular fingerprints is certainly interesting... I would
likely implement the interface for the fingerprint described in that
blog post too (if no one beats me to it...)

My main argument for a separate interface would be that circular
fingerprints are strongly tied to the data set they are calculated for.

Here too, comments appreciated!


Dr E.L. Willighagen
Postdoctoral Research Associate
University of Cambridge
Homepage: http://egonw.github.com/
LinkedIn: http://se.linkedin.com/in/egonw
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers
