Overall, looks good. It's a good idea to move beyond BitSets and have an interface for fingerprints. Use cases in the Javadocs would be useful (e.g., IntArrayCountFingerprint needs more documentation).

It would be handy to have a high-level description of the new classes and their relationships. For example, it's not clear why IntArrayFingerprint implements IBitFingerprint. Javadocs would help here.
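
To make the request concrete, here is a rough sketch of the kind of relationship I would like to see documented. It is only my guess at the intent: every name and method below is hypothetical and not taken from the branch. The idea is that a sparse int-array class can sit behind the same bit-fingerprint interface as a BitSet-backed one if it stores only the indices of the set bits.

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical interface: method names are illustrative, not taken from the branch.
interface BitFingerprintSketch {
    boolean get(int index); // is the bit at this index set?
    int cardinality();      // number of set bits
}

// Dense flavour: wraps java.util.BitSet, suited to short hashed fingerprints.
// Note that BitSet only accepts non-negative indices.
class DenseBitFingerprintSketch implements BitFingerprintSketch {
    private final BitSet bits;

    DenseBitFingerprintSketch(BitSet bits) {
        this.bits = (BitSet) bits.clone(); // defensive copy
    }

    public boolean get(int index) { return bits.get(index); }
    public int cardinality() { return bits.cardinality(); }
}

// Sparse flavour: keeps only the indices of the set bits, sorted and deduplicated.
class SparseBitFingerprintSketch implements BitFingerprintSketch {
    private final int[] setBits;

    SparseBitFingerprintSketch(int[] indices) {
        int[] sorted = indices.clone();
        Arrays.sort(sorted);
        int n = 0;
        for (int i = 0; i < sorted.length; i++) {
            // keep the first occurrence of each index only
            if (i == 0 || sorted[i] != sorted[i - 1]) sorted[n++] = sorted[i];
        }
        this.setBits = Arrays.copyOf(sorted, n);
    }

    public boolean get(int index) { return Arrays.binarySearch(setBits, index) >= 0; }
    public int cardinality() { return setBits.length; }
}
```

If that is roughly the design, a sentence or two in the class-level Javadoc saying so would answer the question.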

So at this point, more Javadocs are required, and a high-level description would be useful.

Also, can you sync up with the latest master so that I only see the fingerprint-related changes?

On Mon, Jan 30, 2012 at 8:40 AM, Jonathan Alvarsson <jonathan.alvarsson@gmail.com> wrote:

I have been working with the fingerprint part of CDK lately and
introduced a signature fingerprinter. It is not hashed down to only
1024 but uses the entire integer space and comes both in a bit and a
count flavour. In order to support this I have introduced interfaces
for the fingerprint representations. I have one implementation based
on BitSet which behaves very much like how it works in CDK right now
and one sparse representation which only persists the set bits in an
array. (It is needed when keeping a few fingerprints in memory...)
There is also a similar sparse representation of a count fingerprint
based on one array with hashes and one with counts for that hash.
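
A minimal sketch of the two-array layout described above, with one sorted array of hashes and a parallel array of counts. The class and method names are hypothetical and are not the ones in the branch.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sparse count fingerprint: names are illustrative, not from the branch.
class SparseCountFingerprintSketch {
    private final int[] hashes; // distinct feature hashes, ascending
    private final int[] counts; // counts[i] = number of occurrences of hashes[i]

    SparseCountFingerprintSketch(int[] rawHashes) {
        // Tally occurrences; TreeMap keeps the keys sorted for binary search later.
        TreeMap<Integer, Integer> tally = new TreeMap<>();
        for (int h : rawHashes) {
            tally.merge(h, 1, Integer::sum);
        }
        hashes = new int[tally.size()];
        counts = new int[tally.size()];
        int i = 0;
        for (Map.Entry<Integer, Integer> e : tally.entrySet()) {
            hashes[i] = e.getKey();
            counts[i] = e.getValue();
            i++;
        }
    }

    // Count stored for a hash, or 0 if the hash never occurred.
    int countFor(int hash) {
        int pos = Arrays.binarySearch(hashes, hash);
        return pos >= 0 ? counts[pos] : 0;
    }

    // Number of distinct hashes that are stored.
    int distinctHashes() {
        return hashes.length;
    }
}
```

Memory use grows with the number of distinct hashes rather than with the size of the integer space, and lookups cost a binary search.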

I would love to see this merged into CDK but my guess is that I am not
yet ready for this. I have rebased my branch on the latest master and
now I ask for some feedback. What is missing in order to bring this up
to CDK standards?

My code is available at:


Looking forward to your thoughts!

// Jonathan


Rajarshi Guha
NIH Chemical Genomics Center