From: Andrew D. <da...@da...> - 2010-10-29 19:03:17
On Oct 29, 2010, at 12:39 PM, Nina Jeliazkova wrote:
> MD5 can be used, it's known to be a good compromise between simplicity and well-distributed-ness.
>
> http://download.oracle.com/javase/1.4.2/docs/api/java/security/MessageDigest.html
>
> MessageDigest digest = java.security.MessageDigest.getInstance("MD5");

I'm not that familiar with the CDK part of this discussion, but I know a bit about hash functions.

MD5 is a cryptographic hash function which generates a 128-bit value. Hashes of this kind produce well-distributed values, but your bitset implementation might need only 64 or even 32 bits, so most of the output is needless overhead. Plus, computing an MD5 digest costs more than computing a simpler hash.

One I've heard about from several sources is the "Hsieh hash", at
  http://www.azillionmonkeys.com/qed/hash.html

Another is MurmurHash, at
  http://tanjent.livejournal.com/756623.html

If you're making your own BitSet (instead of reusing an existing library) then perhaps one of these might be useful.

Andrew
da...@da...
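
P.S. To make the overhead point concrete, here is a rough sketch of both routes to a bit index. Everything in it is my own illustration and not CDK code: the class name HashToBit, the helper names, and the choice to keep only the first 4 bytes of the MD5 digest are all assumptions. I also used FNV-1a as the "simpler hash" only because it fits in a few lines; it is a stand-in, not the Hsieh hash or MurmurHash from the links above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.BitSet;

// Hypothetical illustration only; none of these names come from CDK.
class HashToBit {

    // MD5 route: compute the full 16-byte digest, then keep just the
    // first 4 bytes (big-endian) and discard the remaining 12.
    static int md5BitIndex(String key, int nBits) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            byte[] d = digest.digest(key.getBytes(StandardCharsets.UTF_8));
            int h = ((d[0] & 0xFF) << 24) | ((d[1] & 0xFF) << 16)
                  | ((d[2] & 0xFF) << 8) | (d[3] & 0xFF);
            // floorMod keeps the index non-negative even when h is negative.
            return Math.floorMod(h, nBits);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Simpler route: 32-bit FNV-1a (stand-in for Hsieh or MurmurHash).
    static int fnv1a(String key) {
        int h = 0x811C9DC5;                 // FNV offset basis
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);                // mix in one byte
            h *= 0x01000193;                // multiply by the FNV prime
        }
        return h;
    }

    static int fnv1aBitIndex(String key, int nBits) {
        return Math.floorMod(fnv1a(key), nBits);
    }

    public static void main(String[] args) {
        int nBits = 1024;
        BitSet bits = new BitSet(nBits);
        // "c1ccccc1" is just an arbitrary example key.
        bits.set(md5BitIndex("c1ccccc1", nBits));
        bits.set(fnv1aBitIndex("c1ccccc1", nBits));
        System.out.println(bits);
    }
}
```

Both helpers end up with a single int index, which is the point: the MD5 route does all the work of a 128-bit digest and then throws most of it away.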