#540 SP Fingerprint

master (162)


  • John May

    John May - 2012-08-30

    Hi Asad,

    It looks good I'm just trying to run the test (having problems making sure all deps are in place). I few a couple of comments/questions though (yay for sourceforge's inability to nicely add code to comments).


    StringBuffer/Synchronised Collections
    Maybe I missed something but I couldn't see why you need to use StringBuffer opposed to builder (which you've used in other cases). The SimpleAtomCanonicalisation also uses Collections.synchronizedCollection(). The overhead of synchronising isn't worth it unless the collections are explicitly going to be shared between threads. In this case the SimpleAtomCanonicalisation is only used in a single method (and can only be called in a single thread). As I said maybe I missed something but I couldn't see a use synchronisation.

    Could you add some documentation to the top/methods. Possibly explain why it's needed and where it's being used.

    AtomTyping/Aromaticity Detection
    I'm not sure whether performing the atom typing/aromaticity detection within the bitset is a good idea. This is more a problem with having to explicitly type atoms - in fact I had an idea on how to do an AutoTypingAtomContainer but that's for another day. The atom typing method is relatively slow and forcing the atoms to be typed every time you want to calculate the fingerprint isn't necessary if they're already typed. As the typing only needs to be done when you load the container (or change it) and the "role" of this class is to generate a fingerprint not type and generate a fingerprint - I would be tempted to remove that. If it really should stay then I would add a comment to the documentation that the method will re-type and detect aromaticity.

    I think adding commons-math is a good idea. Although it's only used for the number generation here it could be useful elsewhere and replace some methods and having used it before it's definitely a good inclusion.

    • ShortestPathFingerprinterTest should be in module test-fingerprint
    • commons-math*.jar needs to be added to test-fingerprint.libdepends
    • AtomComparator needs a cdk-module I think...

    Minor Coding Suggestions:

    - o1.getSymbol().compareToIgnoreCase(o2.getSymbol()) == 0 in the conditional can be replaced with equalsIgnoreCase()

    - generateMersenneTwisterRandomNumber casts an int to long (return type) then when it's used in shortest path fingerprinter it casts the long back to an int. I would make generateMersenneTwisterRandomNumber return an int instead.
    - generateMersenneTwisterRandomNumber(int maximum) is never used

    - I would make the RandomNumber a field as opposed to using it as the superclass. It also means that if you want to change the random number generation method (I normally use an XORShift) you can do so at a later date.

    Autoboxing: all instances of hash.intValue() and new Integer() can be removed. Autoboxing takes care of this and makes the code easier to read.

    StringBuilder can take primitives in it's append method:
    - radicalInformation.append("RAD: ".concat(String.valueOf(container.getSingleElectronCount()))); can be replaced with (also faster) radicalInformation.append("RAD: ").append(container.getSingleElectronCount());

    - This could be done after but ShortestPathFingerprinterTest should be added to MFingerprintTests


  • John May

    John May - 2012-08-30

    Urgh sourceforge's comment box locked at a tiny size, excuse my grammar in that last comment :p.

    The AtomComparator needs to be an inner class or in a file on it's own. The cdk module/deplibs etc wasn't picking it up in the jar so I had to move it to an inner class of the simple canonicalizer. I tried adding a module tag but I guess it only picks up one per class.

    Anyways the all new tests pass as expected.

    AbstractFixedLengthFingerprinterTest is throwing some null's around but I can't immediately see why....


  • John May

    John May - 2012-08-30

    Incorrect Subset

  • John May

    John May - 2012-08-30

    Just checking, would it be possible to remove netbeans project files in nbproject/

  • John May

    John May - 2012-08-30

    Okay looks good.

    As you've said the failed tests case seem to be due to the new fingerprinter accounting for stereo parity in which case the entire substructure doesn't match.

    Am happy to sign this off.

    Some niggles but these could be fixed later:

    I think the hashing would work just as good if you use the raw integer values. If you look at the hashCode() for an Integer it is.:

    public int hashCode() {
    return value;

    As the hash isn't anything special you could convert:

        for (Iterator<IAtomContainer> it = sssr.atomContainers().iterator(); it.hasNext();) {
            IAtomContainer ring = it.next();
            int toHashCode = String.valueOf(ring.getAtomCount()).hashCode();
            paths.add(patternIndex, toHashCode);

    into (see below for iterator removal):

        for (IAtomContainer ring : sssr.atomContainers()) {
            paths.add(patternIndex++, ring.getAtomCount());

    Say the atom count is 10. The hash code of "10".hashCode() is not 10 but it will always be the same as "10.hashCode()". You can therefore just use the integer 10. This could be the same for all numeric/enumerations. What really matters with the hash coding is that you combined correctly.

    public int compare(IAtom o1, IAtom o2) {
    if (!(o1 instanceof IChemObject) || !(o2 instanceof IChemObject)) {
    throw new ClassCastException();

    As the method needs an IAtom (which is also an IChemObject) then the method will throw a class cast exception even without the if statement. Explicitly adding this means the internal byte code looks like this (not really but you ge the gist):

    if (!(o1 instanceof IChemObject) || !(o2 instanceof IChemObject)) {
                throw new ClassCastException();
    if (!(o1 instanceof IChemObject) || !(o2 instanceof IChemObject)) {
                throw new ClassCastException();

    If this is just to check for nulls then != null is a lot faster. I myself use instanceof too much, see why it's bad here: http://www.javapractices.com/topic/TopicAction.do?Id=31

    For iterator can be replaced with foreach

    for (Iterator<IAtom> it = container.atoms().iterator(); it.hasNext();) {
    IAtom atom = it.next();

    is exactly the same as (but more code)

    for(IAtom atom : container.atoms()){
    for(IAtomContainer atom : sssr.containers){

  • Egon Willighagen

    OK, matching the requirements for master: one independent approval.

    I have additional comments, though (Do you prefer me to file them as bug reports too?):

    • missing JavaDoc for some public classes and methods
    • missing test classes for SimpleAtomCanonicalisation and RandomNumber
    • SimpleAtomCanonicalisation and RandomNumber should be part of the fingerprint module and not standard, where they are used
    • add the missing commons-math-**.jar.meta file

    the content has version, name, copyright/license info for the library, like this one for xpp3-1.1.4c.jar:

    Library=XML Pull Parser
    Copyright=(c) 2001 Extreme! Lab, Indiana University. All rights reserved.
    License=Indiana University Extreme! Lab Software License


Log in to post a comment.

Get latest updates about Open Source Projects, Conferences and News.

Sign up for the SourceForge newsletter:

JavaScript is required for this form.

No, thanks