#247 CIP subset implementation

Needs_Review
closed
nobody
cdk-1.4.x (181)
5
2012-10-28
2010-08-12
No

Based on the recent ITetrahedralChirality work, I have implemented a patch to determine R,S based on a subset of the CIP rules.

Aimed at master, review very much appreciated.

There are four methods that are reported not tested, caused by a limitation of the coverage testing:

3033072 CoverageAnnotationTest gives false positives on compare()

The JavaDoc report shows a lot of false positives in the spell checking (that Jazzy patch should be removed, but accidentally sneaked in with another patch), but also a few false positives due to CDK checks happening on inner classes:

3043084 OJDC checks should not run on inner classes

Discussion

  • Egon Willighagen

    Gilleain found an issue in my code, fixed by the patches 0002 and 0003.

     
  • Egon Willighagen

    Rebased the patch on cdk-1.4.x.

     
  • Rajarshi Guha

    Rajarshi Guha - 2010-10-24

    In CIPTool, getCIPChirality takes a LigancyFourChirality but another version of the method takes a ITetrahedralChirality object. What is the relation between these two classes?

    In CIPTool, checkIfAllLigandsAreDifferent seems to be incorrect - it compares elements 0,1 then 1,2, then 2,3 - but should elements 0,2 and 0,3 also be compared? In general, Id' think you need to do a full pairwise comparison here

    In CIPTool, in order(ILigand[]_) - I don't see any ordering, just an array copy. Also where does the cipRule object come from?

    Is ImmutableHydrogen meant to be called by a user of the library? if not, should be protected. Also, since it's basically a specialized atom, could it not simply subclass Atom?

    Why do you need a class TerminalLigand that subclasses Ligand and doesn't do anything extra over the base class?

    Does AtomicNumberRule need to be public? Can it be made protected? In general I think the rule classes should not be part of the public API

    It'd be useful to rename ISequenceSubRule to IRuleComparator or something similar to indicate that its purpose is to compare two ligands. Come to think of it, do you actually need a whole new interface for this? It doesn't look like its adding much beyond whats in Comparator

    Overall seems to be OK - I'd suggest that any class that is not meant to tbe public be made into a protected class. Also, has any code been updated to make use of the CIP support (such as SMILES parsing)? Didn't see that anywhere. Otherwise, it'd be useful to indicate how this code is meant to tbe used in general

     
  • Egon Willighagen

    In CIPTool, getCIPChirality takes a LigancyFourChirality but another
    version of the method takes a ITetrahedralChirality object. What is the
    relation between these two classes?

    The CIP variant knows about recursion. The ITetrahedralChirality only knows the neighboring atoms.

    In CIPTool, checkIfAllLigandsAreDifferent seems to be incorrect - it
    compares elements 0,1 then 1,2, then 2,3 - but should elements 0,2 and 0,3
    also be compared? In general, Id' think you need to do a full pairwise
    comparison here

    The input is sorted already. This method basically checks if all entries
    are different. The idea here is that if they are not all different, the search
    can stop.

    To clarify the sorted input, I added this to the JavaDoc: "It assumes that the input
    is sorted based on that rule."

    In CIPTool, in order(ILigand[]_) - I don't see any ordering, just an array
    copy. Also where does the cipRule object come from?

    There should be a line: Arrays.sort(newLigands, cipRule);

    The cipRule is a static variable in the CIPTool class:

    private static CIPLigandRule cipRule = new CIPLigandRule();

    Is ImmutableHydrogen meant to be called by a user of the library? if not,
    should be protected. Also, since it's basically a specialized atom, could
    it not simply subclass Atom?

    Why do you need a class TerminalLigand that subclasses Ligand and doesn't
    do anything extra over the base class?

    This is used to as trigger to stop recursing, via CIPTool.getLigandLigands():

    if (ligand instanceof TerminalLigand) return new ILigand[0];

    Does AtomicNumberRule need to be public? Can it be made protected? In
    general I think the rule classes should not be part of the public API

    I was not able to make all package protected, because of some classes being used in the unit tests, but these are now package private:

    org/openscience/cdk/geometry/cip/ImmutableHydrogen.java
    org/openscience/cdk/geometry/cip/LigancyFourChirality.java
    org/openscience/cdk/geometry/cip/TerminalLigand.java
    org/openscience/cdk/geometry/cip/rules/AtomicNumberRule.java
    openscience/cdk/geometry/cip/rules/CombinedAtomicMassNumberRule.java
    org/openscience/cdk/geometry/cip/rules/MassNumberRule.java

    It'd be useful to rename ISequenceSubRule to IRuleComparator or something
    similar to indicate that its purpose is to compare two ligands.

    Well, the class implements the CIP sequence sub rule. It's domain terminology,
    and perhaps not too obvious to those not into CIP, but if you are OK, I like to
    keep the name as is. OK?

    Come to
    think of it, do you actually need a whole new interface for this? It
    doesn't look like its adding much beyond whats in Comparator

    Except that typing into Comparator<ILigand>... it helps me keep the code clean.
    I also adds some semantics, which I like, and room for documentation.

    Overall seems to be OK - I'd suggest that any class that is not meant to
    be public be made into a protected class.

    Partially done, as some classes are used in cip module unit tests, but in a
    different Java package... not sure how to overcome that...

    Also, has any code been updated to make use of the CIP
    support (such as SMILES parsing)? Didn't see that anywhere.

    SMILES parsing is a bad example, as SMILES does not hold R or S
    facts (instead it uses @ and @@) :)

    But, currently, there is no other CDK code using it, but we have
    applications in Bioclipse where it is used.

    Otherwise, it'd be useful to indicate how this code is meant to
    be used in general

    Added this JavaDoc to CIPTool:

    • Basic use starts from a {@link ITetrahedralChirality} and therefore

    • assumes atoms with four neighbors:
    • IAtom[] ligandAtoms =
    • mol.getConnectedAtomsList(centralAtom).toArray(new IAtom[4]);
    • ITetrahedralChirality tetraStereo = new TetrahedralChirality(
    • centralAtom, ligandAtoms, Stereo.ANTI_CLOCKWISE
    • );
    • CIP_CHIRALITY cipChirality = CIPTool.getCIPChirality(mol, tetraStereo);
    • The {@link org.openscience.cdk.interfaces.IBond.Stereo} value can be
    • reconstructed from 3D coordinates with the {@link StereoTool}.

    The new patches also contain further work following bug reports:

    0004-Added-unit-test-to-make-sure-no-chirality-is-detecte.patch

    Tests that for a non-chiral atom, no R or S is detected.

    0005-Added-unit-tests-to-uncover-a-non-termination-issue-.patch
    0006-Keep-track-of-which-atoms-have-already-been-visited-.patch
    0013-Added-missing-JavaDoc-and-unit-tests-for-VisitedAtom.patch
    0007-Only-consider-recursion-if-all-ligand-atoms-are-iden.patch

    Added recursion termination fixes. The 0006 patch is somewhat larger and
    adds a infrastructure to keep track of which atoms are already visited, to
    solve the problem with recursion for some ring systems.

    0008-Added-testing-of-bromochlorofluoroiodomethane.patch
    0009-Added-unit-test-for-aromatic-molecular-signatures.patch

    More unit tests.

    Patch 0010-0012 are in reply to your above comments, and already mentioned in my replies.

    Patch 0001-0003 have been rebased on the current cdk-1.2.4.

     
  • Rajarshi Guha

    Rajarshi Guha - 2010-12-04

    I think your comments are fine. Does the github branch contain all the patches ( so I don't need to apply on the top of the previous patches)?

     
  • Egon Willighagen

    Yes, the branch has all.

     
  • Rajarshi Guha

    Rajarshi Guha - 2010-12-04

    Also, why is this for 1.4 when it is a totally new API?

     
  • Egon Willighagen

    Because the patch was submitted before the branch was made...

     
  • Rajarshi Guha

    Rajarshi Guha - 2010-12-04

    oh ok

     
  • Egon Willighagen

    Actually... even if after, it would have been fine... we allow new APIs in stable branches too... but no changed APIs...

     
  • Rajarshi Guha

    Rajarshi Guha - 2010-12-04

    OK, looks good as far as I can see. I do see one commit for molecular signatures - not sure what that's doing there, but I'll go ahead and pull into 1.4

     
  • Rajarshi Guha

    Rajarshi Guha - 2010-12-04

    Oh one other (minor) thing - CIP_CHIRALITY is a cumbersome type identifier. Why not use CipChirality?

    I won't hold the pull for this, but maybe you could do a rename later on?

     

Get latest updates about Open Source Projects, Conferences and News.

Sign up for the SourceForge newsletter:





No, thanks