#141 AtomContainerManipulator Functions

Needs_Revision
closed
nobody
None
5
2012-10-08
2009-11-29
Asad
No

I have written few extended functions for SMSD. I was wondering if these functions could be incorporated with the present AtomContainerManipulator.

Thanks

Asad

Discussion

  • Asad
    Asad
    2009-11-29

    Extended functions for AtomContainerManipulator

     
  • Rajarshi Guha
    Rajarshi Guha
    2009-11-29

    A few comments.

    1. Can you make this a patch - easier to apply and look at the differences with the current class.

    2. what does makeDeepCopy do that AtomContainer.clone() doesn't do? Can any extra functionality in your method be added to AtomContainer.clone()

    3. fix* look like they could be candidate methods for a Standardizer class

    4. What are differences between fixNitroGroups and fixNitroGroups2?

    5. How does aromatizeMolecule differ from CDKHueckelAromaticityDetector.perceiveAromaticity?

     
  • Asad
    Asad
    2009-11-29

    1) Yes I have assigned this as patch too
    2) makeDeepCopy make a deep copy of bonds and atoms too whereas AtomContainer.clone() is a shallow copy.
    3) I would really appreciate if few more functions can be added to Standardizer class from another attached class MoleculeSanityCheck.java. I am attaching this class too.

    public class MoleculeSanityCheck {

    /**
     * 
     * @param molecule
     * @return
     */
    public static IAtomContainer checkAndCleanMolecule(IAtomContainer molecule) {
        boolean isMarkush = false;
        for (IAtom atom : molecule.atoms()) {
            if (atom.getSymbol().equals("R")) {
                isMarkush = true;
                break;
            }
        }
    
        if (isMarkush) {
            System.err.println("Skipping Markush structure for sanity check");
        }
    
        // Check for salts and such
        if (!ConnectivityChecker.isConnected(molecule)) {
            // lets see if we have just two parts if so, we assume its a salt and just work
            // on the larger part. Ideally we should have a check to ensure that the smaller
            //  part is a metal/halogen etc.
            IMoleculeSet fragments = ConnectivityChecker.partitionIntoMolecules(molecule);
            if (fragments.getMoleculeCount() > 2) {
                System.err.println("More than 2 components. Skipped");
            } else {
                IMolecule frag1 = fragments.getMolecule(0);
                IMolecule frag2 = fragments.getMolecule(1);
                if (frag1.getAtomCount() > frag2.getAtomCount()) {
                    molecule = frag1;
                } else {
                    molecule = frag2;
                }
            }
        }
        fixAromaticity(molecule);
        return molecule;
    }
    
    /**
     *
     * @param mol
     */
    public static void fixAromaticity(IAtomContainer mol) {
        // need to find rings and aromaticity again since added H's
    
        IRingSet ringSet = null;
        try {
            AllRingsFinder arf = new AllRingsFinder();
            ringSet = arf.findAllRings(mol);
    
            // SSSRFinder s = new SSSRFinder(mol);
            // srs = s.findEssentialRings();
    
        } catch (Exception e) {
            e.printStackTrace();
        }
    
        try {
            // figure out which atoms are in aromatic rings:
            ExtAtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(mol);
            CDKHueckelAromaticityDetector.detectAromaticity(mol);
            // figure out which rings are aromatic:
            RingSetManipulator.markAromaticRings(ringSet);
            // figure out which simple (non cycles) rings are aromatic:
            // HueckelAromaticityDetector.detectAromaticity(mol, srs);
        } catch (Exception e) {
            e.printStackTrace();
        }
    
        // only atoms in 6 membered rings are aromatic
        // determine largest ring that each atom is a part of
    
        for (int i = 0; i <= mol.getAtomCount() - 1; i++) {
    
            mol.getAtom(i).setFlag(CDKConstants.ISAROMATIC, false);
    
            jloop:
            for (int j = 0; j <= ringSet.getAtomContainerCount() - 1; j++) {
                //logger.debug(i+"\t"+j);
                IRing ring = (IRing) ringSet.getAtomContainer(j);
                if (!ring.getFlag(CDKConstants.ISAROMATIC)) {
                    continue jloop;
                }
    
                boolean haveatom = ring.contains(mol.getAtom(i));
    
                //logger.debug("haveatom="+haveatom);
    
                if (haveatom && ring.getAtomCount() == 6) {
                    mol.getAtom(i).setFlag(CDKConstants.ISAROMATIC, true);
                }
    
            }
    
        }
    
    }
    

    }
    4) first one will fix changes in nitros given by N(=O)(=O) to N+[O-] and second one will fix changes nitros given by N+[O-] to N(=O)(=O) (these two function were present in CDK at some point)
    5) aromatizeMolecule is same as CDKHueckelAromaticityDetector Except that aromatizeMolecule depends upon percieveAtomTypesAndConfigureAtoms of this class. percieveAtomTypesAndConfigureAtoms handle NULL exception.

    IAtomType matched = matcher.findMatchingAtomType(container, atom);
    if (matched != null) {
    AtomTypeManipulator.configure(atom, matched);
    }

     
  • makeDeepCopy make a deep copy of bonds and atoms too whereas
    AtomContainer.clone() is a shallow copy

    Oh... but that can't be right... I'm pretty sure all IChemObject.clone() methods are supposed to make deep clones...

     
  • Asad
    Asad
    2009-11-30

    AtomContainer.clone() has this comment attached to it
    /
    * Clones this AtomContainer object and its content.


    * @return The cloned object
    * @see #shallowCopy

    /
    Secondly

    It doesn't set these functions

        newAtomContainer.setProperties(container.getProperties());
        newAtomContainer.setFlags(container.getFlags());
        newAtomContainer.setID(container.getID());
        newAtomContainer.notifyChanged();
    
     
  • I'd say that JavaDoc confirms that the method is supposed to make a deep clone.

    Flags, ID should be copied; that should be explored, and sounds like a bug.

    Notification listeners should not be copied, I think.

    Properties is just tricky, and no one ever tried to make an point about that... one problem is that properties may or may not be clonable, and it is typically impossible to deep clone them at all... so, currently properties are not cloned in ChemObject.clone()... this is certainly debateble... it has just never been discussed before.

    This could certainly be something that we can merge into mainline CDK... I would like to invite you to bring up your use case of cloning properties to the cdk-user mailing list, and see if we can come to a general solution...

     
  • Asad
    Asad
    2009-11-30

    I am not sure about the latest changes in the CDK AtomContainer clone method as I wrote deepcopy method because the clone atom container used to miss the atom IDs (i.e atom.getID use to be null) and some flags after using AtomContainer.clone .

    If flags are set (copied/cloned) and atom IDs are conserved then I would think my problem is solved!

     
  • Rajarshi Guha
    Rajarshi Guha
    2009-11-30

    I'd say that if things are missing, then AtomContainer.clone() needs some updating. A dicussion onlist would be useful

     
  • Stefan Kuhn
    Stefan Kuhn
    2009-12-04

    For fix*, have a look at org.openscience.cdk.tools.Normalizer class. This can do this with an xml configuration file.

     
  • I can only second Stefan's suggestion and to put the normalization code in a separate class...

    Since normalization is a critical step in many cheminformatics pipelines, I'd even suggest to create a separate java package for this. What about:

    org.openscience.cdk.normalize

    which can hold Normalizer, your code, and possibly other code too.

    I'm sure this is some preferences involved, and we should think about how to allow people to construct a normalizer...

    Stefan, Asad, what about a construct like:

    Normalizer normalizer = new Normalizer();
    normalizer.addRule(new FixNitrogenGroup());

    with FixNitrogenGroup extends INormalizationRule.

    Asad, just for your patch, please put the fix*() methods in a different class than the Manipulator. For example, keep them in your extending class for now.

     
  • Rajarshi Guha
    Rajarshi Guha
    2009-12-04

    I strongly second Stefan and Egon - a normalizer package with a single front end class would be great

     
  • Rajarshi Guha
    Rajarshi Guha
    2009-12-04

    I strongly second Stefan and Egon - a normalizer package with a single front end class would be great

     
  • Stefan Kuhn
    Stefan Kuhn
    2009-12-04

    Egon, your suggestion sounds good. My xml based normalizer was just a go, but something where rules are in java appeals much more to me (I like to have things in code).

     
  • Stefan, I think I missed that... where can I look at the code for that xml-based normalizer?

     
  • Stefan Kuhn
    Stefan Kuhn
    2009-12-05

    have a look at org.openscience.cdk.tools.Normalizer

     
  • Asad, please find attached a new patch of your class for master, now in cdk.normalize...

    But, it still needs:

    • unit tests
    • JavaDoc
    • split out some methods for AtomContainerManipulator
     
  • The updated version I attached fixes:

    • removed dependency on the 'data' module, and now only depends on 'interfaces'
    • List<IAtom> instead of just List
    • copyright header before package line
    • lower case first character for variable names
    • some removed empty lines

    But note the remaining problems in my other comment today.

     
  • This patch will have to be completely reworked now.