Menu

#391 KEGG Atom type patches

Needs_Revision
closed
cdk-1.4.x (181)
5
2012-10-08
2011-07-21
No

46 element-level patches. Each patch either promotes an existing atom type to a new method, or adds a new method for an element that wasn't already handled.

Discussion

1 2 > >> (Page 1 of 2)
  • gilleain maclean torrance

    all 46 patches in a zip

     
  • Asad

    Asad - 2011-07-27

    Hi,

    I have added a pdf file which contains information about atom types which were fixed/updated. It also contains remark(s) and reference molecules for each atom type for your perusal.

    https://github.com/downloads/asad/cdk/Curated%20Atom%20type%20patches.pdf

    Hope this helps.

    Many thanks
    Asad

     
  • Egon Willighagen

    An initial few atom types from this patch already made it into CDK 1.4.1. If you want to help review the patches, please do so. But as soon as you finished one atom type, please send the signed-off patch to me and Rajarshi directly, so that it can immediately be applied.

    I replied someone earlier today:

    what is the status of Gilleain's reformatted atom type patch. I know you have more than enough to do.

    Several have been applied already. Last week I had travel days, and
    this week is catching up with work... See Jul 23:

    https://github.com/cdk/cdk/commits/cdk-1.4.x

    Make sure you're local version of the cdk-1.4.x branch is up to date.

    I understand that this has been reformatted into individual patches?

    Yes, see below.

    Can we help to expedite the inclusion?

    Absolutely! (You're the first to ask...)

    The zip file with patches can be found here:

    http://sourceforge.net/tracker/?func=detail&aid=3374133&group_id=20024&atid=320024

    The patches should be applied to the cdk-1.4.x branch with 'git am -3
    --ignore-whitespace' ... skill those that do not immediately work...
    some will cleanly apply, others need a bit of work.

    Then check the patch with 'git show' and check if you believe the atom
    type is reasonable... also check if the number of lone pairs and
    double bonds are defined (they must have a good excuse not to list
    those, as we need that for various CDK algorithms). And do a general
    peer review of the code.

    Then, make sure nothing gets broken, by running:

    ant clean dist-all test-dist-all
    ant -Dmodule=core test-module

    Particularly focus on the unit test.

    If you are happy with all, run 'git commit --amend --signoff', and
    create a new patch with 'git format-patch -1' and send that
    0001-.patch file to Rajarshi and* me by email.


     
  • Egon Willighagen

    I wrote on July 23 to Gilleain:

    existing : Se

    I get two failing unit tests in CDKAtomTypeMatcherTest.

    Mn Si As

    All three applied

    S N

    Doing these later.

    Na

    Applied

    Ca

    OK, the history is now so different from what is in the patch, this
    patch no longer applies :(

    Ge

    Not looked at yet.

    I Cl

    I is approved.

    I'll reject Cl for now. Please check the hybridization, but sp3 cannot
    contain two sigma bonds, a pi bond, and two lone pairs.

     
  • Egon Willighagen

    OK, I just reviewed five atom types: In, Cd, Pu, Th look fine to me, though we really must try to figure out the lone pairs for those atoms.

    Fe failed for me, causing two failing unit tests. That one I will need to look at later again.

     
  • Egon Willighagen

    Nimish replied about the Cl.2 atom type I asked about ("I'll reject Cl for now. Please check the hybridization, but sp3 cannot contain two sigma bonds, a pi bond, and two lone pairs."):


    I went through the query related to the Atom type Cl.2

    Actually Cl central atom in Chlorite ion ClOO- has only 10 electrons
    (neutral) while the case of MolBase (Cl atom has negative charge)
    hence 12 electrons which is completely different atom type.

    Reference KEGG C01486.mol and http://www.chemspider.com/Molecular-Formula/ClNaO2

    I'm good with this explanation.

     
  • Egon Willighagen

    I reviewed and approved this afternoon Ag, Au, Ba, Be, and Cr.

    The aluminum patches needs checking as it currently changes properties of other, existing atom types. Accidental, on purpose? If latter, split out and file a bug report.

    Hg doesn't look good at this moment either. It has this code:

    • if (neighbors == 1) {
    • IAtomType type = getAtomType("Hg.plus");
    • if (isAcceptable(atom, atomContainer, type)) return type;
    • }
    • IAtomType type = getAtomType("Hg.plus");
    • if (isAcceptable(atom, atomContainer, type)) return type;

    So, if it has one connection or more, it's the same Hg.plus atom type. That's not right.

     
  • Egon Willighagen

    Cleanup is needed for these atom types:

    Al: the patch removes properties for existing atom types
    Ru: not the right formal charge in one of the tests (but I also question a Ru10+ atom type in general... are you kidding?)

     
  • Egon Willighagen

    N: needs more information on the atom types; the two new N atom types both violate the octet rule, which I cannot see why they should

    S: causes a regression in the CDKAtomTypeMatcherSMILESTest

     
  • Asad

    Asad - 2011-08-17

    N: needs more information on the atom types; the two new N atom types both
    violate the octet rule, which I cannot see why they should

    Okay I will check it.

    S: causes a regression in the CDKAtomTypeMatcherSMILESTest

    Well I think the regression is valid. I am not sure if this molecule is valid.

     
  • Asad

    Asad - 2011-08-17

    Hg doesn't look good at this moment either. It has this code:

    I have fixed the Hg and the new patch is in the tracker.

     
  • Egon Willighagen

    Asad, then this compound should be discussed on the -devel or -user mailing list and consensus reached first.

     
  • Asad

    Asad - 2011-08-17

    Ru: not the right formal charge in one of the tests (but I also question a
    Ru10+ atom type in general... are you kidding?)

    True, this looked strange to me as well at first instance but we have a molecule :-)
    Try this: http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=656819

     
  • Asad

    Asad - 2011-08-17

    Ru: not the right formal charge in one of the tests (but I also question a
    Ru10+ atom type in general... are you kidding?)

    True, at the first instance this looked strange to me but we have a
    molecule :-)

    Try this: http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=656819

     
  • Egon Willighagen

    I can only find Ru+5 atoms in that structure with CID 656819

     
  • Egon Willighagen

    The Fe patch has two failing unit tests, one for the the unit test doesn't set the charge right; the other seems in the perception code instead.

     
  • Asad

    Asad - 2011-08-21

    Thanks for the review for the Fe atom types. I have added cleaned Fe patch..
    ID: 3395585

     
  • Egon Willighagen

    Asad, I have added a unit test for a compound with an S atom type like in the regression caused by the S atom type patch. The unit tests has @cdk.inchi annotation which you can use to find the structure in at least PubChem. See:

    https://sourceforge.net/tracker/?func=detail&aid=3403750&group_id=20024&atid=320024

     
  • Egon Willighagen

    Otherwise, I just reviewed approved: Cu, Pt, Mo, Sb, and Gd.

    I have currently remaining in my todo list:

    0001-B-final.patch
    0001-Mg-final.patch
    0001-Pb-final.patch
    0001-Te-final.patch
    0001-Tl-final.patch
    0001-Zn-final.patch
    0001-Br-final.patch
    0001-Ni-final.patch
    0001-Sr-final.patch
    0001-Ti-final.patch
    0001-V-final.patch

    All others should be either approved or marked as 'need revision' in this tracker.

     
  • Egon Willighagen

    I like some further examples of the Br atom type. It looks pretty exotic to me, and neither PubChem nor ChemSpider have even a single compound with this atom type.

     
  • Egon Willighagen

    Please update the vanadium atom type patch to:
    - hook in the perception method
    - look and correct the hybridization state. sp3 does not sounds correct for four double bonded neighbors

    Also please update the Zn atom type patch which is currently removing properties from an existing atom type:

    • <at:AtomType rdf:ID="Zn.2plus">
    • <at:AtomType rdf:ID="Zn.2plus">
      <at:formalCharge>2</at:formalCharge>
      <at:hasElement rdf:resource="&amp;elem;Zn"/>
      <at:formalNeighbourCount>0</at:formalNeighbourCount>
      <at:piBondCount>0</at:piBondCount>
    • <at:hybridization rdf:resource="&amp;at;s"/>
    • <at:lonePairCount>0</at:lonePairCount>
    • </at:AtomType>
    • </at:AtomType>

    The atom types Mg, Tl, Pb, Sr, Ni, and Ti have been approved, though some needed a trivial change.

    My left/ queue is now empty.

     
  • Egon Willighagen

    BTW, I count 33/46 patches applied, with 3 updated patch sets for Ca, Fe, and Hg by Asad, so there should be 12 atom types to be revised. A revised Se in being discussed.

    From below I see in need of revision (from the top): V, Zn, Br, S (see the additional unit test I submitted this weekend, which the patch breaks), Ru, N, and Cl (need a confirm in an existing compound for an atom type).

    That means, that I may not have reviewed 2 atom types yet...

     
  • Egon Willighagen

    OK, the .zip file only contained 45 patches, and the 12 I listed was counted from 48. That leaves one atom type not being reviewed yet. If not mistaken, that was Al which I'll do now.

     
  • Egon Willighagen

    Ah, I had reviewed it already, marked is as needing revision, but apparently forgot to write up what I like to see revised. Main problems I have with this patch is that it removed properties from existing Al atom types, but also wonder if the new Al.3minus atom type does not simply have zero lone pairs... if that is the case, please add that property. I am not sure about the hyridization type, but for many other atom types from that set, we can (and must) figure that out at a later stage.

     
1 2 > >> (Page 1 of 2)

Log in to post a comment.