46 element-level patches. Each patch either promotes an existing atom type to a new method, or adds a new method for an element that wasn't already handled.
I have added a pdf file which contains information about atom types which were fixed/updated. It also contains remark(s) and reference molecules for each atom type for your perusal.
An initial few atom types from this patch already made it into CDK 1.4.1. If you want to help review the patches, please do so. But as soon as you finished one atom type, please send the signed-off patch to me and Rajarshi directly, so that it can immediately be applied.
I replied someone earlier today:
what is the status of Gilleain's reformatted atom type patch. I know you have more than enough to do.
Several have been applied already. Last week I had travel days, and
this week is catching up with work... See Jul 23:
The patches should be applied to the cdk-1.4.x branch with 'git am -3
--ignore-whitespace' ... skill those that do not immediately work...
some will cleanly apply, others need a bit of work.
Then check the patch with 'git show' and check if you believe the atom
type is reasonable... also check if the number of lone pairs and
double bonds are defined (they must have a good excuse not to list
those, as we need that for various CDK algorithms). And do a general
peer review of the code.
Then, make sure nothing gets broken, by running:
ant clean dist-all test-dist-all
ant -Dmodule=core test-module
Particularly focus on the unit test.
If you are happy with all, run 'git commit --amend --signoff', and
create a new patch with 'git format-patch -1' and send that
0001-.patch file to Rajarshi and* me by email.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Nimish replied about the Cl.2 atom type I asked about ("I'll reject Cl for now. Please check the hybridization, but sp3 cannot contain two sigma bonds, a pi bond, and two lone pairs."):
I went through the query related to the Atom type Cl.2
Actually Cl central atom in Chlorite ion ClOO- has only 10 electrons
(neutral) while the case of MolBase (Cl atom has negative charge)
hence 12 electrons which is completely different atom type.
I reviewed and approved this afternoon Ag, Au, Ba, Be, and Cr.
The aluminum patches needs checking as it currently changes properties of other, existing atom types. Accidental, on purpose? If latter, split out and file a bug report.
Hg doesn't look good at this moment either. It has this code:
if (neighbors == 1) {
IAtomType type = getAtomType("Hg.plus");
if (isAcceptable(atom, atomContainer, type)) return type;
}
IAtomType type = getAtomType("Hg.plus");
if (isAcceptable(atom, atomContainer, type)) return type;
So, if it has one connection or more, it's the same Hg.plus atom type. That's not right.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Al: the patch removes properties for existing atom types
Ru: not the right formal charge in one of the tests (but I also question a Ru10+ atom type in general... are you kidding?)
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Asad, I have added a unit test for a compound with an S atom type like in the regression caused by the S atom type patch. The unit tests has @cdk.inchi annotation which you can use to find the structure in at least PubChem. See:
I like some further examples of the Br atom type. It looks pretty exotic to me, and neither PubChem nor ChemSpider have even a single compound with this atom type.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Please update the vanadium atom type patch to:
- hook in the perception method
- look and correct the hybridization state. sp3 does not sounds correct for four double bonded neighbors
Also please update the Zn atom type patch which is currently removing properties from an existing atom type:
BTW, I count 33/46 patches applied, with 3 updated patch sets for Ca, Fe, and Hg by Asad, so there should be 12 atom types to be revised. A revised Se in being discussed.
From below I see in need of revision (from the top): V, Zn, Br, S (see the additional unit test I submitted this weekend, which the patch breaks), Ru, N, and Cl (need a confirm in an existing compound for an atom type).
That means, that I may not have reviewed 2 atom types yet...
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
OK, the .zip file only contained 45 patches, and the 12 I listed was counted from 48. That leaves one atom type not being reviewed yet. If not mistaken, that was Al which I'll do now.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Ah, I had reviewed it already, marked is as needing revision, but apparently forgot to write up what I like to see revised. Main problems I have with this patch is that it removed properties from existing Al atom types, but also wonder if the new Al.3minus atom type does not simply have zero lone pairs... if that is the case, please add that property. I am not sure about the hyridization type, but for many other atom types from that set, we can (and must) figure that out at a later stage.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
all 46 patches in a zip
Hi,
I have added a pdf file which contains information about atom types which were fixed/updated. It also contains remark(s) and reference molecules for each atom type for your perusal.
https://github.com/downloads/asad/cdk/Curated%20Atom%20type%20patches.pdf
Hope this helps.
Many thanks
Asad
An initial few atom types from this patch already made it into CDK 1.4.1. If you want to help review the patches, please do so. But as soon as you finished one atom type, please send the signed-off patch to me and Rajarshi directly, so that it can immediately be applied.
I replied someone earlier today:
Several have been applied already. Last week I had travel days, and
this week is catching up with work... See Jul 23:
https://github.com/cdk/cdk/commits/cdk-1.4.x
Make sure you're local version of the cdk-1.4.x branch is up to date.
Yes, see below.
Absolutely! (You're the first to ask...)
The zip file with patches can be found here:
http://sourceforge.net/tracker/?func=detail&aid=3374133&group_id=20024&atid=320024
The patches should be applied to the cdk-1.4.x branch with 'git am -3
--ignore-whitespace' ... skill those that do not immediately work...
some will cleanly apply, others need a bit of work.
Then check the patch with 'git show' and check if you believe the atom
type is reasonable... also check if the number of lone pairs and
double bonds are defined (they must have a good excuse not to list
those, as we need that for various CDK algorithms). And do a general
peer review of the code.
Then, make sure nothing gets broken, by running:
ant clean dist-all test-dist-all
ant -Dmodule=core test-module
Particularly focus on the unit test.
If you are happy with all, run 'git commit --amend --signoff', and
create a new patch with 'git format-patch -1' and send that
0001-.patch file to Rajarshi and* me by email.
I wrote on July 23 to Gilleain:
I get two failing unit tests in CDKAtomTypeMatcherTest.
All three applied
Doing these later.
Applied
OK, the history is now so different from what is in the patch, this
patch no longer applies :(
Not looked at yet.
I is approved.
I'll reject Cl for now. Please check the hybridization, but sp3 cannot
contain two sigma bonds, a pi bond, and two lone pairs.
OK, I just reviewed five atom types: In, Cd, Pu, Th look fine to me, though we really must try to figure out the lone pairs for those atoms.
Fe failed for me, causing two failing unit tests. That one I will need to look at later again.
Nimish replied about the Cl.2 atom type I asked about ("I'll reject Cl for now. Please check the hybridization, but sp3 cannot contain two sigma bonds, a pi bond, and two lone pairs."):
I went through the query related to the Atom type Cl.2
Actually Cl central atom in Chlorite ion ClOO- has only 10 electrons
(neutral) while the case of MolBase (Cl atom has negative charge)
hence 12 electrons which is completely different atom type.
Reference KEGG C01486.mol and http://www.chemspider.com/Molecular-Formula/ClNaO2
I'm good with this explanation.
I reviewed and approved this afternoon Ag, Au, Ba, Be, and Cr.
The aluminum patches needs checking as it currently changes properties of other, existing atom types. Accidental, on purpose? If latter, split out and file a bug report.
Hg doesn't look good at this moment either. It has this code:
So, if it has one connection or more, it's the same Hg.plus atom type. That's not right.
Cleanup is needed for these atom types:
Al: the patch removes properties for existing atom types
Ru: not the right formal charge in one of the tests (but I also question a Ru10+ atom type in general... are you kidding?)
N: needs more information on the atom types; the two new N atom types both violate the octet rule, which I cannot see why they should
S: causes a regression in the CDKAtomTypeMatcherSMILESTest
Okay I will check it.
Well I think the regression is valid. I am not sure if this molecule is valid.
I have fixed the Hg and the new patch is in the tracker.
Asad, then this compound should be discussed on the -devel or -user mailing list and consensus reached first.
Ru: not the right formal charge in one of the tests (but I also question a
Ru10+ atom type in general... are you kidding?)
True, this looked strange to me as well at first instance but we have a molecule :-)
Try this: http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=656819
Ru: not the right formal charge in one of the tests (but I also question a
Ru10+ atom type in general... are you kidding?)
True, at the first instance this looked strange to me but we have a
molecule :-)
Try this: http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=656819
I can only find Ru+5 atoms in that structure with CID 656819
Sorry here is the updated link for Ru10+
http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?sid=854176
The Fe patch has two failing unit tests, one for the the unit test doesn't set the charge right; the other seems in the perception code instead.
Thanks for the review for the Fe atom types. I have added cleaned Fe patch..
ID: 3395585
Asad, I have added a unit test for a compound with an S atom type like in the regression caused by the S atom type patch. The unit tests has @cdk.inchi annotation which you can use to find the structure in at least PubChem. See:
https://sourceforge.net/tracker/?func=detail&aid=3403750&group_id=20024&atid=320024
Otherwise, I just reviewed approved: Cu, Pt, Mo, Sb, and Gd.
I have currently remaining in my todo list:
0001-B-final.patch
0001-Mg-final.patch
0001-Pb-final.patch
0001-Te-final.patch
0001-Tl-final.patch
0001-Zn-final.patch
0001-Br-final.patch
0001-Ni-final.patch
0001-Sr-final.patch
0001-Ti-final.patch
0001-V-final.patch
All others should be either approved or marked as 'need revision' in this tracker.
I like some further examples of the Br atom type. It looks pretty exotic to me, and neither PubChem nor ChemSpider have even a single compound with this atom type.
Please update the vanadium atom type patch to:
- hook in the perception method
- look and correct the hybridization state. sp3 does not sounds correct for four double bonded neighbors
Also please update the Zn atom type patch which is currently removing properties from an existing atom type:
<at:formalCharge>2</at:formalCharge>
<at:hasElement rdf:resource="&elem;Zn"/>
<at:formalNeighbourCount>0</at:formalNeighbourCount>
<at:piBondCount>0</at:piBondCount>
The atom types Mg, Tl, Pb, Sr, Ni, and Ti have been approved, though some needed a trivial change.
My left/ queue is now empty.
BTW, I count 33/46 patches applied, with 3 updated patch sets for Ca, Fe, and Hg by Asad, so there should be 12 atom types to be revised. A revised Se in being discussed.
From below I see in need of revision (from the top): V, Zn, Br, S (see the additional unit test I submitted this weekend, which the patch breaks), Ru, N, and Cl (need a confirm in an existing compound for an atom type).
That means, that I may not have reviewed 2 atom types yet...
OK, the .zip file only contained 45 patches, and the 12 I listed was counted from 48. That leaves one atom type not being reviewed yet. If not mistaken, that was Al which I'll do now.
Ah, I had reviewed it already, marked is as needing revision, but apparently forgot to write up what I like to see revised. Main problems I have with this patch is that it removed properties from existing Al atom types, but also wonder if the new Al.3minus atom type does not simply have zero lone pairs... if that is the case, please add that property. I am not sure about the hyridization type, but for many other atom types from that set, we can (and must) figure that out at a later stage.