Your patch seems to cause 16 regressions in the CDKAtomTypeMatcherTest after I applied it to cdk-1.4.x. Is it critical to apply to master instead of cdk-1.4.x?
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Otherwise, I discussed the patch with Gilleain, in particular regarding splitting up the patch into smaller patches. Particularly, because the current patch also makes whitespace changes (don't make whitespace and {} changes in the same patch is functional changes), it is hard to see what is going on. For example, the patch feels like it changes atom type IDs? Why is that? That is going to break things...
Please also put one atom type in one unit test, rather than all in one test method, such as is now with e.g. the method fix_Fe().
This new test class should also be included in McoreTests.java.
And several atom types miss lone pair count and/or hybridization information.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
a) Old IDs in the ontology files were reverted as suggested.
b) New ids and the updated ontology was added
c) New mol files were added for 128 test case for the missing atom types
d) Test cases was synchronised with CDKAtomTypeMatcherTest.java (CDKAtomTypeMatcherTest.java extends CDKAtomTypeMatcherEBITest.java)
e) Hence AtomTypeMatcher was also updated.
Here are the test results from the result-core.txt
Testcase: test4Sulphur(org.openscience.cdk.atomtype.CDKAtomTypeMatcherSMILESTest): Caused an ERROR
null
java.lang.NullPointerException
at org.openscience.cdk.atomtype.CDKAtomTypeMatcherSMILESTest.test4Sulphur(CDKAtomTypeMatcherSMILESTest.java:169)
Note: This case is rightly received by the CDK atom type matcher as this configuration of sulphur is not possible. SO its an invalid error.
Whats NOT Done:
We have added 128 missing test cases and clubbed them as ~50 cases according to the element types. If you would like to break them into smaller pieces, please feel free to do so.
Hope this helps.
Many thanks,
Asad
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I know that fixing a patch of this size is a lot of work. That's why small patches, which you can submit soon, are so good. Easy to review, quick feedback, which you can use for later patches.
I will look at this patch today.
I think I have not been clear in what I wanted with CDKAtomTypeMatcherTest... in that class, I do not want any tests that depend on IO... I should add that to the class JavaDoc. The reason for that is that I want those unit tests to do their work with the least possible dependencies, to make sure that when they fail, it's because the atom typing fails, and not, for example, some ChemObjectReader got broken (which happened in the past).
Anyway, I'll look at it as soon as possible.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Yes, CDKAtomTypeMatcherTest and CDKAtomTypeMatcherFilesTest are different. I am saying that tests with files are not good; they are. I just want them in a different class.
But, at the same time, I also prefer to have all atom types to have unit tests without IO.
We have fixed hybridization to best of our knowledge.
To our my knowledge Hybridization is defined for atoms which are covalently connected with other atoms, so in cases like ionic bonds, metallic atoms(just the atom as it is) such as Li,Na,K,W,Ni.metallic,Ni.2plus,Pu,Th,Al,Fe.metallic,In,Co.metallic,Cd.metallic,Hg.metallic, Pb.neutral, Zn etc will not have any hybridization.
While cases such as S.2minus without connected though belonging to P-block elements I am not sure whether they have any hybridization.
Asad & Nimish
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
We have fixed hybridization to best of our knowledge.
To our my knowledge Hybridization is defined for atoms which are
covalently connected with other atoms, so in cases like ionic bonds,
metallic atoms(just the atom as it is) such as
Li,Na,K,W,Ni.metallic,Ni.2plus,Pu,Th,Al,Fe.metallic,In,Co.metallic,Cd.metallic,Hg.metallic,
Pb.neutral, Zn etc will not have any hybridization.
While in cases such as S.2minus without connected atoms (though it belongs to P-block
elements), we are not sure if it has any hybridization.
Asad & Nimish
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
This causes a regression in the Sybyl atom type matcher.
and comments that are no longer valid, now that many atom types are no longer renamed:
String[] expectedTypes = {"F", "I.minus.5", "F"};
String[] expectedTypes = {"F", "I.minus.5", "F"}; // I.minus.5 to I.minus.2
I do not know if metallic atoms, or ions without any valence electrons show hybridization either.
I spotted at least these atom types without hybridization which does have neighbors: N.minus, S.2minus
BTW, and this is a request, not q requirement,... if you can remove those "// Fix" comments, that would be appreciated.
Can you explain the hybridization of the S.sp3.4 atom type, which seems to have 12 electrons in four sp3 orbitals? Same for S.sp3d1 which has 12 electrons in five sp3d1 orbitals?
I am not sure I understand the F.minus.1 atom type... can you explain that one? E.g. give a PubChem or ChemSpider entry with a compound containing such an atom type?
I also like to hear why you changed the definition of this atom type:
OK, not done yet. But I am also wondering about the Ca.2plus.2 atom type... that doesn't look like a valid atom type to me... can you explain this one too?
That leaves me with 16% of the patch to go...
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
OK, I just have been rethinking the situation... the patch is huge and ugly... that troubles me:
I spend too much time on understanding what happens: whitespace is not the only problem, but also rearrangements
it has temporary comments like "// fix X" which are meaning less without any information on what fix X is
the whitespace changes ruin any change of merging atom type fixes easily from cdk-1.4.x to master (and worries me a lot, because we'll have those in parallel for at least 9 months)
I rather see the patch aimed at the cdk-1.4.x branch. That way we can make many people happy soon with the new atom types, rather than in 9 months or more
the patches not only add stuff, the change stuff too, which makes it rather difficult to oversee the consequences
this code is so deep inside the CDK, I am extra hard on getting this right
I really want to see small patches, without whitespace changes, without rearrangements, and without brackets that were not there before, and basically only add code.
Please split the three patches in one patch for one element where that element already has atom types. Put all elements which did not have atom types yet into one big patch. And remove all comments which refer to your development process (like '// Fix foo').
Please make such a patch for cupper or so, with the full test file. Make CDKAtomTypeMatcherTest not extends CDKAtomTypeMatcherEBITest. Instead, add a unit test for each atom type for the first without using IO (if you need help on automating creation of such unit tests for the test files you have, please let me know; I think I can do that in an hour or so, which is half the time I spent tonight on reviewing this huge patch).
This is what I like to have a patch look like: it only adds code. That way, it is crystal clear what changes, and then an atom type can be reviewed in a few minutes, including compiling and checking for remote regressions.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Your patch seems to cause 16 regressions in the CDKAtomTypeMatcherTest after I applied it to cdk-1.4.x. Is it critical to apply to master instead of cdk-1.4.x?
Otherwise, I discussed the patch with Gilleain, in particular regarding splitting up the patch into smaller patches. Particularly, because the current patch also makes whitespace changes (don't make whitespace and {} changes in the same patch is functional changes), it is hard to see what is going on. For example, the patch feels like it changes atom type IDs? Why is that? That is going to break things...
Please also put one atom type in one unit test, rather than all in one test method, such as is now with e.g. the method fix_Fe().
This new test class should also be included in McoreTests.java.
And several atom types miss lone pair count and/or hybridization information.
Updated patch as required
Hi Egon, thanks for the review.
We have updated the patch.
Whats DONE:
a) Old IDs in the ontology files were reverted as suggested.
b) New ids and the updated ontology was added
c) New mol files were added for 128 test case for the missing atom types
d) Test cases was synchronised with CDKAtomTypeMatcherTest.java (CDKAtomTypeMatcherTest.java extends CDKAtomTypeMatcherEBITest.java)
e) Hence AtomTypeMatcher was also updated.
Here are the test results from the result-core.txt
Testsuite: org.openscience.cdk.modulesuites.McoreTests
Tests run: 503, Failures: 0, Errors: 1, Time elapsed: 6.815 sec
Testcase: test4Sulphur(org.openscience.cdk.atomtype.CDKAtomTypeMatcherSMILESTest): Caused an ERROR
null
java.lang.NullPointerException
at org.openscience.cdk.atomtype.CDKAtomTypeMatcherSMILESTest.test4Sulphur(CDKAtomTypeMatcherSMILESTest.java:169)
Note: This case is rightly received by the CDK atom type matcher as this configuration of sulphur is not possible. SO its an invalid error.
Whats NOT Done:
We have added 128 missing test cases and clubbed them as ~50 cases according to the element types. If you would like to break them into smaller pieces, please feel free to do so.
Hope this helps.
Many thanks,
Asad
Thanx Asad!
I know that fixing a patch of this size is a lot of work. That's why small patches, which you can submit soon, are so good. Easy to review, quick feedback, which you can use for later patches.
I will look at this patch today.
I think I have not been clear in what I wanted with CDKAtomTypeMatcherTest... in that class, I do not want any tests that depend on IO... I should add that to the class JavaDoc. The reason for that is that I want those unit tests to do their work with the least possible dependencies, to make sure that when they fail, it's because the atom typing fails, and not, for example, some ChemObjectReader got broken (which happened in the past).
Anyway, I'll look at it as soon as possible.
Well testOla28 in CDKAtomTypeMatcherFilesTest uses IO...
The other thing is many missing tests now exist, and all pass. "Tests run: 503, Failures: 0,"
Yes, CDKAtomTypeMatcherTest and CDKAtomTypeMatcherFilesTest are different. I am saying that tests with files are not good; they are. I just want them in a different class.
But, at the same time, I also prefer to have all atom types to have unit tests without IO.
(There is an easy way to convert MDL molfiles to CDK source code; you can do the same thing I do when creating unit tests based on PubChem entries... See http://chem-bla-ics.blogspot.com/2010/09/using-pubchem-to-create-cdk-unit-tests.html)
But please wait for me to review the current patch.
BTW, does you new patch you uploaded today also add the missing hybridization information?
We have fixed hybridization to best of our knowledge.
To our my knowledge Hybridization is defined for atoms which are covalently connected with other atoms, so in cases like ionic bonds, metallic atoms(just the atom as it is) such as Li,Na,K,W,Ni.metallic,Ni.2plus,Pu,Th,Al,Fe.metallic,In,Co.metallic,Cd.metallic,Hg.metallic, Pb.neutral, Zn etc will not have any hybridization.
While cases such as S.2minus without connected though belonging to P-block elements I am not sure whether they have any hybridization.
Asad & Nimish
We have fixed hybridization to best of our knowledge.
To our my knowledge Hybridization is defined for atoms which are
covalently connected with other atoms, so in cases like ionic bonds,
metallic atoms(just the atom as it is) such as
Li,Na,K,W,Ni.metallic,Ni.2plus,Pu,Th,Al,Fe.metallic,In,Co.metallic,Cd.metallic,Hg.metallic,
Pb.neutral, Zn etc will not have any hybridization.
While in cases such as S.2minus without connected atoms (though it belongs to P-block
elements), we are not sure if it has any hybridization.
Asad & Nimish
Please file a bug report for the test4Sulphur() indicating you believe that test is wrong.
The patch is too large for me to review tonight.. I already spent an hour on looking at it...
What I did spot so far is:
(there still is one atom type renamed)
This causes a regression in the Sybyl atom type matcher.
and comments that are no longer valid, now that many atom types are no longer renamed:
I do not know if metallic atoms, or ions without any valence electrons show hybridization either.
I spotted at least these atom types without hybridization which does have neighbors: N.minus, S.2minus
BTW, and this is a request, not q requirement,... if you can remove those "// Fix" comments, that would be appreciated.
Can you explain the hybridization of the S.sp3.4 atom type, which seems to have 12 electrons in four sp3 orbitals? Same for S.sp3d1 which has 12 electrons in five sp3d1 orbitals?
I am not sure I understand the F.minus.1 atom type... can you explain that one? E.g. give a PubChem or ChemSpider entry with a compound containing such an atom type?
I also like to hear why you changed the definition of this atom type:
<at:atomtype rdf:id="Cl.chlorate">
<at:formalcharge>0</at:formalcharge>
<at:haselement rdf:resource="&elem;Cl">
<at:formalneighbourcount>3</at:formalneighbourcount>
- <at:lonepaircount>0</at:lonepaircount>
+ <at:lonepaircount>1</at:lonepaircount>
<at:pibondcount>2</at:pibondcount>
- <at:hybridization rdf:resource="&at;sp2">
+ <at:hybridization rdf:resource="&at;sp3">
</at:hybridization></at:hybridization></at:haselement></at:atomtype>
OK, not done yet. But I am also wondering about the Ca.2plus.2 atom type... that doesn't look like a valid atom type to me... can you explain this one too?
That leaves me with 16% of the patch to go...
OK, I just have been rethinking the situation... the patch is huge and ugly... that troubles me:
I really want to see small patches, without whitespace changes, without rearrangements, and without brackets that were not there before, and basically only add code.
Please split the three patches in one patch for one element where that element already has atom types. Put all elements which did not have atom types yet into one big patch. And remove all comments which refer to your development process (like '// Fix foo').
Please make such a patch for cupper or so, with the full test file. Make CDKAtomTypeMatcherTest not extends CDKAtomTypeMatcherEBITest. Instead, add a unit test for each atom type for the first without using IO (if you need help on automating creation of such unit tests for the test files you have, please let me know; I think I can do that in an hour or so, which is half the time I spent tonight on reviewing this huge patch).
As a reference patch, look at this one:
https://github.com/cdk/cdk/commit/90371fd74808b28d8a819e4c8bf9fd3788a15678
This is what I like to have a patch look like: it only adds code. That way, it is crystal clear what changes, and then an atom type can be reviewed in a few minutes, including compiling and checking for remote regressions.
We have resubmitted the patches.
KEGG Atom type patches - ID: 3374133
Hope this is works out for the reviewers.
Asad