From: Egon W. <ego...@gm...> - 2007-06-27 14:48:50

Rajarshi and all others,

now that we are talking about atom types again... maybe we should have
a good look at the IAtomType interface...

The following are, I think, OK:

  public void setAtomTypeName(String identifier);
  public String getAtomTypeName();
  public void setFormalCharge(Integer charge);
  public Integer getFormalCharge();
  public void setHybridization(int hybridization);
  public int getHybridization();

while these are not bad either, but less used (extra interface?):

  public void setVanderwaalsRadius(double radius);
  public double getVanderwaalsRadius();
  public void setCovalentRadius(double radius);
  public double getCovalentRadius();

But the use of the following is giving trouble. Traditionally we had:

  public void setMaxBondOrder(Double maxBondOrder);
  public void setBondOrderSum(Double bondOrderSum);
  public Double getMaxBondOrder();
  public Double getBondOrderSum();

which makes sense for very general atom types, while for more detailed
atom type definitions these were added later:

  public void setFormalNeighbourCount(int count);
  public int getFormalNeighbourCount();
  public void setValency(int valency);
  public int getValency();

Now, I understand that maxBondOrder and bondOrderSum can be useful, but
should we really put them in the IAtomType interface and not in the
class that uses that information? It is these fields that make the
calculation of implicit hydrogens difficult. The idea behind them, I
guess, is that using these two fields you can set some general rules,
but note that minBondOrder is missing (so, it cannot distinguish
between =O and -O for a =O atom type definition).

Atom Type info sources
----------------------

There basically are two types of questions regarding atom types, given
a random molecule representation: 'what atom type is this atom?' and
'what atom type could this atom be?'. Both do perception, but the first
gives an exclusive answer, while the second gives a list of options.
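
As a rough illustration of those two questions, here is a minimal sketch. It is not the actual CDK code or interfaces: the class AtomTypeSketch, the type labels such as "C.sp3", and the use of a negative neighbour count to mean "unknown" are all assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class AtomTypeSketch {

    /** Hypothetical, stripped-down atom type (not the CDK IAtomType). */
    public static final class SimpleAtomType {
        public final String name;
        public final String symbol;
        public final int formalNeighbourCount;

        public SimpleAtomType(String name, String symbol, int formalNeighbourCount) {
            this.name = name;
            this.symbol = symbol;
            this.formalNeighbourCount = formalNeighbourCount;
        }
    }

    /** Hypothetical input atom; neighbourCount < 0 means "not known from the input". */
    public static final class ObservedAtom {
        public final String symbol;
        public final int neighbourCount;

        public ObservedAtom(String symbol, int neighbourCount) {
            this.symbol = symbol;
            this.neighbourCount = neighbourCount;
        }
    }

    /** Made-up atom type list, for illustration only. */
    static final List<SimpleAtomType> KNOWN_TYPES = List.of(
        new SimpleAtomType("C.sp3", "C", 4),
        new SimpleAtomType("C.sp2", "C", 3),
        new SimpleAtomType("C.sp", "C", 2),
        new SimpleAtomType("O.sp3", "O", 2),
        new SimpleAtomType("O.sp2", "O", 1));

    /** 'What atom type could this atom be?': every type consistent with the input. */
    public static List<SimpleAtomType> guess(ObservedAtom atom) {
        List<SimpleAtomType> candidates = new ArrayList<>();
        for (SimpleAtomType type : KNOWN_TYPES) {
            boolean sameElement = type.symbol.equals(atom.symbol);
            // An unknown neighbour count cannot rule any type out.
            boolean countFits = atom.neighbourCount < 0
                || atom.neighbourCount == type.formalNeighbourCount;
            if (sameElement && countFits) {
                candidates.add(type);
            }
        }
        return candidates;
    }

    /** 'What atom type is this atom?': one type, or null if none or several fit. */
    public static SimpleAtomType match(ObservedAtom atom) {
        List<SimpleAtomType> candidates = guess(atom);
        return candidates.size() == 1 ? candidates.get(0) : null;
    }

    public static void main(String[] args) {
        // Fully specified carbon: exactly one type fits.
        System.out.println(match(new ObservedAtom("C", 4)).name);
        // Carbon with unknown neighbour count: only a list of options is possible.
        for (SimpleAtomType type : guess(new ObservedAtom("C", -1))) {
            System.out.println("candidate: " + type.name);
        }
    }
}
```

With complete input the matcher-style method can return a single type; once information is missing, only the guesser-style method has something useful to return.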
This distinction is important because the input the CDK gets is often
unclear: no explicit hydrogens, no explicit bond orders and no explicit
atom types (with some exceptions).

Atom Type perception
--------------------

So, when the file format is not clear on that, and the input is even
worse, it is up to the CDK to assume things. In some cases all
information needed to perceive atom types is present and then a
definite atom type can be assigned. In other cases, a list of atom
types is likely. Hence the two interfaces in cdk.atomtype:
IAtomTypeMatcher and IAtomTypeGuesser. Guess which one returns a
List<IAtomType> instead of an IAtomType :)

Now, neither of these is well tested. This is what I plan to change
(for about half a year already :( )

What I want is a toolkit that tests whether the matching and guessing
algorithms can find all the atom types they have in their lists, by
setting up appropriate tests. In the case of IAtomTypeMatcher, the test
should allow a definite answer. For IAtomTypeGuesser, however, it needs
to consider the various kinds of missing information, or combinations
thereof.

Complex stuff, something we have not addressed properly. I have no idea
how OpenBabel does this, or how well it does this; I have not yet had
time to look into that.

Help most welcome. This is very important to get right, even more than
the UNSET issues giving headaches now; quite related actually. Only
solving this puzzle allows us to move forward. I should have spoken up
about it much earlier. Apologies for not having done that.

Egon

--
ego...@gm...
Blog: http://chem-bla-ics.blogspot.com/
GPG: 1024D/D6336BA6