Assigning atom types, aromatic flags, hybridization and hydrogens

Atom types can be assigned to the atoms of a molecule using only topological information and SMARTS substructure searches. For more specialized atom types, such as chirality and Z/E isomerism descriptors, atom property descriptors are the better choice (see the Section called Atom properties in Chapter 5).

As discussed in our three papers on feature selection and model building [wfz04a,wfz04b,fwz04], descriptor calculation is the last step, performed after four different expert systems have been called. You should therefore check your descriptor results carefully when predicting values with models that you did not calculate yourself.

In our opinion, a 'standard' (e.g., JOELib/OpenBabel, atomTyperVersion=1.0), formulated as a classification problem, should exist for every expert system, so that one can simply say: calculate the descriptors for that standard. Formulating it as a classification problem in a PUBLIC database is required in order to test your or our atom typer implementation against this standard. Let's see if we can ever find the time and the man-/woman-power to formulate and test such a standard ...

Table 3-4. Process of assigning atom types

Molecule

aromaticity      | hybridization | implicit valence | atom types | descriptor
SMARTS without   |               |                  |            | calculation
D<n> and ^<n>    |               |                  |            | algorithm
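
The steps of this process form a dependency chain, which can be sketched as follows. This is a minimal illustration in Python, not JOELib code: the step names are hypothetical labels, and the prerequisites are taken from the text of the subsections in this chapter. Where the table and the text differ in the position of the implicit valence step, the sketch follows the text.

```python
from graphlib import TopologicalSorter

# Prerequisites of each step, as stated in the subsections of this chapter.
# The step names are illustrative labels, not JOELib identifiers.
PREREQUISITES = {
    "aromaticity": set(),
    "hybridization": {"aromaticity"},
    "atom_types": {"aromaticity", "hybridization"},
    "implicit_valence": {"aromaticity", "hybridization", "atom_types"},
    "descriptors": {"atom_types", "implicit_valence"},
}


def assignment_order(prerequisites):
    """Return the steps in an order that satisfies every prerequisite."""
    return list(TopologicalSorter(prerequisites).static_order())


order = assignment_order(PREREQUISITES)
print(" -> ".join(order))
```

Because every step depends on all earlier ones, only one valid order exists, which is why a descriptor value can change whenever any of the preceding expert systems changes.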

Assigning aromaticity flags

Aromatic flags can be assigned to atoms using SMARTS substructure patterns (see the Section called SMARTS definition) defined in the file joelib/data/plain/aromatic.txt. All SMARTS patterns except D<n> (explicit bonds) and ^<n> (hybridization) are allowed. Chiral atoms are also allowed; they are handled with the XYZVector.calcTorsionAngle(...) method.
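
For orientation, a torsion (dihedral) angle over four atom positions can be computed as sketched below. This is a generic Python illustration and does not reproduce the signature or the sign convention of XYZVector.calcTorsionAngle(...); the name torsion_angle and the tuple-based coordinates are assumptions made for this example.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def torsion_angle(p1, p2, p3, p4):
    """Signed dihedral angle in degrees around the p2-p3 axis."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1 = cross(b1, b2)  # normal of the plane p1, p2, p3
    n2 = cross(b2, b3)  # normal of the plane p2, p3, p4
    b2_len = math.sqrt(dot(b2, b2))
    y = dot(cross(n1, n2), b2) / b2_len
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Two mirror-image arrangements of the fourth atom.
above = torsion_angle((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1))
below = torsion_angle((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, -1))
print(above, below)
```

The two mirror-image arrangements give angles of equal magnitude and opposite sign, which is the property that makes a torsion angle usable for distinguishing chiral configurations.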

Assigning atom hybridizations

Atom hybridizations can be assigned only after the aromaticity flags have been assigned. All INTHYB definitions in the file joelib/data/plain/atomtype.txt are used to determine the atom hybridizations.

Assigning atom types

Atom types can be assigned only after the aromaticity flags and atom hybridizations have been assigned. All EXTTYP definitions in the file joelib/data/plain/atomtype.txt are used to determine the atom types. These are mainly used for the file conversion process and for descriptor calculation algorithms.
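
Because the INTHYB and EXTTYP definitions (and the IMPVAL definitions described in the following subsection) live in the same file, a reader of that file has to separate them by keyword. The Python sketch below shows one way to do this. It assumes that each definition is a whitespace-separated line of the form KEYWORD SMARTS VALUE and that lines starting with # are comments. This layout and the sample patterns are assumptions for illustration and should be checked against the actual atomtype.txt. The SMARTS matching itself is not shown.

```python
from collections import defaultdict

# Hypothetical excerpt for illustration; not copied from atomtype.txt.
SAMPLE = """\
# keyword  SMARTS   value
INTHYB [C]      3
INTHYB [c]      2
IMPVAL [#6^3]   4
EXTTYP [#6^3]   C3
"""

KEYWORDS = ("INTHYB", "IMPVAL", "EXTTYP")


def parse_rules(text):
    """Group (SMARTS, value) pairs by keyword, preserving file order."""
    rules = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        if fields[0] in KEYWORDS and len(fields) == 3:
            keyword, smarts, value = fields
            rules[keyword].append((smarts, value))
    return dict(rules)


rules = parse_rules(SAMPLE)
for keyword, entries in rules.items():
    print(keyword, entries)
```

The order of rules within each group is kept because rule files of this kind are often order-sensitive; whether JOELib lets earlier or later matches take precedence is not stated in this section.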

Assigning implicit hydrogens

The implicit valence can be assigned to atoms only after the aromaticity flags, atom hybridizations and atom types have been assigned. All IMPVAL definitions in the file joelib/data/plain/atomtype.txt are used to calculate the number of implicit hydrogens for each atom.
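
The arithmetic of this step can be sketched as follows: once a rule has supplied the implicit valence of an atom, the number of implicit hydrogens is what remains after subtracting the connections that are already present. This is a minimal Python sketch under that assumption. The function name, the plain-integer inputs and the example valences (4 for sp3 carbon, 2 for sp3 oxygen) are hypothetical, and the exact counting used by JOELib is not described in this section.

```python
def implicit_hydrogens(implicit_valence, explicit_connections):
    """Hydrogens needed to reach the implicit valence; never negative."""
    return max(implicit_valence - explicit_connections, 0)


# Ethanol heavy-atom skeleton C-C-O, written as
# (label, assumed implicit valence, explicit connections).
ethanol = [
    ("C1", 4, 1),
    ("C2", 4, 2),
    ("O", 2, 1),
]

counts = {
    label: implicit_hydrogens(valence, connections)
    for label, valence, connections in ethanol
}
print(counts)
```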

Calculating descriptors and/or assigning special atom types

Descriptors can be simple topological descriptors that require no chemical information, or descriptors that require atom types and implicit hydrogens (see e.g. the Section called Fingerprints in Chapter 5). PATTY rules (see the Section called Programmable Atom Typer (PATTY)) can be used for simple atom type descriptors. All kinds of other models or expert rules can be used for chirality or Z/E isomerism descriptors.

Table 3-5. Possible special atom type assignments (not implemented)

Assignment                       | Reference
chirality descriptor             | [gt03]
E/Z descriptor                   | [gbt02]
planar three-coordinate nitrogen | calculate vector product of the three neighbors
nitrogen with aromatic ligand    | PATTY SMARTS rule: [#7a]
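
One possible reading of the vector product entry for planar three-coordinate nitrogen is sketched below in Python: the vector product of two edges between the three neighbors gives the normal of their plane, and the nitrogen counts as planar if it lies close to that plane. This interpretation, the function names and the tolerance value are assumptions for illustration; the assignment is not implemented in JOELib.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def out_of_plane_distance(center, n1, n2, n3):
    """Distance of the center atom from the plane through its three neighbors."""
    normal = cross(sub(n2, n1), sub(n3, n1))
    length = math.sqrt(dot(normal, normal))
    if length == 0.0:
        raise ValueError("neighbors are collinear; the plane is undefined")
    return abs(dot(sub(center, n1), normal)) / length


def is_planar(center, n1, n2, n3, tolerance=0.1):
    """True if the center lies within `tolerance` (same unit as the coordinates)."""
    return out_of_plane_distance(center, n1, n2, n3) <= tolerance


neighbors = [(1.0, 0.0, 0.0), (-0.5, 0.866, 0.0), (-0.5, -0.866, 0.0)]
flat = is_planar((0.0, 0.0, 0.0), *neighbors)
pyramidal = is_planar((0.0, 0.0, 0.4), *neighbors)
print(flat, pyramidal)
```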

For every atom property descriptor there must be a descriptor documentation file (see the Section called Writing your own descriptor and result classes). Otherwise an error about the missing HTML documentation (which is generated using DocBook) will occur every time JOELib starts. The XML and RTF description files are optional.