Currently in the CDK, tetrahedral centres are represented by a central atom and four neighbours. Chiral centres with only three neighbours have previously been handled by adding an explicit hydrogen atom. It is not currently possible to represent centres involving a lone pair (e.g. a sulfoxide). It would also be nice if we didn't have to add an explicit atom, since one must then consider all the possible side effects: existing coordinates, implicit hydrogen counts, atom typing, etc.
To add the ability to store and represent these centres we have a few options. All have pros and cons, and it would be good to have your opinions.
1. Use the central atom to represent an implicit atom in the list of neighbours. Pro: no existing code changes, as this already works; SMILES/InChI conversion will be very easy and the parity computation is elegant. Con: may be counter-intuitive; we will document this, but of course the documentation may not be read.
2. Change the tetrahedral data structure to use ElectronContainers. Pro: represents a more realistic model of the stereocentre. Con: we still have to add parts (e.g. an explicit H) to the structure, and it will likely be fiddly when converting to other formats.
3. As option 2, but with a fixed implicit reference. From what I can tell this is basically how OpenBabel does it. Pro: good representation, and we don't have to add parts to the molecule. Con: fiddly when reading/writing other formats.
4. Allow three neighbours in the tetrahedral centre. Pro: we don't have to add parts to the molecule, and conversion is easy. Con: we need to make sure the structure is always normalised such that one is looking from the implicit part.
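To make option 1 a little more concrete, here is a minimal sketch of the idea, not the CDK implementation. Atoms are plain strings, and the class and method names (ImplicitNeighbourSketch, permutationParity) are hypothetical. The central atom "S" appears in its own neighbour list as the placeholder for the lone pair, and two orderings of the same four neighbours are compared by permutation parity:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

/** Hypothetical sketch of option 1; none of these names are CDK API. */
final class ImplicitNeighbourSketch {

    /**
     * Returns +1 if 'actual' is an even permutation of 'reference', -1 if odd.
     * An odd permutation of the neighbours inverts the winding of the centre.
     */
    static int permutationParity(List<String> reference, List<String> actual) {
        if (reference.size() != actual.size()
                || new HashSet<>(reference).size() != reference.size()
                || !new HashSet<>(reference).equals(new HashSet<>(actual))) {
            throw new IllegalArgumentException(
                    "orderings must list the same distinct neighbours");
        }
        // Count pairs that appear in the opposite order to the reference.
        int inversions = 0;
        for (int i = 0; i < actual.size(); i++) {
            for (int j = i + 1; j < actual.size(); j++) {
                if (reference.indexOf(actual.get(i)) > reference.indexOf(actual.get(j))) {
                    inversions++;
                }
            }
        }
        return (inversions % 2 == 0) ? 1 : -1;
    }

    public static void main(String[] args) {
        // Sulfoxide centre: "S" in the neighbour list stands for the lone pair.
        List<String> stored  = Arrays.asList("S", "O", "C1", "C2");
        List<String> swapped = Arrays.asList("O", "S", "C1", "C2");
        List<String> rotated = Arrays.asList("O", "C1", "S", "C2");
        System.out.println(permutationParity(stored, swapped)); // -1: one swap, winding inverts
        System.out.println(permutationParity(stored, rotated)); // 1: two swaps, winding kept
    }
}
```

Because the placeholder is just another entry in the list, the parity code does not need to know whether a neighbour is a real atom or the implicit part.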
We are leaning towards the first option as it fits well with the existing InChI and new SMILES code. One issue with options 1 and 4 is that there may be a corner case which requires both an implicit hydrogen and a lone pair, or perhaps an unpaired electron. I've spoken with the ChEBI curators, and the examples I could come up with ('[S@H](C)=O' and '[C@H-](C)N') would not require a high enough energy to invert and so would not be optically stable.
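The corner case can be sketched as a simple check, assuming the central atom is used as the placeholder for an implicit neighbour. If a centre needed both an implicit hydrogen and a lone pair, the central atom would have to appear twice in the neighbour list, and two identical placeholders cannot be told apart when ordering the neighbours. The names here (ImplicitSlotCheck, implicitSlots, representable) are hypothetical and atoms are plain strings:

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical check for the corner case; not CDK API. */
final class ImplicitSlotCheck {

    /** Counts how often the focus occurs in its own neighbour list. */
    static int implicitSlots(String focus, List<String> neighbours) {
        int count = 0;
        for (String neighbour : neighbours) {
            if (neighbour.equals(focus)) {
                count++;
            }
        }
        return count;
    }

    /** A centre fits this scheme only with four entries and at most one placeholder. */
    static boolean representable(String focus, List<String> neighbours) {
        return neighbours.size() == 4 && implicitSlots(focus, neighbours) <= 1;
    }

    public static void main(String[] args) {
        // Sulfoxide: one placeholder for the lone pair.
        System.out.println(representable("S", Arrays.asList("S", "O", "C1", "C2"))); // true
        // '[S@H](C)=O': implicit H and lone pair would both need a placeholder.
        System.out.println(representable("S", Arrays.asList("S", "S", "C", "O")));   // false
    }
}
```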
Any corner cases or other suggestions are most welcome.