From: Chris M. <c.m...@ga...> - 2006-03-16 17:18:57
Craig A. James wrote:
> I've discovered a bug in the SMILES parser, and not being a chemist
> myself, I'm not sure what the parser is "thinking," so I'm reluctant to
> dig into it. The concept of "spin multiplicity" seems to be related to
> radicals, but the chemists I showed the code to were also baffled by
> what it's supposed to be doing. Here's the bug:
>
>   $ echo "[Na+].[Br-]" | babel -i smi - -o smi -
>   [Na+].[BrH-]
>
> However, do the same thing with [OH-] and it gets it right:
>
>   $ echo "[Na+].[OH-]" | babel -i smi - -o smi -
>   [Na+].[OH-]
>
> The rule with SMILES atoms in brackets is: The parser doesn't get to
> change it, period. If I type in [CH9], that may be nonsensical, but the
> parser has to accept it. The only valence assumptions the parser can
> make are with the "organic subset" of B, C, N, O, P, S, Br, Cl, F, and
> I, and then only if they're not in brackets.
>
> A recently-changed line of code seems to have introduced the problem:
>
>   smilesformat.cpp: line 2256
>
>   //add extra hydrogens
>   // if (!normalValence && atom->ImplicitHydrogenCount())
>   if (atom->ImplicitHydrogenCount() && !atom->IsHydrogen()) //CM 21Mar05
>   {
>     strcat(element,"H");
>     ....
>
> Can anyone help out with this? For now, I've reverted to the
> commented-out line, but I fear there's more to it.

A quick fix for this bug, without changing code, is to add the line

  IMPVAL [#35D0-] 0 #[Br-]

to atomtyp.txt. Where this file is stored depends on your system. The attached file is updated with this change and similar ones for the other halogens. [OH-] works OK because the appropriate line is already there.

The SMILES output code was altered so that the explicit hydrogen form [CH3][CH2][CH3] could be output, as an alternative to the normal CCC. Currently this happens automatically if the -h option has been used. (Do people find this acceptable?) The explicit form is also used for atoms with radical centres: ethyl radical is output as C[CH2].

Currently, whenever the bracket form is output, implicit hydrogens are added. The rules for the "implicit valence" are in atomtyp.txt. I'll change it in CVS so that Hs are added in the bracket form only when explicit hydrogens have been requested, or for radicals. This is a second fix for the [Br-] bug.

At present OpenBabel assigns spin multiplicity on input when it sees hydrogen-deficient atoms and there are some explicit hydrogens (other than D or T) on the same atom. Your example of [CH9] is parsed on input as you would like and is output as [CH9] when the explicit hydrogen form is requested; otherwise it is output as C. But there is no outright prohibition on changing any atom that has been input from a SMILES atom in a bracket, so I guess there will be cases where standard SMILES isn't interpreted as intended. Finding some more examples would help to minimise this inconsistency.

The lack of support for radicals in standard SMILES (and in the original Babel) is regrettable, but extending it without introducing anomalies is a bit of a tightrope walk. This week I was modifying this part of the code, and your comments have helped considerably in keeping me on the straight and narrow (to continue the metaphor).

Chris
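
For reference, the bracket-atom rule quoted above (the hydrogen count of a bracket atom is exactly what was written, with no valence table consulted) can be sketched in a few lines of C++. This is only an illustration under simplifying assumptions, not the code in smilesformat.cpp: `BracketAtom`, `ReadCount`, `ParseBracketAtom` and `WriteBracketAtom` are hypothetical names, and isotopes, chirality, aromatic symbols, the `++`/`--` charge forms and atom classes are left out.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical, simplified record for one bracket atom; not Open Babel's OBAtom.
struct BracketAtom {
    std::string symbol;  // element symbol exactly as written, e.g. "Br"
    int hcount = 0;      // hydrogens written inside the brackets; 0 if none
    int charge = 0;      // formal charge
};

// Reads an optional run of digits starting at pos; returns fallback if none.
static int ReadCount(const std::string& s, std::size_t& pos, int fallback)
{
    if (pos >= s.size() || !std::isdigit(static_cast<unsigned char>(s[pos])))
        return fallback;
    int n = 0;
    while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos])))
        n = n * 10 + (s[pos++] - '0');
    return n;
}

// Parses "[" symbol [ "H" [digits] ] [ ("+"|"-") [digits] ] "]".
BracketAtom ParseBracketAtom(const std::string& s)
{
    std::size_t pos = 0;
    if (s.empty() || s[pos++] != '[')
        throw std::invalid_argument("expected '['");
    BracketAtom atom;
    if (pos >= s.size() || !std::isupper(static_cast<unsigned char>(s[pos])))
        throw std::invalid_argument("expected element symbol");
    atom.symbol += s[pos++];
    if (pos < s.size() && std::islower(static_cast<unsigned char>(s[pos])))
        atom.symbol += s[pos++];
    // No valence table is consulted: the hydrogen count is whatever was written.
    if (pos < s.size() && s[pos] == 'H') {
        ++pos;
        atom.hcount = ReadCount(s, pos, 1);
    }
    if (pos < s.size() && (s[pos] == '+' || s[pos] == '-')) {
        const int sign = (s[pos++] == '+') ? 1 : -1;
        atom.charge = sign * ReadCount(s, pos, 1);
    }
    if (pos >= s.size() || s[pos] != ']' || pos + 1 != s.size())
        throw std::invalid_argument("expected closing ']'");
    return atom;
}

// Writes the atom back using only the stored fields, so nothing is inferred.
std::string WriteBracketAtom(const BracketAtom& atom)
{
    std::string out = "[" + atom.symbol;
    if (atom.hcount > 0) {
        out += 'H';
        if (atom.hcount > 1) out += std::to_string(atom.hcount);
    }
    if (atom.charge != 0) {
        out += (atom.charge > 0) ? '+' : '-';
        const int magnitude = atom.charge > 0 ? atom.charge : -atom.charge;
        if (magnitude > 1) out += std::to_string(magnitude);
    }
    return out + "]";
}
```

With this sketch, `WriteBracketAtom(ParseBracketAtom("[Br-]"))` gives back `[Br-]` because the stored hydrogen count is 0, and `[OH-]` and `[CH9]` also round-trip unchanged. It deliberately ignores radicals and the -h explicit-hydrogen output, which is where the real code has to make the harder choices described above.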