From: Andrew Dalke <dalke@da...> - 2012-04-10 21:05:59
Hi Egon (and BO-SMILES),
On Apr 10, 2012, at 9:18 PM, Egon Willighagen wrote:
> one of my pet SMILES strings is oc1ccc(o)cc1... I discovered today
> that different tools think this is a different chemical
That's not a valid SMILES. Rather, in the common subset of
SMILES which everyone understands, an 'o' must be in a ring
in order to be aromatic. You have two oxygens which aren't
in a ring but are denoted as aromatic. There's nothing which
describes how to handle that case.
For example, RDKit says:
>>> from rdkit import Chem
[22:31:15] non-ring atom 3 marked aromatic
(It looks like it's counting atoms from 0, not 1.)
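To make the ring requirement above concrete, here is a toy
sketch of that check. It is not RDKit's code and not anything
from the OpenSMILES spec; the function name
non_ring_aromatic_atoms is made up for illustration. It assumes
a tiny SMILES subset: one-letter atoms, parentheses for
branches, and single-digit ring closures. It reports every
lowercase (aromatic) atom which isn't part of any ring.

```python
# Toy checker, for illustration only (hypothetical helper, not an RDKit API).
# Assumed subset: one-letter atoms, branches, single-digit ring closures.
# Two-letter elements, brackets, bond symbols, and charges are not handled.

def non_ring_aromatic_atoms(smiles):
    """Return 0-based indices of aromatic (lowercase) atoms not in a ring."""
    symbols = []      # atom symbol for each atom index
    edges = []        # bonds as (atom_index, atom_index) pairs
    prev = None       # atom that the next atom will bond to
    stack = []        # saved branch points for '(' ... ')'
    open_rings = {}   # ring-closure digit -> atom index that opened it

    for ch in smiles:
        if ch.isalpha():
            symbols.append(ch)
            idx = len(symbols) - 1
            if prev is not None:
                edges.append((prev, idx))
            prev = idx
        elif ch == "(":
            stack.append(prev)
        elif ch == ")":
            prev = stack.pop()
        elif ch.isdigit():
            if ch in open_rings:
                edges.append((open_rings.pop(ch), prev))
            else:
                open_rings[ch] = prev
        else:
            raise ValueError("unsupported character: %r" % ch)

    def is_ring_bond(skip):
        # A bond is a ring bond if its two ends are still connected
        # after removing that bond from the graph.
        start, goal = edges[skip]
        seen, todo = {start}, [start]
        while todo:
            cur = todo.pop()
            for i, (x, y) in enumerate(edges):
                if i == skip or cur not in (x, y):
                    continue
                nxt = y if cur == x else x
                if nxt == goal:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
        return False

    ring_atoms = set()
    for i, bond in enumerate(edges):
        if is_ring_bond(i):
            ring_atoms.update(bond)

    return [i for i, s in enumerate(symbols)
            if s.islower() and i not in ring_atoms]


print(non_ring_aromatic_atoms("oc1ccc(o)cc1"))  # [0, 5]
print(non_ring_aromatic_atoms("Oc1ccc(O)cc1"))  # []
```

With this sketch, the two lowercase oxygens in Egon's string
are flagged because they hang off the ring, while the
uppercase-O version passes. Indices are counted from 0.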
> Daylight finds a different chemical (and no errors in the string) than
> Cactvs and Open Babel.
Daylight does fun and wonderful things when aromatics aren't
in a ring. For instance, Daylight accepts 'cccc' as being valid.
> Now, I think OpenBabel implements OpenSMILES (not sure about Cactvs),
What's in OpenSMILES should be common to anyone implementing
the standard, core SMILES.
I know that Open Babel and CACTVS both implement things
beyond what's in OpenSMILES: OB supports a radical extension,
and CACTVS supports additional aromaticity types.
The problem is aromaticity perception. While I think
standardizing that is useful, I've consistently argued that
(unlike in the Daylight model) perception should be independent
of the OpenSMILES spec itself, because there are different
aromaticity models and each of them can be represented in SMILES.