From: Craig A. J. <cj...@em...> - 2007-11-26 15:12:01
Andrew Dalke wrote:
> On Nov 25, 2007, at 11:18 PM, Craig James wrote:
>> Andrew Dalke wrote:
>>> Because aromaticity in SMILES is only really designed for canonical
>>> SMILES, and canonical SMILES are only valid in the context of a given
>>> canonicalization algorithm, hence it doesn't matter.
>> This isn't the main reason for aromaticity, and not the most
>> important!
>
> Quoting Dave Weininger's contribution to the Wiley "Chemoinformatics
> Handbook":
>
> """
> - Why does SMILES provide an "aromatic" concept at all?
>
> The SMILES language was specifically designed to be
> "canonicalizable", i.e., not only to provide an unambiguous chemical
> nomenclature but also be able to express a single unique SMILES for
> every structure in the same language. This implies a fundamental
> requirement to express the symmetry of a molecule correctly.

This confirms, rather than refutes, my assertion. Dave seems to be emphasizing canonicalization in answering this question, but having worked with him for six years, I heard him explain aromaticity many times, and the purpose was not canonical SMILES. It was, as this quote says, to reflect the underlying uniformity of the bonds in aromatic systems (Dave calls it "symmetry" in this quote, but that's a bit inaccurate). Canonical SMILES are just one reflection of that underlying uniformity. Without aromaticity, SMARTS are useless, and canonicalization is impossible.

Several contributors have suggested that aromaticity isn't necessary except for canonicalization. I was just trying to point out that this isn't true. I'll modify my assertion: Aromaticity is at the core of a SMILES-based cheminformatics system, and is needed for canonicalization and for SMARTS matching; both are equally important.

On top of that, I think there is an important esthetic issue here.
SMILES strings are nice; chemists seem to find them understandable and user friendly, and the SMILES notion of aromaticity is one thing that contributes to that user friendliness. I personally think "c1ccccc1" is much nicer than "C1=CC=CC=C1". This may seem like a trivial issue, but I think it's one of the reasons SMILES survives.

Craig