From: John M. <jo...@eb...> - 2013-10-14 19:04:53
Okay, so some definitions. "Isomer" here refers to isotopes and stereoisomers.

  generic SMILES   - no isomer information, multiple valid SMILES; also
                     called arbitrary SMILES.
  unique SMILES    - no isomer information, canonical. Daylight (and the
                     CDK) needs a delocalised representation for this,
                     though that can be avoided.
  isomeric SMILES  - isomer information, multiple valid SMILES.
  absolute SMILES  - isomer information, canonical.
  canonical SMILES - means either unique or absolute.

From the Daylight theory manual:

  A canonicalization algorithm exists to generate one special generic
  SMILES among all valid possibilities; this special one is known as the
  "unique SMILES". SMILES written with isotopic and chiral specifications
  are collectively known as "isomeric SMILES". A unique isomeric SMILES is
  known as an "absolute SMILES". See the following examples.

The CDK does have an okay canonicalisation algorithm, but it doesn't consider stereochemistry, so we can generate unique but not absolute SMILES. Absolute SMILES are more difficult to canonicalise and generate.

PubChem provides 'PUBCHEM_OPENEYE_CAN_SMILES' (unique SMILES) and 'PUBCHEM_OPENEYE_ISO_SMILES' (isomeric SMILES), both from OpenEye. Note that the second isn't canonical.

Another interesting point is that the unique SMILES from PubChem aren't aromatic. The aromatic part of SMILES is mainly there to avoid different Kekulé representations (see the problems you get if you don't use aromatic SMILES here - http://link.springer.com/chapter/10.1007%2F11530084_13). I think what PubChem probably does is canonicalise the molecule and then re-assign a Kekulé structure. Assigning a Kekulé structure when the atoms are sorted in canonical order will of course give you the correct representation. Generally, SMILES with set bond orders are much nicer than aromatic ones - but that means you have to use generic SMILES in our case.

Okay, so what to use of course depends on the goals.
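To keep the four Daylight terms straight when labelling fields, here is a minimal Python sketch that treats the naming as a lookup on two flags: is the string canonical, and does it carry isomer information. The names SmilesFlavour, daylight_term and field_label are made up for illustration; they are not part of the CDK, Bioclipse or any other toolkit, and the sketch only encodes the terminology, it does not generate SMILES.

```python
from typing import NamedTuple


class SmilesFlavour(NamedTuple):
    """Hypothetical pair of flags describing how a SMILES was written."""
    canonical: bool  # one special string chosen among all valid ones
    isomeric: bool   # isotope and stereo information is written out


# Daylight terminology, as summarised in the definitions above.
DAYLIGHT_TERMS = {
    SmilesFlavour(canonical=False, isomeric=False): "generic",
    SmilesFlavour(canonical=True, isomeric=False): "unique",
    SmilesFlavour(canonical=False, isomeric=True): "isomeric",
    SmilesFlavour(canonical=True, isomeric=True): "absolute",
}


def daylight_term(canonical: bool, isomeric: bool) -> str:
    """Return the Daylight name for a SMILES written with these flags."""
    return DAYLIGHT_TERMS[SmilesFlavour(canonical, isomeric)]


def field_label(canonical: bool, isomeric: bool) -> str:
    """Build a display label such as 'Unique SMILES' (illustrative only)."""
    return f"{daylight_term(canonical, isomeric).capitalize()} SMILES"


if __name__ == "__main__":
    # Print all four labels, one per combination of flags.
    for flavour in DAYLIGHT_TERMS:
        print(flavour, "->", field_label(*flavour))
```

Note that "canonical SMILES" on its own does not appear as a value here: it covers both the unique and the absolute case, which is part of why an explicit label is useful.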
I would probably say unique SMILES should be generated only on request, but then again I think a lot of confusion arises from the fact that people want unique SMILES when actually SMILES is a great general format just for storing compounds. I would probably display two fields, but would label them clearly as Generic/Unique etc.:

  Generic SMILES and Isomeric SMILES, or
  Unique SMILES and Isomeric SMILES

J

On 14 Oct 2013, at 15:54, Egon Willighagen <ego...@gm...> wrote:

> John,
>
> your new SMILES generator supports the creation of various SMILES
> variants, like PubChem also has on their data files... Right now,
> Bioclipse shows only one SMILES... which SMILES variants should it be
> showing in the Properties "View" (middle bottom in the screenshot)?
>
> Egon
>
>
> --
> Dr E.L. Willighagen
> Postdoctoral Researcher
> Department of Bioinformatics - BiGCaT
> Maastricht University (http://www.bigcat.unimaas.nl/)
> Homepage: http://egonw.github.com/
> LinkedIn: http://se.linkedin.com/in/egonw
> Blog: http://chem-bla-ics.blogspot.com/
> PubList: http://www.citeulike.org/user/egonw/tag/papers
> ORCID: 0000-0001-7542-0286
> <bcSMILES.png>