From: Chris M. <c.m...@ga...> - 2008-10-23 14:05:56
Dr P. Murray-Rust wrote:
> David García Aristegui wrote:
>> I think it is not the same: the SMILES format could use the wildcard (*):
>>
>> "* is wildcard (any atom). The wildcard atom may also be written without
>> brackets"
>> http://www.daylight.com/dayhtml_tutorials/languages/smiles/index.html
>>
>> The SMARTS pattern for an aldehyde is:
>> [#6][CX3](=O) 3 aldehyde or ketone
>>
>> The SMILES representation for the functional group is
>> ([H]C([*])=O)
>>
>> Thank you again. Best regards.
>
> I agree with David's interpretation. Like a number of things in the SMILES
> spec it is not sufficiently [defined]. There are cases where it is
> reasonable to say "I know this compound has an atom here but I don't know
> what it is". That represents uncertainty, which is not necessarily the same
> as a query. What the intention of Weininger was when he wrote this we may
> never know.
>
> In JUMBO I translate this to an atom of elementType "R" in CML. Whether
> that is a single atom or potentially a group is left undefined.

The SMILES atom * is handled OK by OpenBabel. In OBMol it is an atom with
zero atomic number. It is output as * in MDL mol, as * (without the
brackets) in SMILES, and as R in CML.

What you seem to be looking for is a SMARTS to SMILES converter, which, as
far as I know, OpenBabel doesn't have, and I wonder whether such a thing is
possible. While the SMILES ([H]C([*])=O) will match (as SMARTS) all
aldehydes, it will also match formic acid and its esters. You could probably
devise a SMARTS pattern to exclude these, but SMILES isn't subtle enough. So
why do you want to store functional groups in your database as SMILES rather
than SMARTS?

By the way, don't use patterns.txt because it is too incomplete. Use
SMARTS_InteLigand.txt instead.

Chris
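
To illustrate the point about ([H]C([*])=O) also matching formic acid and its esters, here is a minimal sketch in plain Python. It is a toy brute-force substructure matcher, not OpenBabel's SMARTS engine. The helper names (`make_graph`, `matches`), the hand-built molecule graphs with explicit hydrogens, and the idea of treating `*` as "any element" are all assumptions made for this illustration.

```python
from itertools import permutations


def make_graph(atoms, bonds):
    """Return (atoms, bond lookup) with every bond stored in both directions."""
    lookup = {}
    for (i, j), order in bonds.items():
        lookup[(i, j)] = order
        lookup[(j, i)] = order
    return atoms, lookup


def matches(pattern, mol):
    """Brute-force substructure test; '*' in the pattern accepts any element."""
    p_atoms, p_bonds = pattern
    m_atoms, m_bonds = mol
    # Try every way of assigning distinct molecule atoms to the pattern atoms.
    for picked in permutations(range(len(m_atoms)), len(p_atoms)):
        atoms_ok = all(p == "*" or p == m_atoms[k] for p, k in zip(p_atoms, picked))
        if atoms_ok and all(
            m_bonds.get((picked[i], picked[j])) == order
            for (i, j), order in p_bonds.items()
        ):
            return True
    return False


# Pattern atoms: H, C, wildcard, O; bonds: H-C, C-*, C=O (hypothetical encoding).
WILDCARD = make_graph(["H", "C", "*", "O"], {(0, 1): 1, (1, 2): 1, (1, 3): 2})
# The same pattern with the wildcard narrowed to carbon, one kind of extra
# constraint a query language can add (it also rules out formaldehyde).
CARBON_ONLY = make_graph(["H", "C", "C", "O"], {(0, 1): 1, (1, 2): 1, (1, 3): 2})

# Hand-built test molecules with explicit hydrogens.
MOLECULES = {
    "acetaldehyde": make_graph(
        ["C", "C", "O", "H", "H", "H", "H"],
        {(0, 1): 1, (1, 2): 2, (1, 3): 1, (0, 4): 1, (0, 5): 1, (0, 6): 1},
    ),
    "formic acid": make_graph(
        ["C", "O", "O", "H", "H"],
        {(0, 1): 2, (0, 2): 1, (0, 3): 1, (2, 4): 1},
    ),
    "methyl formate": make_graph(
        ["C", "O", "O", "H", "C", "H", "H", "H"],
        {(0, 1): 2, (0, 2): 1, (0, 3): 1, (2, 4): 1, (4, 5): 1, (4, 6): 1, (4, 7): 1},
    ),
    "acetone": make_graph(
        ["C", "C", "C", "O", "H", "H", "H", "H", "H", "H"],
        {(0, 1): 1, (1, 2): 1, (1, 3): 2, (0, 4): 1, (0, 5): 1, (0, 6): 1,
         (2, 7): 1, (2, 8): 1, (2, 9): 1},
    ),
}

for name, mol in MOLECULES.items():
    print(name, matches(WILDCARD, mol), matches(CARBON_ONLY, mol))
```

In this toy model the wildcard pattern accepts acetaldehyde, formic acid and methyl formate alike, because the oxygen of the acid or ester satisfies `*`. Narrowing the wildcard to carbon rejects the two formates, which is the sort of restriction that SMARTS can express and a plain SMILES wildcard cannot.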