From: Michael G. <mge...@ip...> - 2013-02-18 09:53:55
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Hi John and Egon,<br>
<br>
thanks for clarifying this.<br>
<br>
I just have one additional question regarding SMARTS. The Daylight
depictmatch query site offers an "explicit hydrogen matching"
setting.<br>
With it enabled, the SMILES <b>OC(=O)C(=O)On2nnc1ccc(cc12)C(F)(F)F</b>
is matched by the SMARTS <b>O=CO[H]</b>. This is meant to
detect carboxylic acids while excluding esters.<br>
Does the CDK also provide a setting for explicit hydrogen
matching?<br>
If not, would it be an alternative to use the MCSS matcher as a
simplified SMARTS matcher, i.e. to check whether the size of the MCSS
equals the number of atoms in the SMARTS (4 in this example)?<br>
<br>
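To make the question concrete, here is a minimal sketch of what I mean by explicit hydrogen matching. It is a toy brute-force subgraph matcher in plain Python, not CDK code and not how Daylight implements it; the names <b>matches</b>, <b>QUERY</b>, <b>ACID_EXPLICIT_H</b>, <b>ESTER_EXPLICIT_H</b> and <b>ACID_HEAVY_ONLY</b> are made up for illustration, and acetic acid and methyl acetate stand in for the real structures. As far as I can tell, it also suggests that the "MCSS size equals number of SMARTS atoms" check can only work if the hydroxyl hydrogen is a real atom in the target.<br>

```python
from itertools import permutations


def make_mol(atoms, bonds):
    """Build a toy molecule: element symbols plus a symmetric bond-order lookup."""
    adj = {}
    for i, j, order in bonds:
        adj[(i, j)] = order
        adj[(j, i)] = order
    return atoms, adj


def matches(query, target):
    """True if the query embeds in the target (elements and bond orders must agree)."""
    q_atoms, q_adj = query
    t_atoms, t_adj = target
    # Brute force over all injective atom mappings; fine for tiny toy graphs only.
    for image in permutations(range(len(t_atoms)), len(q_atoms)):
        if any(q_atoms[i] != t_atoms[image[i]] for i in range(len(q_atoms))):
            continue
        if all(t_adj.get((image[i], image[j])) == order
               for (i, j), order in q_adj.items()):
            return True
    return False


# O=CO[H] with the hydrogen as a real query atom (4 atoms).
QUERY = make_mol(["O", "C", "O", "H"], [(0, 1, 2), (1, 2, 1), (2, 3, 1)])

# Acetic acid, all hydrogens explicit.
ACID_EXPLICIT_H = make_mol(
    ["C", "C", "O", "O", "H", "H", "H", "H"],
    [(0, 1, 1), (1, 2, 2), (1, 3, 1), (3, 4, 1),
     (0, 5, 1), (0, 6, 1), (0, 7, 1)])

# Methyl acetate, all hydrogens explicit: the ester oxygen carries no hydrogen.
ESTER_EXPLICIT_H = make_mol(
    ["C", "C", "O", "O", "C", "H", "H", "H", "H", "H", "H"],
    [(0, 1, 1), (1, 2, 2), (1, 3, 1), (3, 4, 1),
     (0, 5, 1), (0, 6, 1), (0, 7, 1),
     (4, 8, 1), (4, 9, 1), (4, 10, 1)])

# Acetic acid with hydrogens left implicit (heavy atoms only).
ACID_HEAVY_ONLY = make_mol(["C", "C", "O", "O"],
                           [(0, 1, 1), (1, 2, 2), (1, 3, 1)])

if __name__ == "__main__":
    print(matches(QUERY, ACID_EXPLICIT_H))   # True: the acid has an O-H
    print(matches(QUERY, ESTER_EXPLICIT_H))  # False: no hydrogen on the ester oxygen
    print(matches(QUERY, ACID_HEAVY_ONLY))   # False: no H atom for [H] to map onto
```

In this toy model the acid is only recognised once its hydrogens are explicit atoms, so a size-based MCSS check would presumably need the target hydrogens made explicit first.<br>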
Best wishes,<br>
Michael<br>
<br>
<br>
On 02/15/2013 01:46 PM, Egon Willighagen wrote:<br>
</div>
<blockquote
cite="mid:CAM...@ma..."
type="cite">
<pre wrap="">Hi all,
Just as a quick note...
On Fri, Feb 15, 2013 at 10:11 AM, cruttkie <a class="moz-txt-link-rfc2396E" href="mailto:cru...@ip...">&lt;cru...@ip...&gt;</a> wrote:
</pre>
<blockquote type="cite">
<pre wrap="">The following SMILES was generated from ACD ChemSketch: c1cccc2nnnc12
Clearly the unspecified position of the hydrogen for one of the nitrogens is
not proper, but that's the output from ACD.
</pre>
</blockquote>
<pre wrap="">
I do not think the position of that hydrogen is a problem, but more
that one does not have the explicit hydrogen defined...
It seems that most toolkits are fairly OK with parsing it:
<a class="moz-txt-link-freetext" href="http://apps.ideaconsult.net:8080/ambit2/depict?search=c1cccc2nnnc12&smarts=">http://apps.ideaconsult.net:8080/ambit2/depict?search=c1cccc2nnnc12&smarts=</a>
But note that Daylight's Depict marks the string as an invalid SMILES
and treats it as a query instead.
Now the problem is that the input has two bits of information for all
three nitrogens:
- it is trivalent (lowest valency is the default for the organic
subset, with 3,5 as possible N valencies)
- it is aromatic/sp2 (lower case organic subset)
The latter is very unclearly defined, and some cleanup has been
attempted in the OpenSMILES specification.
Now, the original specification talks about sp2, and the only way to
have an sp2, trivalent nitrogen is with a double bond, but that is not
possible for all three nitrogens in the above structure, and hence the
SMILES may come from ACD ChemSketch, but is faulty.
Then, in practice many tools know about this common mistake in SMILES
strings and deal with it nevertheless by making a good guess,
as visible from the AMBIT service.
Now, the CDK does not gamble on what was meant, and ends up with
an internal data structure. In CDK 1.4 there is no handling of unknown
bond order, but that has been dealt with in 1.5/1.6.
Regarding the aromaticity, SMILES becomes even trickier, and the
original SMILES requires the toolkit to perceive the aromaticity
itself. And that is hard with ambiguous input...
</pre>
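The double-bond argument above can be checked mechanically. The following is a minimal sketch in plain Python, not what the CDK or any other toolkit actually does: it assumes every lowercase atom written without a hydrogen must take part in exactly one ring double bond, and then searches for such an assignment (a perfect matching) by backtracking. The names BONDS and can_kekulize are illustrative only.

```python
# Ring bonds of c1cccc2nnnc12, atoms numbered 0..8 in SMILES order
# (0-4 and 8 are carbons, 5-7 are the three nitrogens).
BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5),
         (5, 6), (6, 7), (7, 8), (8, 0), (8, 4)]


def can_kekulize(needs_double_bond):
    """True if every listed atom can be given exactly one double bond.

    needs_double_bond: iterable of atom indices that must be double bonded.
    Implemented as a backtracking search for a perfect matching on BONDS.
    """
    unmatched = set(needs_double_bond)

    def search():
        if not unmatched:
            return True
        atom = min(unmatched)  # this atom must be paired with some neighbour
        for a, b in BONDS:
            if atom not in (a, b):
                continue
            other = b if a == atom else a
            if other in unmatched:
                unmatched.discard(atom)
                unmatched.discard(other)
                if search():
                    return True
                unmatched.add(atom)   # undo and try the next neighbour
                unmatched.add(other)
        return False

    return search()


# As written by ACD: all nine atoms claim a double bond.
AS_WRITTEN = set(range(9))
# With a hydrogen on the first nitrogen, i.e. c1cccc2[nH]nnc12: atom 5 needs none.
WITH_NH = set(range(9)) - {5}

if __name__ == "__main__":
    print(can_kekulize(AS_WRITTEN))  # False: nine atoms cannot be paired up
    print(can_kekulize(WITH_NH))     # True: e.g. 0=1, 2=3, 4=8, 6=7
```

Under that assumption the string as written has no valid double-bond assignment, while putting the hydrogen on one nitrogen gives one, which is the guess other tools appear to make.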
<blockquote type="cite">
<pre wrap="">1) Why is the use of aromaticity not always enabled by default as it plays a key role for proper structure representation, mainly in case of SmilesGeneration and MOL file creation?
</pre>
</blockquote>
<pre wrap="">
Because aromaticity is a really difficult concept: there is no single
agreed definition of it, and everyone disagrees on what it should be.
</pre>
<blockquote type="cite">
<pre wrap="">2) The SMILES c1cccc2nnnc12 appears to be valid when parsed with the SmilesParser. What happens so that the nitrogens are suddenly treated as aliphatic although being aromatic?
</pre>
</blockquote>
<pre wrap="">
Yes, there is a known unit test failure about that:
<a class="moz-txt-link-freetext" href="http://pele.farmbio.uu.se/nightly-1.4.x/test/result-smiles.html">http://pele.farmbio.uu.se/nightly-1.4.x/test/result-smiles.html</a>
See testPyrrole3()
</pre>
<blockquote type="cite">
<pre wrap="">3) How different are the SMILES implementations of the CDK (which appears to rely on the Daylight SMILES implementation) and of ACD or ChemSpider? Do you have any knowledge about that?
</pre>
</blockquote>
<pre wrap="">
No clue. They do not use Open Source.
</pre>
<blockquote type="cite">
<pre wrap="">4) Do you know a SMARTS pattern that might fix the problem with the explicit hydrogen position for the three nitrogens so the "invalid" SMILES could be replaced by a proper SMARTS?
</pre>
</blockquote>
<pre wrap="">
That would be an interesting question for the Blue Obelisk eXchange :)
Grtz,
Egon
--
Dr E.L. Willighagen
Postdoctoral Researcher
Department of Bioinformatics - BiGCaT
Maastricht University (<a class="moz-txt-link-freetext" href="http://www.bigcat.unimaas.nl/">http://www.bigcat.unimaas.nl/</a>)
Homepage: <a class="moz-txt-link-freetext" href="http://egonw.github.com/">http://egonw.github.com/</a>
LinkedIn: <a class="moz-txt-link-freetext" href="http://se.linkedin.com/in/egonw">http://se.linkedin.com/in/egonw</a>
Blog: <a class="moz-txt-link-freetext" href="http://chem-bla-ics.blogspot.com/">http://chem-bla-ics.blogspot.com/</a>
PubList: <a class="moz-txt-link-freetext" href="http://www.citeulike.org/user/egonw/tag/papers">http://www.citeulike.org/user/egonw/tag/papers</a>
_______________________________________________
Cdk-devel mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Cdk...@li...">Cdk...@li...</a>
<a class="moz-txt-link-freetext" href="https://lists.sourceforge.net/lists/listinfo/cdk-devel">https://lists.sourceforge.net/lists/listinfo/cdk-devel</a>
</pre>
</blockquote>
<br>
<br>
<pre class="moz-signature" cols="72">--
Michael Gerlich
Group Bioinformatics &amp; Mass Spectrometry
Leibniz Institute of Plant Biochemistry
Weinberg 3
06120 Halle, Germany
email: <a class="moz-txt-link-abbreviated" href="mailto:mic...@ip...">mic...@ip...</a>
phone: +49-345-5582-1475
fax: +49-345-5582-1409
</pre>
</body>
</html>