[Rdkit-devel] [Cartridge] Cis/Trans in SMILES bug (backslashes)
From: Adrian S. <am...@ca...> - 2011-04-01 13:15:39
Hi Greg,
I have noticed that backslashes in SMILES strings (the bond-direction
markers used for cis/trans double-bond stereochemistry) cause problems
in PostgreSQL, because by default they are not treated as literal
characters. The E'' syntax does not help either: it also treats the
backslash as an escape character, so instead of being preserved it is
simply removed. With the default settings, the literal 'F\C=C/F' is
therefore read as 'FC=C/F'.
Given the molecule:
CC[NH+]1CC[NH+](CC1)Cc2cc(cc(c2)NC(=O)c3cccc4c3ccc(c4)c5ccc(o5)/C=C\6/C(=O)NC(=O)S6)C(F)(F)F
>>> select mol_in('CC[NH+]1CC[NH+](CC1)Cc2cc(cc(c2)NC(=O)c3cccc4c3ccc(c4)c5ccc(o5)/C=C\6/C(=O)NC(=O)S6)C(F)(F)F'::cstring)
produces "WARNING: nonstandard use of escape in a string literal",
followed by a SMILES parse error.
>>> select mol_in(E'CC[NH+]1CC[NH+](CC1)Cc2cc(cc(c2)NC(=O)c3cccc4c3ccc(c4)c5ccc(o5)/C=C\6/C(=O)NC(=O)S6)C(F)(F)F'::cstring)
turns \6 into an octal escape, i.e. the control character with byte
value 6, which again produces a SMILES parse error.
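To make both failure modes concrete, here is a small Python model of the
escape processing. It is a simplified sketch under stated assumptions,
not PostgreSQL's actual parser: legacy_unescape is a hypothetical name,
and only octal and single-letter escapes are modelled (hex \x and
Unicode \u escapes are ignored).

```python
# A simplified model (NOT PostgreSQL's own code) of how the server reads the
# body of an E'...' literal, or of a plain '...' literal when
# standard_conforming_strings = off. Hex (\x) and Unicode (\u) escapes are
# not modelled.

SIMPLE_ESCAPES = {"b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t"}
OCTAL_DIGITS = "01234567"


def legacy_unescape(body: str) -> str:
    """Return the string value the server would see for a literal body."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\" or i + 1 == len(body):
            # Ordinary character (a trailing lone backslash is kept, for simplicity)
            out.append(ch)
            i += 1
        elif body[i + 1] in OCTAL_DIGITS:
            # \o, \oo or \ooo: an octal byte value, so \6 becomes chr(6)
            j = i + 1
            while j < len(body) and j <= i + 3 and body[j] in OCTAL_DIGITS:
                j += 1
            out.append(chr(int(body[i + 1:j], 8)))
            i = j
        elif body[i + 1] in SIMPLE_ESCAPES:
            # \b, \f, \n, \r, \t: C-style control characters
            out.append(SIMPLE_ESCAPES[body[i + 1]])
            i += 2
        else:
            # Any other escaped character stands for itself: the backslash vanishes
            out.append(body[i + 1])
            i += 2
    return "".join(out)


if __name__ == "__main__":
    print(legacy_unescape(r"F\C=C/F"))  # FC=C/F
    smiles = (r"CC[NH+]1CC[NH+](CC1)Cc2cc(cc(c2)NC(=O)c3cccc4c3ccc(c4)"
              r"c5ccc(o5)/C=C\6/C(=O)NC(=O)S6)C(F)(F)F")
    value = legacy_unescape(smiles)
    print("\x06" in value, "\\" in value)  # True False
```

The \C case silently loses the backslash, while the \6 case injects a
control character that the SMILES parser then rejects.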
---
The solution is to set standard_conforming_strings = on (the default as
of PostgreSQL 9.1), so that backslashes in ordinary '...' literals are
kept as-is:
>>> set standard_conforming_strings = on;
>>> select mol_in('CC[NH+]1CC[NH+](CC1)Cc2cc(cc(c2)NC(=O)c3cccc4c3ccc(c4)c5ccc(o5)/C=C\6/C(=O)NC(=O)S6)C(F)(F)F'::cstring);
produces: CC[NH+]1CC[NH+](Cc2cc(NC(=O)c3cccc4c3ccc(-c3ccc(/C=C5\SC(=O)NC5=O)o3)c4)cc(C(F)(F)F)c2)CC1
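If changing the server setting is not an option, one workaround is to
write the SMILES as an E'' literal with every backslash doubled: inside
E'...' a doubled backslash always stands for one literal backslash,
whatever standard_conforming_strings is set to. The sketch below assumes
the SQL text is built by hand; to_escape_literal is a hypothetical
helper, and in real client code the driver's own parameter passing is
preferable to string building.

```python
# Hypothetical helper: quote a SMILES string as a PostgreSQL E'...' literal.
# Inside E'...' a doubled backslash always means one literal backslash, so the
# value survives whether standard_conforming_strings is on or off.

def to_escape_literal(value: str) -> str:
    """Double backslashes and single quotes, then wrap in E'...'."""
    escaped = value.replace("\\", "\\\\").replace("'", "''")
    return "E'" + escaped + "'"


if __name__ == "__main__":
    smiles = r"F\C=C/F"
    print(f"select mol_in({to_escape_literal(smiles)}::cstring);")
    # prints: select mol_in(E'F\\C=C/F'::cstring);
```

Backslashes are doubled before quotes so that the quote handling cannot
introduce new backslashes that would then be doubled by mistake.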
If you take this SMILES string and pass it to mol_in again without
standard_conforming_strings, \S is read as a plain S and you get:
CC[NH+]1CC[NH+](Cc2cc(NC(=O)c3cccc4c3ccc(-c3ccc(C=C5SC(=O)NC5=O)o3)c4)cc(C(F)(F)F)c2)CC1
which is different: the double-bond stereochemistry has been silently
lost.
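Because this second round trip raises no error, the loss is easy to
miss. A minimal sketch of a sanity check, assuming you keep the SMILES
you sent next to the one mol_in returned: count_direction_markers and
stereo_markers_lost are hypothetical names, and the check only looks at
characters. A result with no markers merely hints that stereo was
dropped; a real check would compare the molecules with RDKit itself.

```python
# Crude textual check (a hypothetical helper, not an RDKit API): did the
# '/' and '\' bond-direction markers survive the trip through the database?
# It inspects characters only; it does not parse the SMILES.

def count_direction_markers(smiles: str) -> int:
    """Count the '/' and '\\' characters in a SMILES string."""
    return smiles.count("/") + smiles.count("\\")


def stereo_markers_lost(sent: str, returned: str) -> bool:
    """True if the input carried direction markers but the result has none."""
    return count_direction_markers(sent) > 0 and count_direction_markers(returned) == 0


if __name__ == "__main__":
    sent = r"CC[NH+]1CC[NH+](Cc2cc(NC(=O)c3cccc4c3ccc(-c3ccc(/C=C5\SC(=O)NC5=O)o3)c4)cc(C(F)(F)F)c2)CC1"
    returned = "CC[NH+]1CC[NH+](Cc2cc(NC(=O)c3cccc4c3ccc(-c3ccc(C=C5SC(=O)NC5=O)o3)c4)cc(C(F)(F)F)c2)CC1"
    print(stereo_markers_lost(sent, returned))  # True
```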
Cheers,
Adrian