Re: [Rdkit-discuss] sanitization removes Hs - is this expected?
From: Michal K. <mic...@gm...> - 2015-04-01 08:51:36
Hi Greg,

Is it possible to do the same (i.e. create a molecule from SMILES without removing explicit hydrogens) in the PostgreSQL cartridge? I would like to do a "restricted" substructure search using SMILES queries. For example, with the standard behaviour (hydrogens removed), c1ccccc1[CH3] is converted to c1ccccc1C and matches both TNT and benzaldehyde, whereas if the hydrogens were not removed, this SMILES query would match TNT but not benzaldehyde. Of course, this can be done with SMARTS, but SMILES with explicit hydrogens can be drawn in MarvinSketch in KNIME by a non-expert user.

It seems that this is not currently possible in the cartridge and would require:
- exposing the sanitize parameter of parseMolText in adapter.cpp
- adding a modified mol_to_smiles to rdkit_io.c and rdkit.sql(91).in

Am I right, or is there a simpler way of doing it?

Best wishes,
Michal

On 25 February 2014 at 04:23, Greg Landrum <gre...@gm...> wrote:
> Hi Michal,
>
> On Mon, Feb 24, 2014 at 4:48 PM, Michal Krompiec <mic...@gm...> wrote:
>
>> Hello, I have just noticed this:
>>
>> >>> Chem.MolToSmiles(Chem.MolFromSmiles("[H]c1c([H])sc([H])c1[H]"))
>> 'c1ccsc1'
>> >>> Chem.MolToSmiles(Chem.MolFromSmiles("[H]c1c([H])sc([H])c1[H]",sanitize=False))
>> '[H]c1sc([H])c([H])c1[H]'
>> >>> Chem.MolToSmiles(Chem.RemoveHs(Chem.MolFromSmiles("[H]c1c([H])sc([H])c1[H]",sanitize=False)))
>> 'c1ccsc1'
>> >>> Chem.MolToSmiles(Chem.MolFromSmiles("[H]c1cscc1[H]"))
>> 'c1ccsc1'
>> >>> Chem.MolToSmiles(Chem.MolFromSmiles("[H]c1cscc1[H]",sanitize=False))
>> '[H]c1cscc1[H]'
>>
>> Is it the expected behaviour? Why does sanitization remove hydrogens?
>> Is it controlled by any of the SanitizeFlags?
>>
>
> It is the expected behavior. When sanitization is turned on, the SMILES
> parser actually calls "RemoveHs"; this removes the hydrogens from the graph
> and then sanitizes the molecule.
>
> If you do not want the Hs removed, you can tell MolFromSmiles to skip the
> sanitization (which also skips the RemoveHs) and then sanitize yourself::
>
> In [3]: m=Chem.MolFromSmiles("[H]c1c([H])sc([H])c1[H]",sanitize=False)
>
> In [4]: Chem.SanitizeMol(m)
> Out[4]: rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE
>
> In [5]: print(Chem.MolToSmiles(m))
> [H]c1sc([H])c([H])c1[H]
>
> I hope this helps,
> -greg
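For reference, here is a minimal sketch, using the RDKit Python API rather than the cartridge, of Greg's recipe (parse with sanitize=False, then call SanitizeMol) applied to the "restricted" search Michal describes. The helper name mol_from_smiles_keep_hs is hypothetical. In c1ccccc1[CH3] the three hydrogens are only an H count on the carbon, not separate atoms, so this sketch writes them as explicit [H] atoms instead. It also assumes the targets are given explicit hydrogens with Chem.AddHs so that the query H atoms have partners to map onto; whether that is practical for a database search is a separate question.

```python
from rdkit import Chem


def mol_from_smiles_keep_hs(smiles):
    """Hypothetical helper: parse SMILES without the parser calling RemoveHs.

    Skip sanitization in MolFromSmiles (which also skips RemoveHs),
    then sanitize explicitly; explicit [H] atoms stay in the graph.
    """
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    if mol is None:
        raise ValueError("could not parse SMILES: %s" % smiles)
    Chem.SanitizeMol(mol)  # raises an exception if sanitization fails
    return mol


# Toluene-like query with the methyl hydrogens written as separate [H] atoms.
QUERY_SMILES = "c1ccccc1C([H])([H])[H]"

strict_query = mol_from_smiles_keep_hs(QUERY_SMILES)  # H atoms kept (10 atoms)
default_query = Chem.MolFromSmiles(QUERY_SMILES)      # H atoms removed: toluene

# Targets with explicit H atoms added (an assumption of this sketch).
tnt = Chem.AddHs(Chem.MolFromSmiles(
    "Cc1c(cc(cc1[N+](=O)[O-])[N+](=O)[O-])[N+](=O)[O-]"))
benzaldehyde = Chem.AddHs(Chem.MolFromSmiles("O=Cc1ccccc1"))

for name, target in [("TNT", tnt), ("benzaldehyde", benzaldehyde)]:
    print(name,
          "default query:", target.HasSubstructMatch(default_query),
          "Hs kept:", target.HasSubstructMatch(strict_query))
```

With the hydrogens removed, the query reduces to c1ccccc1C and its methyl carbon can map onto the aldehyde carbon of benzaldehyde; with the hydrogens kept, the query carbon needs three H neighbours, which only the TNT methyl group has.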