Re: [Rdkit-discuss] sanitization removes Hs - is this expected?
From: Michal K. <mic...@gm...> - 2015-04-07 10:13:58
Thanks a lot! By the way, it would be useful to have this feature
(MergeQueryHs) in the substructure search KNIME node as well.

Best wishes,
Michal

On 3 April 2015 at 06:29, Greg Landrum <gre...@gm...> wrote:

> The changes are now pushed
> (https://github.com/rdkit/rdkit/commit/f0d4cf1ec63a4928a2a28fa62cf1d255099e72d0)
> and are available on master.
> The new functions are qmol_from_smiles() and qmol_from_ctab()
>
> Best,
> -greg
>
>
> On Thu, Apr 2, 2015 at 10:51 AM, Greg Landrum <gre...@gm...>
> wrote:
>
>> Hi Michal,
>>
>> Glad to hear this matches what you are looking for. I have already added
>> the feature to the cartridge and will check it in later today/tomorrow
>> morning.
>>
>> -greg
>>
>> On Thursday, April 2, 2015, Michal Krompiec <mic...@gm...>
>> wrote:
>>
>>> Hi Greg,
>>> Thank you, this is exactly what I needed.
>>>
>>> On 2 April 2015 at 05:22, Greg Landrum <gre...@gm...> wrote:
>>>
>>>>
>>>> Skipping sanitization, as you propose, isn't going to help here: the
>>>> kekulized form of the ring will not be converted to aromatic and you won't
>>>> get the matches you are looking for.
>>>>
>>> Indeed. Previously I stored my dataset as SMILES with explicit
>>> hydrogens, and created the query mols by adding Hs, then deleting the
>>> hydrogens at substitution sites, and finally converting to SMARTS. A messy
>>> workaround, but it produced the right result.
>>>
>>>
>>>> Here's an approach to this that works in Python:
>>>>
>>>
>>>
>>> And this is exactly what I wanted. To illustrate it more precisely: your
>>> pattern (2-H-pyrimidine) matches pyrimidine and 5-methylpyrimidine but
>>> does not match 2-methylpyrimidine:
>>>
>>> >>> m = Chem.MolFromSmiles('c1ccnc([H])n1', sanitize=False)
>>> >>> nm = Chem.MergeQueryHs(m)
>>> >>> Chem.SanitizeMol(nm)
>>> rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE
>>> >>> Chem.MolFromSmiles('c1ccncn1').HasSubstructMatch(nm)
>>> True
>>> >>> Chem.MolFromSmiles('c1c(C)cncn1').HasSubstructMatch(nm)
>>> True
>>> >>> Chem.MolFromSmiles('c1ccnc(C)n1').HasSubstructMatch(nm)
>>> False
>>>
>>>
>>>>
>>>> Being able to do something equivalent in the cartridge would
>>>> certainly be useful. What I'd suggest is the addition of two functions:
>>>> "query_mol_from_smiles()" and "query_mol_from_ctab()" that do this.
>>>>
>>>
>>> I'll do it.
>>>
>>>
>>> Then you could do queries like:
>>>> select * from mols where m @> query_mol_from_smiles('c1ccnc([H])n1');
>>>> and have it do the right thing.
>>>>
>>>> Sound reasonable?
>>>>
>>>> -greg
>>>>
>>>>
>>> Best wishes,
>>> Michal
>>>
>>>
>
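
For anyone who wants to issue the cartridge query from Python, here is a minimal sketch that only builds the SQL text and its parameter tuple. The table `mols`, the column `m`, and the `@>` operator come from Greg's example, and `qmol_from_smiles()` is the name given in his follow-up. The helper name `build_qmol_query`, the `%s` placeholder style, and the `::cstring` cast are assumptions that may need adjusting for a particular driver or cartridge version.

```python
import re

# Table and column names cannot be bound as parameters, so accept only
# plain identifiers before splicing them into the SQL text.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_qmol_query(smiles, table="mols", column="m"):
    """Return (sql, params) for a substructure search that keeps query Hs.

    Hypothetical helper: it does not talk to a database, it only prepares
    the statement and the values to bind.
    """
    for name in (table, column):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"not a plain SQL identifier: {name!r}")
    # The SMILES stays a bound parameter. The ::cstring cast is an
    # assumption about what the cartridge function expects.
    sql = (
        f"select * from {table} "
        f"where {column} @> qmol_from_smiles(%s::cstring)"
    )
    return sql, (smiles,)


sql, params = build_qmol_query("c1ccnc([H])n1")
print(sql)
print(params)
```

With a DB-API driver that uses `%s` placeholders (psycopg2 is one such driver), the pair could then be passed as `cursor.execute(sql, params)`. Keeping the SMILES as a bound parameter avoids quoting problems with brackets and other special characters in the query string.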