Re: [Rdkit-discuss] sanitization removes Hs - is this expected?
From: Michal K. <mic...@gm...> - 2015-04-02 08:40:32
Hi Greg,

Thank you, this is exactly what I needed.

On 2 April 2015 at 05:22, Greg Landrum <gre...@gm...> wrote:
>
> Skipping sanitization, as you propose, isn't going to help here: the
> kekulized form of the ring will not be converted to aromatic and you won't
> get the matches you are looking for.
>

Indeed. Previously I stored my dataset as SMILES with explicit hydrogens, and created the query mols by adding Hs, then deleting the hydrogens at the substitution sites, and finally converting to SMARTS. It was a messy workaround, but it produced the right result.

> Here's an approach to this that works in Python:
>

And this is exactly what I wanted. To illustrate it more precisely: your pattern (2-H-pyrimidine) matches pyrimidine and 5-methylpyrimidine, but does not match 2-methylpyrimidine:

>>> from rdkit import Chem
>>> m = Chem.MolFromSmiles('c1ccnc([H])n1', sanitize=False)
>>> nm = Chem.MergeQueryHs(m)
>>> Chem.SanitizeMol(nm)
rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE
>>> Chem.MolFromSmiles('c1ccncn1').HasSubstructMatch(nm)
True
>>> Chem.MolFromSmiles('c1c(C)cncn1').HasSubstructMatch(nm)
True
>>> Chem.MolFromSmiles('c1ccnc(C)n1').HasSubstructMatch(nm)
False

>
> Being able to do something equivalent in the cartridge would certainly be
> useful. What I'd suggest is the addition of two functions:
> "query_mol_from_smiles()" and "query_mol_from_ctab()" that do this.
>

I'll do it.

> Then you could do queries like:
> select * from mols where m @> query_mol_from_smiles('c1ccnc([H])n1');
> and have it do the right thing.
>
> Sound reasonable?
>
> -greg
>

Best wishes,
Michal
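
For anyone who wants to run the session above as a script, here is a minimal sketch of the same steps. The helper name `query_from_smiles` and the `targets` dictionary are only illustrative assumptions (they are not RDKit functions, and not the cartridge functions proposed in the thread); the RDKit calls are the ones shown in the session.

```python
from rdkit import Chem


def query_from_smiles(smiles):
    """Hypothetical helper: build a query mol whose explicit [H] atoms
    are turned into hydrogen-count constraints on their neighbours."""
    # Parse without sanitization so the explicit [H] is kept as an atom.
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    if mol is None:
        raise ValueError("could not parse SMILES: %r" % smiles)
    # Merge each explicit H into a query on the heavy atom it is attached to.
    query = Chem.MergeQueryHs(mol)
    # Sanitize afterwards, as in the session above.
    Chem.SanitizeMol(query)
    return query


# The 2-H-pyrimidine pattern from the thread.
query = query_from_smiles('c1ccnc([H])n1')

# Illustrative targets: the same three molecules used in the session.
targets = {
    'pyrimidine': 'c1ccncn1',
    '5-methylpyrimidine': 'c1c(C)cncn1',
    '2-methylpyrimidine': 'c1ccnc(C)n1',
}

results = {
    name: Chem.MolFromSmiles(smi).HasSubstructMatch(query)
    for name, smi in targets.items()
}

for name, matched in results.items():
    print(name, matched)
```

With the behaviour shown in the session, the first two targets should match and 2-methylpyrimidine should not, because position 2 no longer carries a hydrogen.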