Re: [Rdkit-discuss] sanitization removes Hs - is this expected?
From: Greg L. <gre...@gm...> - 2015-04-03 05:29:41
The changes are now pushed
(https://github.com/rdkit/rdkit/commit/f0d4cf1ec63a4928a2a28fa62cf1d255099e72d0)
and are available on master.

The new functions are qmol_from_smiles() and qmol_from_ctab().

Best,
-greg

On Thu, Apr 2, 2015 at 10:51 AM, Greg Landrum <gre...@gm...> wrote:

> Hi Michal,
>
> Glad to hear this matches what you are looking for. I have already added
> the feature to the cartridge and will check it in later today/tomorrow
> morning.
>
> -greg
>
> On Thursday, April 2, 2015, Michal Krompiec <mic...@gm...>
> wrote:
>
>> Hi Greg,
>> Thank you, this is exactly what I needed.
>>
>> On 2 April 2015 at 05:22, Greg Landrum <gre...@gm...> wrote:
>>
>>>
>>> Skipping sanitization, as you propose, isn't going to help here: the
>>> kekulized form of the ring will not be converted to aromatic and you won't
>>> get the matches you are looking for.
>>>
>> Indeed. Previously I stored my dataset as SMILES with explicit hydrogens,
>> and created the query mols by adding Hs, then deleting the hydrogens at
>> substitution sites, and finally converting to SMARTS - a messy workaround,
>> but one that produced the right result.
>>
>>
>>> Here's an approach to this that works in Python:
>>>
>>
>>
>> And this is exactly what I wanted. To illustrate it more precisely: your
>> pattern (2-H-pyrimidine) matches pyrimidine and 5-methylpyrimidine but does
>> not match 2-methylpyrimidine:
>>
>> >>> m = Chem.MolFromSmiles('c1ccnc([H])n1', sanitize=False)
>> >>> nm = Chem.MergeQueryHs(m)
>> >>> Chem.SanitizeMol(nm)
>> rdkit.Chem.rdmolops.SanitizeFlags.SANITIZE_NONE
>> >>> Chem.MolFromSmiles('c1ccncn1').HasSubstructMatch(nm)
>> True
>> >>> Chem.MolFromSmiles('c1c(C)cncn1').HasSubstructMatch(nm)
>> True
>> >>> Chem.MolFromSmiles('c1ccnc(C)n1').HasSubstructMatch(nm)
>> False
>>
>>
>>> Being able to do something equivalent in the cartridge would certainly
>>> be useful. What I'd suggest is the addition of two functions:
>>> "query_mol_from_smiles()" and "query_mol_from_ctab()" that do this.
>>>
>>
>> I'll do it.
>>
>>
>>> Then you could do queries like:
>>> select * from mols where m @> query_mol_from_smiles('c1ccnc([H])n1');
>>> and have it do the right thing.
>>>
>>> Sound reasonable?
>>>
>>> -greg
>>>
>>>
>> Best wishes,
>> Michal
>>
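
For anyone who wants to reuse the Python steps quoted in this thread, they can be wrapped in a small helper, roughly as sketched below. This assumes RDKit is installed. The names query_mol_from_smiles() and matches() are hypothetical Python helpers chosen for illustration; they are not the cartridge functions and not part of the RDKit API. Only MolFromSmiles, MergeQueryHs, SanitizeMol and HasSubstructMatch come from RDKit itself.

```python
from rdkit import Chem


def query_mol_from_smiles(smiles):
    """Hypothetical helper (not an RDKit function): build a query molecule
    in which each explicit [H] in the SMILES is merged into a hydrogen
    constraint on its heavy-atom neighbour."""
    # Parse without sanitization so the explicit [H] atoms are kept.
    mol = Chem.MolFromSmiles(smiles, sanitize=False)
    if mol is None:
        raise ValueError("could not parse SMILES: %r" % smiles)
    # Merge each explicit H into a query on the atom it is attached to.
    query = Chem.MergeQueryHs(mol)
    # Sanitize the merged query afterwards, as in the session in this thread.
    Chem.SanitizeMol(query)
    return query


def matches(query, smiles):
    """Hypothetical helper: True if the molecule for `smiles` contains `query`."""
    target = Chem.MolFromSmiles(smiles)
    if target is None:
        raise ValueError("could not parse SMILES: %r" % smiles)
    return target.HasSubstructMatch(query)


# The 2-H-pyrimidine pattern from the thread, tried against the same molecules.
pattern = query_mol_from_smiles('c1ccnc([H])n1')
for smi in ('c1ccncn1', 'c1c(C)cncn1', 'c1ccnc(C)n1'):
    print(smi, matches(pattern, smi))
```

In the session quoted in this thread, pyrimidine and 5-methylpyrimidine matched and 2-methylpyrimidine did not; the sketch should reproduce that, though behaviour may vary between RDKit versions.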