Re: [Rdkit-discuss] SMILES/SMARTS codes that match multiple atoms
From: Chris E. <cge...@gm...> - 2020-02-09 08:47:55
Sorry - tried to type this too early in the morning and introduced some errors transcribing the SMARTS pattern!

It should have been "[CH](=O)O[$([CH3]),$([CH2]C)]", as in

pat1 = Chem.MolFromSmarts("[CH](=O)O[$([CH3]),$([CH2]C)]")

Best regards,
Chris

On Sun, 9 Feb 2020 at 08:28, Chris Earnshaw <cge...@gm...> wrote:

> Hi
>
> I've always regarded it as dangerous to rely on the use of explicit
> hydrogens in search queries and pattern matches. I think it's generally
> safer to use H-count properties in your SMARTS. In your example case this
> will require the use of recursive SMARTS to allow matching of the CH3 and
> CH2Cn fragments you're interested in. The SMARTS
> "[CH](=O)O[$(CH3);$([CH2]C)]" should do what you want. The [CH] forces it
> to only match formate esters. The recursive SMARTS [$(CH3);$([CH2]C)] can
> be interpreted as 'an atom which is EITHER an aliphatic carbon with 3
> hydrogens OR an aliphatic carbon with two hydrogens and an attached
> aliphatic carbon'. It's possible to build very powerful queries using this
> kind of approach, and it's not necessary to add explicit Hs to make it work.
>
> from rdkit import Chem
> mols = [Chem.MolFromSmiles(s) for s in ["C(=O)OC", "C(=O)OCC", "C(=O)OCCC"]]
> pat1 = Chem.MolFromSmarts("[CH](=O)O[$(CH3);$([CH2]C)]")
> [mol.HasSubstructMatch(pat1) for mol in mols]
> [True, True, True]
>
> All the best,
> Chris
>
> On Sat, 8 Feb 2020 at 20:29, Andrew Dalke <da...@da...> wrote:
>
>> On Feb 8, 2020, at 17:55, Janusz Petkowski <jjp...@mi...> wrote:
>> >
>> > If not how can I match cases where in a given position there can be C or H with rdkit?
>>
>> I believe you should use #1 instead of H.
>>
>> >>> from rdkit import Chem
>> >>> mols = [Chem.MolFromSmiles(s) for s in ["C(=O)OC", "C(=O)OCC", "C(=O)OCCC"]]
>> >>> hmols = [Chem.AddHs(mol) for mol in mols]
>>
>> Your pattern:
>>
>> >>> pat1 = Chem.MolFromSmarts("[H]C(=O)OC([C,H])([H])[H]")
>> >>> [mol.HasSubstructMatch(pat1) for mol in hmols]
>> [False, True, True]
>>
>> Using #1 instead of H:
>>
>> >>> pat2 = Chem.MolFromSmarts("[H]C(=O)OC([C,#1])([#1])[#1]")
>> >>> [mol.HasSubstructMatch(pat2) for mol in hmols]
>> [True, True, True]
>>
>> "H" has an odd interpretation.
>> https://www.daylight.com/dayhtml/doc/theory/theory.smarts.html says:
>>
>>   Note that atomic primitive H can have two meanings,
>>   implying a property or the element itself. [H] means
>>   hydrogen atom. [*H2] means any atom with exactly
>>   two hydrogens attached
>>
>> I believe the goal of having [H] match a hydrogen atom is to allow a
>> SMILES, when interpreted as a SMARTS, to be able to match the SMILES when
>> interpreted as a molecule. I'm not sure about that though.
>>
>> Cheers,
>>
>> Andrew
>> da...@da...
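
For anyone wanting to try the corrected H-count pattern from the top of this message, here is a minimal sketch. It assumes only that RDKit is installed. The two extra molecules (methyl acetate and isopropyl formate) are not from the thread; they are hypothetical negative controls added for illustration, as are the names FORMATE_ESTER, examples and results. In SMARTS, "," is OR while ";" is a low-precedence AND, which is why the recursive part uses a comma.

```python
from rdkit import Chem

# Corrected pattern from this message: a carbon with one H (the formate carbon),
# its carbonyl O, the ester O, then an atom that is EITHER a CH3 OR a CH2
# bonded to another aliphatic carbon.
FORMATE_ESTER = Chem.MolFromSmarts("[CH](=O)O[$([CH3]),$([CH2]C)]")

# The three esters from the thread, plus two illustrative negative controls
# (assumptions, not part of the original discussion).
examples = {
    "C(=O)OC": "methyl formate",
    "C(=O)OCC": "ethyl formate",
    "C(=O)OCCC": "propyl formate",
    "CC(=O)OC": "methyl acetate (control: no H on the carbonyl carbon)",
    "C(=O)OC(C)C": "isopropyl formate (control: O-bound carbon is a CH)",
}

results = {}
for smiles, name in examples.items():
    # Implicit hydrogens are enough here; Chem.AddHs is not called.
    mol = Chem.MolFromSmiles(smiles)
    results[smiles] = mol.HasSubstructMatch(FORMATE_ESTER)
    print(f"{name:55s} {results[smiles]}")
```

The three formate esters from the thread should match and the two controls should not, since the match depends only on hydrogen counts and never on explicit hydrogen atoms.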