Re: [Rdkit-discuss] multiple SMARTS that match only if in the same fragment
From: Greg L. <gre...@gm...> - 2020-03-09 05:11:29
First: the reason the RDKit does not parse things like:

In [2]: p = Chem.MolFromSmarts('([Cl-].[Na+])')
[05:58:01] SMARTS Parse Error: syntax error while parsing: ([Cl-].[Na+])
[05:58:01] SMARTS Parse Error: Failed parsing SMARTS '([Cl-].[Na+])' for input: '([Cl-].[Na+])'

In [3]: p = Chem.MolFromSmarts('([Cl-]).([Na+])')
[05:59:16] SMARTS Parse Error: syntax error while parsing: ([Cl-]).([Na+])
[05:59:16] SMARTS Parse Error: Failed parsing SMARTS '([Cl-]).([Na+])' for input: '([Cl-]).([Na+])'

is because those query types are not supported by the substructure search engine. Rather than accepting the input and then doing the wrong thing, we've opted not to accept it.

On Sun, Mar 8, 2020 at 1:03 AM Curt Fischer <cur...@gm...> wrote:

> Is there any consensus on idioms for identifying multiple moieties in the
> same fragment? Do I have to use len(mol.GetSubstructMatches(patt)) > 1 as
> some kind of selector and then do some kind of graph traversal routine to
> see if any of the matches are covalently connected?

My standard answer: if you want to find multiple entities in the same fragment, you can use:

In [4]: p = Chem.MolFromSmarts('O.N')

and then either make sure that your molecules have a single fragment *or* check that the matches you get back are contained in a single fragment. Here's one way of doing that:

In [18]: def fragsearch(m,p):
    ...:     # atom indices of each covalently connected fragment
    ...:     frags = [set(x) for x in Chem.GetMolFrags(m)]
    ...:     matches = [set(x) for x in m.GetSubstructMatches(p)]
    ...:     for frag in frags:
    ...:         for match in matches:
    ...:             if match.issubset(frag):
    ...:                 return match
    ...:     return False

In [21]: m1 = Chem.MolFromSmiles('OCCCN.CCC')

In [22]: m2 = Chem.MolFromSmiles('OCCC.CCCN')

In [23]: m1.HasSubstructMatch(p)
Out[23]: True

In [24]: m2.HasSubstructMatch(p)
Out[24]: True

In [25]: fragsearch(m1,p)
Out[25]: {0, 4}

In [26]: fragsearch(m2,p)
Out[26]: False

Do you really have a use case where you have molecules containing multiple fragments that you can't separate into pieces and you want to do this kind of search?
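
The same idea can be applied to the Daylight example from the original question. The following is only a sketch: the helper name `same_fragment_matches` is hypothetical (it is not part of the RDKit), and it assumes that dropping the outer parentheses from the Daylight pattern and checking fragment membership afterwards is an acceptable substitute for component-level grouping. It returns every match that lies inside one fragment rather than stopping at the first.

```python
from rdkit import Chem


def same_fragment_matches(mol, patt):
    """Return all matches of patt whose atoms lie in a single fragment of mol.

    Hypothetical helper, not an RDKit function.
    """
    # One set of atom indices per covalently connected fragment.
    frags = [set(f) for f in Chem.GetMolFrags(mol)]
    return [
        match
        for match in mol.GetSubstructMatches(patt)
        if any(set(match) <= frag for frag in frags)
    ]


# Daylight pattern without the outer parentheses, which the RDKit can parse.
patt = Chem.MolFromSmarts('[Cl;!$(Cl~c)].[c;!$(c~Cl)]')

benzyl_chloride = Chem.MolFromSmiles('ClCc1ccccc1')
aniline_hcl = Chem.MolFromSmiles('Cl.Nc1ccccc1')

# Both molecules match the dot-separated pattern as a whole ...
print(benzyl_chloride.HasSubstructMatch(patt), aniline_hcl.HasSubstructMatch(patt))
# ... but only benzyl chloride has Cl and c in the same fragment.
print(bool(same_fragment_matches(benzyl_chloride, patt)))
print(bool(same_fragment_matches(aniline_hcl, patt)))
```

Returning a list (empty when nothing qualifies) keeps the return type consistent, which is a design choice of this sketch and differs from `fragsearch` above, which returns either a set or `False`.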
Best,
-greg

On Sat, Mar 7, 2020 at 3:34 PM Ivan Tubert-Brohman <iva...@sc...> wrote:

>> Hi Curt,
>>
>> According to
>> https://www.rdkit.org/docs/RDKit_Book.html#smarts-support-and-extensions ,
>> it's not supported:
>>
>>> Here’s the (hopefully complete) list of SMARTS features that are *not*
>>> supported:
>>>
>>> - Non-tetrahedral chiral classes
>>> - the @? operator
>>> - explicit atomic masses (though isotope queries are supported)
>>> - component level grouping requiring matches in different
>>>   components, i.e. (C).(C)
>>
>> OK, the way it's worded it sounds like (C.C) might be supported (since
>> that would be requiring matches in the same component), but as you've seen,
>> it isn't supported either...
>>
>> Ivan
>>
>> On Sat, Mar 7, 2020 at 4:58 PM Curt Fischer <cur...@gm...>
>> wrote:
>>
>>> Hi rdkit fiends!
>>>
>>> The [Daylight SMARTS example page](
>>> https://daylight.com/dayhtml_tutorials/languages/smarts/smarts_examples.html)
>>> gives several examples for "multiple group" smarts, including these strings:
>>>
>>> ([Cl!$(Cl~c)].[c!$(c~Cl)])
>>> ([Cl]).([c])
>>> ([Cl].[c])
>>> [NX3;H2,H1;!$(NC=O)].[NX3;H2,H1;!$(NC=O)]
>>>
>>> In general, I cannot get these to be parsed by Chem.MolFromSmarts().
>>>
>>> For example, Chem.MolFromSmarts('([Cl!$(Cl~c)].[c!$(c~Cl)])') gives me
>>> this error message:
>>>
>>> ```
>>> [13:01:41] SMARTS Parse Error: syntax error while parsing:
>>> ([Cl!$(Cl~c)_100].[c!$(c~Cl)_101])
>>> [13:01:41] SMARTS Parse Error: Failed parsing SMARTS
>>> '([Cl!$(Cl~c)_100].[c!$(c~Cl)_101])' for input: '([Cl!$(Cl~c)].[c!$(c~Cl)])'
>>> ```
>>> My understanding of SMARTS is that the outermost parentheses in this
>>> SMARTS string are required to force the chlorine and the aromatic carbon to
>>> be somewhere in the same covalently connected fragment. E.g. this pattern
>>> *should* hit benzyl chloride ClCc1ccccc1 but should *not* hit the
>>> hydrochloride salt of aniline Cl.Nc1ccccc1.
>>>
>>> What am I getting wrong? Is there a way to write rdkit-parsable SMARTS
>>> that achieves this? (I want to filter out molecules that contain more than
>>> one of certain moieties, while allowing molecules that have one (or zero)
>>> such moieties. But salts or covalently disconnected fragments that each
>>> contain one instance of the moiety should be fine.)
>>>
>>> Details on my setup:
>>>
>>> - RDKit Version: 2019.09.3
>>> - Operating system: macOS 10.15.2
>>> - Python version (if relevant): 3.6
>>> - Are you using conda? yes
>>> - If you are using conda, which channel did you install the rdkit from?
>>>   `conda-forge`
>>> - If you are not using conda: how did you install the RDKit?
>>>
>>> Curt
>>>
>>> _______________________________________________
>>> Rdkit-discuss mailing list
>>> Rdk...@li...
>>> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
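
The filtering use case described in the original question (reject a molecule when one fragment carries more than one copy of a moiety, but accept salts where each fragment carries at most one) might be approached by counting single-moiety matches per fragment instead of writing a two-component SMARTS. This is a sketch under that assumption; `max_hits_per_fragment` is a hypothetical name, not an RDKit function, and the amine pattern is one half of the fourth Daylight string quoted in the question.

```python
from rdkit import Chem


def max_hits_per_fragment(mol, patt):
    """Largest number of patt matches found inside any one fragment of mol.

    Hypothetical helper, not an RDKit function.
    """
    frags = [set(f) for f in Chem.GetMolFrags(mol)]
    counts = [0] * len(frags)
    for match in mol.GetSubstructMatches(patt):
        # Credit the match to the fragment that contains all of its atoms.
        for i, frag in enumerate(frags):
            if set(match) <= frag:
                counts[i] += 1
                break
    return max(counts, default=0)


# Primary or secondary amine that is not an amide nitrogen.
amine = Chem.MolFromSmarts('[NX3;H2,H1;!$(NC=O)]')

diamine = Chem.MolFromSmiles('NCCCN')          # two amines, one fragment
two_monoamines = Chem.MolFromSmiles('NCC.CCN')  # one amine in each fragment

for mol in (diamine, two_monoamines):
    # Keep the molecule only if no fragment has more than one amine.
    keep = max_hits_per_fragment(mol, amine) <= 1
    print(Chem.MolToSmiles(mol), keep)
```

Counting per fragment sidesteps the unsupported component-level grouping entirely, at the cost of one pass over the fragments for each match.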