Re: [Rdkit-discuss] need SMARTS query with a specific exclusion
From: David C. <dav...@gm...> - 2017-09-24 15:54:50

Hi,

I think Chris' solution is a bit overly complicated, though I haven't tested
my alternative. If each atom in the ring is tested for
'[$(a);!$(n1(C)ccc(=O)nc1=O)]', as you'd get if you expanded out the vector
bindings I provided previously, then I don't think you need to provide the
SMARTS for the excluded ring starting from each atom. So long as one of the
atoms in the ring fails the test, the whole ring fails, so you just need the
same test on each atom.

Dave

On Sun, Sep 24, 2017 at 4:45 PM, Chris Earnshaw <cge...@gm...> wrote:
> Hi Jim
>
> The key thing to remember about the recursive SMARTS clauses is that they
> only match one atom (the first), and the rest of the string describes the
> environment in which that atom is located. So the clause
> $(n1(C)ccc(=O)nc1=O) matches just the nitrogen atom, which is embedded in
> the rest of the ring system. We then negate that with the ! symbol.
>
> If we use just the recursive SMARTS expression '[$(a)]' (or the simple
> SMARTS 'a'), it can match any of the six aromatic atoms in the
> heterocycle. Adding the first exclusion '[$(a);!$(n1(C)ccc(=O)nc1=O)]'
> means this atom can't match the nitrogen substituted by aliphatic C, but it
> can still match any of the other five aromatic atoms. Consequently there
> are five more exclusion clauses to add, each of which starts with a
> different one of the aromatic atoms in your undesired structure. As long as
> one of the atoms in the full SMARTS is prevented from matching any of the
> atoms in the undesired structure in this way, then the overall match is
> prevented.
>
> Adding an exclusion for pyridine is then easy. We're already excluding six
> patterns, and (considering symmetry) we only need to add four more to
> exclude all pyridines. Appending
> ';!$(n1ccccc1);!$(c1ncccc1);!$(c1cnccc1);!$(c1ccncc1)'
> inside the square brackets should do the trick.
>
> You're quite right though, this gets pretty cumbersome very quickly and it
> may well be best to handle it in code with simple include / exclude SMARTS
> patterns. You'll have to think about checking which atoms have been
> matched. For example, do you want to match quinoline because it contains a
> benzene ring, or exclude it because it contains a pyridine? If the former,
> you'll have to check that the atoms matched by your two patterns are
> different.
>
> Hope this helps!
>
> Chris Earnshaw
>
> On 24 September 2017 at 15:01, James T. Metz <jam...@ao...> wrote:
> > Chris,
> >
> > Wow! Your recursive SMARTS expression works as needed!
> >
> > Hmmm... Help me understand this better ... it looks like you "walk
> > around" the ring of the substructure we want to exclude and employ a
> > slightly different recursive SMARTS beginning at each atom. Is that
> > correct?
> >
> > Also, since my situation is likely to get more complicated with respect
> > to exclusions, suppose I still wanted to utilize the general aromatic
> > expression for a 6-membered ring, i.e. [a]1:[a]:[a]:[a]:[a]:[a]1, and I
> > wanted to exclude the structures we have been discussing, and I also
> > wanted to exclude pyridine, i.e. [n]1:[c]:[c]:[c]:[c]:[c]1.
> >
> > Is there a SMARTS expression that would capture 2 exclusions?
> >
> > Perhaps this is getting too clumsy! It might be better to have one or
> > more inclusion SMARTS and one or more exclusion SMARTS, and write code
> > to remove those groups of atoms that are coming from the exclusion
> > SMARTS.
> >
> > Any ideas for Python/RDKit code? Something like
> >
> > test_smiles = 'c1ccccc1'
> > inclusion_pattern = '[a]1:[a]:[a]:[a]:[a]:[a]1'
> > exclusion_pattern = '[n]1:[c]:[c]:[c]:[c]:[c]1'
> > etc...
> >
> > Hmmm... any other ideas, suggestions, comments?
> >
> > Thanks again.
> >
> > Regards,
> > Jim Metz
> >
> > -----Original Message-----
> > From: Chris Earnshaw <cge...@gm...>
> > To: James T. Metz <jam...@ao...>
> > Cc: Rdk...@li... <rdk...@li...>
> > Sent: Sun, Sep 24, 2017 4:01 am
> > Subject: Re: [Rdkit-discuss] need SMARTS query with a specific exclusion
> >
> > Hi Jim
> >
> > It can be done with recursive SMARTS, though the syntax is a bit
> > painful. This may do what you want:
> >
> > [$(a);!$(n1(C)ccc(=O)nc1=O);!$(c1cc(=O)nc(=O)n1C);!$(c1c(=O)nc(=O)n(C)c1);!$(c(=O)1nc(=O)n(C)cc1);!$(n1c(=O)n(C)ccc1=O);!$(c(=O)1n(C)ccc(=O)n1)]:1:a:a:a:a:a:1
> >
> > It's basically the general 6-ring aromatic pattern a:1:a:a:a:a:a:1, with
> > recursive SMARTS applied to the first atom to ensure that this can't
> > match any of the 6 ring atoms in your undesired system.
> >
> > Regards,
> > Chris Earnshaw
> >
> > On 24 September 2017 at 05:04, James T. Metz via Rdkit-discuss
> > <rdk...@li...> wrote:
> >> Hello,
> >>
> >> Suppose I have the following molecule
> >>
> >> m = 'CN1C=CC(=O)NC1=O'
> >>
> >> I would like to be able to use a SMARTS pattern
> >>
> >> pattern = '[a]1:[a]:[a]:[a]:[a]:[a]1'
> >>
> >> to recognize the 6 atoms in a typical aromatic ring, but I do not want
> >> to recognize the 6 atoms in the molecule, m, as aromatic. In other
> >> words, I am trying to write a specific exclusion.
> >>
> >> Is it possible to modify the SMARTS pattern to exclude the above
> >> molecule? I have tried using recursive SMARTS, but I can't get the
> >> syntax to work.
> >>
> >> Any ideas? Thank you.
> >>
> >> Regards,
> >> Jim Metz
> >>
> >> _______________________________________________
> >> Rdkit-discuss mailing list
> >> Rdk...@li...
> >> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss

--
David Cosgrove
Freelance computational chemistry and chemoinformatics developer
http://cozchemix.co.uk
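
For anyone wanting to try the two routes discussed above, here is a minimal sketch. It is not code from any of the posters. It covers (1) Dave's suggestion of putting the same recursive exclusion on every ring atom and (2) the include/exclude approach, with a check on which atoms were matched. The helper names (per_atom_ring_smarts, keep_unexcluded, surviving_rings, compare) are made up for illustration; the only RDKit calls used are Chem.MolFromSmiles, Chem.MolFromSmarts and Mol.GetSubstructMatches. It assumes, as the thread implies, that RDKit perceives the 1-methyluracil ring as aromatic.

```python
# Illustrative helpers only; none of these names come from RDKit or the thread.

URACIL = "n1(C)ccc(=O)nc1=O"        # exclusion clause quoted in the thread
PYRIDINE = "n1ccccc1"               # one pyridine clause, reused on every atom
RING = "[a]1:[a]:[a]:[a]:[a]:[a]1"  # Jim's general aromatic 6-ring pattern


def per_atom_ring_smarts(exclusions, size=6):
    """Aromatic ring SMARTS carrying the same recursive exclusions on every atom."""
    clauses = "".join(f";!$({smarts})" for smarts in exclusions)
    atom = f"[$(a){clauses}]"
    # The first atom opens ring closure 1; the trailing ':1' closes it
    # with an aromatic bond.
    return atom + "1:" + ":".join([atom] * (size - 1)) + ":1"


def keep_unexcluded(include_matches, exclude_matches):
    """Drop inclusion matches whose atoms all lie inside a single exclusion match."""
    excluded = [set(match) for match in exclude_matches]
    return [
        match for match in include_matches
        if not any(set(match) <= atoms for atoms in excluded)
    ]


def surviving_rings(smiles, include_smarts, exclude_smarts):
    """Atom-index tuples for inclusion matches not covered by any exclusion."""
    # Imported here so the pure-Python helpers above can be used without RDKit.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    included = mol.GetSubstructMatches(Chem.MolFromSmarts(include_smarts))
    excluded = []
    for smarts in exclude_smarts:
        excluded.extend(mol.GetSubstructMatches(Chem.MolFromSmarts(smarts)))
    return keep_unexcluded(included, excluded)


def compare(smiles):
    """Return (rings found via per-atom SMARTS, rings found via filtering in code)."""
    exclusions = [URACIL, PYRIDINE]
    via_smarts = surviving_rings(smiles, per_atom_ring_smarts(exclusions), [])
    via_code = surviving_rings(smiles, RING, exclusions)
    return via_smarts, via_code
```

Calling compare() on 'c1ccccc1', 'CN1C=CC(=O)NC1=O', 'c1ccncc1' and quinoline ('c1ccc2ncccc2c1') is a quick way to see whether the two routes agree on the cases raised in the thread. The subset test in keep_unexcluded corresponds to the "match quinoline because it contains a benzene ring" choice: a ring is dropped only when all of its atoms fall inside one exclusion match. To reject the whole molecule instead, skip the atom comparison and discard the molecule whenever any exclusion pattern matches at all.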