Re: [Rdkit-discuss] need SMARTS query with a specific exclusion
From: David C. <dav...@gm...> - 2017-09-24 16:27:45
Hi Chris,

Sure they're equivalent, but with my suggestion you don't have to create all
6 different SMARTS patterns, which, whilst not difficult, is likely to be
prone to silly errors. You can stick a long list of OR'd vector bindings
together to put in all the exclusions you want on each atom as you think of
them.

Dave

On Sun, Sep 24, 2017 at 5:15 PM, Chris Earnshaw <cge...@gm...> wrote:

> Hi
>
> It amounts to the same thing - either do all tests on one atom, or one
> test on all atoms.
>
> The syntax is shorter for the latter if you can use the vector bindings
> but may not be otherwise, especially if multiple exclusions are needed.
>
> Regards,
> Chris Earnshaw
>
> On 24 Sep 2017 16:54, "David Cosgrove" <dav...@gm...> wrote:
>
> Hi,
> I think Chris' solution is a bit overly complicated, though I haven't
> tested my alternative. If each atom in the ring is tested for
> '[$(a);!$(n1(C)ccc(=O)nc1=O)]', as you'd get if you expanded out the
> vector bindings I provided previously, then I don't think you need to
> provide the SMARTS for the excluded ring starting from each atom. So long
> as 1 of the atoms in the ring fails the test, the whole ring fails, so you
> just need the same test on each atom.
> Dave
>
> On Sun, Sep 24, 2017 at 4:45 PM, Chris Earnshaw <cge...@gm...> wrote:
>
>> Hi Jim
>>
>> The key thing to remember about the recursive SMARTS clauses is that
>> they only match one atom (the first), and the rest of the string
>> describes the environment in which that atom is located. So the clause
>> $(n1(C)ccc(=O)nc1=O) matches just the nitrogen atom - which is
>> embedded in the rest of the ring system. We then negate that with the
>> ! symbol.
>>
>> If we use just the recursive SMARTS expression '[$(a)]' (or the simple
>> SMARTS 'a'), it can match any of the six aromatic atoms in the
>> heterocycle. Adding the first exclusion '[$(a);!$(n1(C)ccc(=O)nc1=O)]'
>> means this atom can't match the nitrogen substituted by aliphatic
>> C, but it can still match any of the other five aromatic atoms.
>> Consequently there are five more exclusion clauses to add, each of
>> which starts with a different one of the aromatic atoms in your
>> undesired structure. As long as one of the atoms in the full SMARTS is
>> prevented from matching any of the atoms in the undesired structure in
>> this way, then the overall match is prevented.
>>
>> Adding an exclusion for pyridine is then easy. We're already excluding
>> six patterns, and (considering symmetry) we only need to add four more
>> to exclude all pyridines. Appending
>> ';!$(n1ccccc1);!$(c1ncccc1);!$(c1cnccc1);!$(c1ccncc1)' inside the
>> square brackets should do the trick.
>>
>> You're quite right though, this gets pretty cumbersome very quickly
>> and it may well be best to handle it in code with simple include /
>> exclude SMARTS patterns. You'll have to think about checking which
>> atoms have been matched - for example, do you want to match quinoline
>> because it contains a benzene ring, or exclude it because it contains
>> a pyridine? If the former you'll have to check that the atoms matched
>> by your two patterns are different.
>>
>> Hope this helps!
>>
>> Chris Earnshaw
>>
>> On 24 September 2017 at 15:01, James T. Metz <jam...@ao...> wrote:
>> > Chris,
>> >
>> > Wow! Your recursive SMARTS expression works as needed!
>> >
>> > Hmmm... Help me understand this better ... it looks like you "walk
>> > around" the ring of the substructure we want to exclude and employ a
>> > slightly different recursive SMARTS beginning at that atom. Is that
>> > correct?
>> >
>> > Also, since my situation is likely to get more complicated with
>> > respect to exclusions, suppose I still wanted to utilize the general
>> > aromatic expression for a 6-membered ring, i.e.
>> > [a]1:[a]:[a]:[a]:[a]:[a]1, and I wanted to exclude the structures we
>> > have been discussing, and I also wanted to exclude pyridine, i.e.
>> > [n]1:[c]:[c]:[c]:[c]:[c]1.
>> >
>> > Is there a SMARTS expression that would capture 2 exclusions?
>> >
>> > Perhaps this is getting too clumsy! It might be better to have one or
>> > more inclusion SMARTS and one or more exclusion SMARTS, and write code
>> > to remove those groups of atoms that are coming from the exclusion
>> > SMARTS.
>> >
>> > Any ideas for Python/RDKit code? Something like
>> >
>> > test_smiles = 'c1ccccc1'
>> > inclusion_pattern = '[a]1:[a]:[a]:[a]:[a]:[a]1'
>> > exclusion_pattern = '[n]1:[c]:[c]:[c]:[c]:[c]1'
>> > etc...
>> >
>> > Hmmm... any other ideas, suggestions, comments?
>> >
>> > Thanks again.
>> >
>> > Regards,
>> > Jim Metz
>> >
>> > -----Original Message-----
>> > From: Chris Earnshaw <cge...@gm...>
>> > To: James T. Metz <jam...@ao...>
>> > Cc: Rdk...@li... <rdk...@li...>
>> > Sent: Sun, Sep 24, 2017 4:01 am
>> > Subject: Re: [Rdkit-discuss] need SMARTS query with a specific exclusion
>> >
>> > Hi Jim
>> >
>> > It can be done with recursive SMARTS, though the syntax is a bit
>> > painful. This may do what you want -
>> >
>> > [$(a);!$(n1(C)ccc(=O)nc1=O);!$(c1cc(=O)nc(=O)n1C);!$(c1c(=O)nc(=O)n(C)c1);!$(c(=O)1nc(=O)n(C)cc1);!$(n1c(=O)n(C)ccc1=O);!$(c(=O)1n(C)ccc(=O)n1)]:1:a:a:a:a:a:1
>> >
>> > It's basically the general 6-ring aromatic pattern a:1:a:a:a:a:a:1,
>> > with recursive SMARTS applied to the first atom to ensure that this
>> > can't match any of the 6 ring atoms in your undesired system.
>> >
>> > Regards,
>> > Chris Earnshaw
>> >
>> > On 24 September 2017 at 05:04, James T. Metz via Rdkit-discuss
>> > <rdk...@li...> wrote:
>> >> Hello,
>> >>
>> >> Suppose I have the following molecule
>> >>
>> >> m = 'CN1C=CC(=O)NC1=O'
>> >>
>> >> I would like to be able to use a SMARTS pattern
>> >>
>> >> pattern = '[a]1:[a]:[a]:[a]:[a]:[a]1'
>> >>
>> >> to recognize the 6 atoms in a typical aromatic ring, but
>> >> I do not want to recognize the 6 atoms in the molecule,
>> >> m, as aromatic. In other words, I am trying to write
>> >> a specific exclusion.
>> >>
>> >> Is it possible to modify the SMARTS pattern to
>> >> exclude the above molecule? I have tried using
>> >> recursive SMARTS, but I can't get the syntax to
>> >> work.
>> >>
>> >> Any ideas? Thank you.
>> >>
>> >> Regards,
>> >> Jim Metz
>> >>
>> >> ------------------------------------------------------------------------------
>> >> Check out the vibrant tech community on one of the world's most
>> >> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>> >> _______________________________________________
>> >> Rdkit-discuss mailing list
>> >> Rdk...@li...
>> >> https://lists.sourceforge.net/lists/listinfo/rdkit-discuss
>
> --
> David Cosgrove
> Freelance computational chemistry and chemoinformatics developer
> http://cozchemix.co.uk
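
For the "handle it in code" route discussed in the thread (simple inclusion and exclusion SMARTS, then comparing which atoms each one matched), a minimal sketch might look like the following. It is not taken from any of the posters: the names filter_matches, aromatic_rings and drop_on_any_overlap are made up for illustration, and the only RDKit calls assumed are Chem.MolFromSmiles, Chem.MolFromSmarts and Mol.GetSubstructMatches. The flag covers the quinoline question raised above: by default a ring is dropped only when an exclusion pattern matched exactly the same atoms, and with the flag set it is dropped if it shares any atom with an excluded match.

```python
from typing import Iterable, List, Sequence, Tuple

Match = Tuple[int, ...]  # atom indices, as returned by GetSubstructMatches


def filter_matches(included: Iterable[Match],
                   excluded: Iterable[Match],
                   drop_on_any_overlap: bool = False) -> List[Match]:
    """Return the inclusion matches that survive the exclusion matches.

    drop_on_any_overlap=False: drop a match only if some exclusion match
    covers exactly the same set of atoms.
    drop_on_any_overlap=True: drop a match if it shares even one atom
    with any exclusion match.
    """
    excluded_sets = [frozenset(m) for m in excluded]
    kept = []
    for match in included:
        atoms = frozenset(match)  # atom order within a match is irrelevant
        if drop_on_any_overlap:
            hit = any(atoms & ex for ex in excluded_sets)
        else:
            hit = any(atoms == ex for ex in excluded_sets)
        if not hit:
            kept.append(tuple(match))
    return kept


def aromatic_rings(smiles: str,
                   include_smarts: str,
                   exclude_smarts: Sequence[str],
                   drop_on_any_overlap: bool = False) -> List[Match]:
    """Hypothetical helper: run the SMARTS with RDKit, then filter."""
    from rdkit import Chem  # imported here so filter_matches works without RDKit

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    included = mol.GetSubstructMatches(Chem.MolFromSmarts(include_smarts))
    excluded = []
    for smarts in exclude_smarts:
        excluded.extend(mol.GetSubstructMatches(Chem.MolFromSmarts(smarts)))
    return filter_matches(included, excluded, drop_on_any_overlap)
```

With the patterns from the thread, a call such as aromatic_rings('c1ccc2ncccc2c1', '[a]1:[a]:[a]:[a]:[a]:[a]1', ['[n]1:[c]:[c]:[c]:[c]:[c]1']) should keep only the benzene ring of quinoline, while passing drop_on_any_overlap=True should reject both rings because they share the two fusion atoms. Exclusions only need one SMARTS per unwanted ring here, since the comparison is done on matched atom sets rather than on the starting atom.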